I have a DataFrame in the following format:
+-----------------+-------------------------+-----------+-----------------------------+---------------------------+----------------------------------+--------------------------------------+--------------------------------+
|DataPartition |TimeStamp |FFAction|!||IdentifierValue_effectiveFrom|IdentifierValue_effectiveTo|IdentifierValue_identifierEntityId|IdentifierValue_identifierEntityTypeId|IdentifierValue_identifierTypeId|
+-----------------+-------------------------+-----------+-----------------------------+---------------------------+----------------------------------+--------------------------------------+--------------------------------+
|SelfSourcedPublic|2018-03-05T11:54:18+00:00|I|!| |1900-01-01T00:00:00+00:00 |9999-12-31T00:00:00+00:00 |4295903126 |404010 |320150 |
+-----------------+-------------------------+-----------+-----------------------------+---------------------------+----------------------------------+--------------------------------------+--------------------------------+
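I want to add an additional column based on the column below:
IdentifierValue_identifierEntityTypeId
The extra column, Partition, should be filled in according to the following condition:
If IdentifierValue_identifierEntityTypeId = 1001371402 then Partition = Repno2FundamentalSeries, else if IdentifierValue_identifierEntityTypeId = 404404 then Partition = Repno2Organization
This is what I tried in order to achieve that: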
val temp = temp1.withColumn("Partition", when($"IdentifierValue_identifierEntityTypeId" === "404010", 0).otherwise("Repno2FundamentalSeries"))
temp.show(false)
I get the output below, but the new column contains a zero:
+-----------------+-------------------------+-----------+-----------------------------+---------------------------+----------------------------------+--------------------------------------+--------------------------------+---------+
|DataPartition |TimeStamp |FFAction|!||IdentifierValue_effectiveFrom|IdentifierValue_effectiveTo|IdentifierValue_identifierEntityId|IdentifierValue_identifierEntityTypeId|IdentifierValue_identifierTypeId|Partition|
+-----------------+-------------------------+-----------+-----------------------------+---------------------------+----------------------------------+--------------------------------------+--------------------------------+---------+
|SelfSourcedPublic|2018-03-05T11:54:18+00:00|I|!| |1900-01-01T00:00:00+00:00 |9999-12-31T00:00:00+00:00 |4295903126 |404010 |320150 |0 |
+-----------------+-------------------------+-----------+-----------------------------+---------------------------+----------------------------------+--------------------------------------+--------------------------------+---------+
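I am new to Scala, so this is a basic question.
How do I write when and otherwise for multiple conditions on a column? The following does not work for me; it fails with:
Exception in thread "main" java.lang.IllegalArgumentException: otherwise() can only be applied once on a Column previously generated by when()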
val dataMain = dataMain1.withColumn(
"Partition",
when($"RelationObjectId_relatedObjectType" === "EDInstrument" && $"RelationObjectId_relatedObjectType" === "Fundamental", "Instrument2Fundamental")
.otherwise(when($"RelationObjectId_relatedObjectType" === "EDInstrument" && $"RelationObjectId_relatedObjectType" === "FundamentalSeries", "Instrument2FundamentalSeries"))
.otherwise(when($"RelationObjectId_relatedObjectType" === "Organization" && $"RelationObjectId_relatedObjectType" === "Fundamental", "Organization2Fundamental"))
.otherwise(when($"RelationObjectId_relatedObjectType" === "Organization" && $"RelationObjectId_relatedObjectType" === "FundamentalSeries", "Organization2FundamentalSeries"))
)
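In your attempt, the row's value 404010 matches the when condition, and the value you return for a match is 0, which is why Partition shows 0. Based on the conditions you gave, the when condition should be changed as follows.
If IdentifierValue_identifierEntityTypeId = 1001371402 then Partition = Repno2FundamentalSeries, else if IdentifierValue_identifierEntityTypeId = 404404 then Partition = Repno2Organization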
df1.withColumn("Partition",
when($"IdentifierValue_identifierEntityTypeId" === "1001371402", "Repno2FundamentalSeries")
.otherwise("Repno2Organization")
)
Output:
+-----------------+-------------------------+-----------+-----------------------------+---------------------------+----------------------------------+--------------------------------------+--------------------------------+-----------------------+
|DataPartition |TimeStamp |FFAction|!||IdentifierValue_effectiveFrom|IdentifierValue_effectiveTo|IdentifierValue_identifierEntityId|IdentifierValue_identifierEntityTypeId|IdentifierValue_identifierTypeId|Partition |
+-----------------+-------------------------+-----------+-----------------------------+---------------------------+----------------------------------+--------------------------------------+--------------------------------+-----------------------+
|SelfSourcedPublic|2018-03-05T11:54:18+00:00|I|!| |1900-01-01T00:00:00+00:00 |9999-12-31T00:00:00+00:00 |4295903126 |404010 |320150 |Repno2Organization |
+-----------------+-------------------------+-----------+-----------------------------+---------------------------+----------------------------------+--------------------------------------+--------------------------------+-----------------------+
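If the second branch should apply only to one specific ID rather than to every remaining row, the when calls can be chained, which reads like "if / else if". The following is a minimal sketch, not a definitive implementation: the object name PartitionColumnSketch, the helper addPartition, and the three sample IDs are assumptions made for illustration, and the final otherwise is left out on purpose so that unmatched rows get null.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object PartitionColumnSketch {
  // Hypothetical helper: adds "Partition" based on the entity type ID.
  def addPartition(df: DataFrame): DataFrame =
    df.withColumn(
      "Partition",
      // First matching branch wins, like if / else if.
      when(col("IdentifierValue_identifierEntityTypeId") === "1001371402", "Repno2FundamentalSeries")
        .when(col("IdentifierValue_identifierEntityTypeId") === "404404", "Repno2Organization")
      // No otherwise(): rows matching neither branch get null.
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("partition-column-sketch")
      .getOrCreate()
    import spark.implicits._

    // Sample IDs are assumptions; only the column name comes from the question.
    val df1 = Seq("1001371402", "404404", "404010")
      .toDF("IdentifierValue_identifierEntityTypeId")

    addPartition(df1).show(false)
    spark.stop()
  }
}
```

Add a single .otherwise("...") at the end of the chain if unmatched rows should get a default value instead of null.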
Edit:
This is how you write a nested when:
import org.apache.spark.sql.functions.when

// Each otherwise() wraps the next when(), so otherwise() is called only once per when().
// Note: as written in the question, each condition compares the same column with two
// different values, so it can never be true; one side of each && presumably belongs
// to a different column.
val dataMain = df.withColumn(
  "Partition",
  when(($"RelationObjectId_relatedObjectType" === "EDInstrument" && $"RelationObjectId_relatedObjectType" === "Fundamental"), "Instrument2Fundamental")
    .otherwise(
      when($"RelationObjectId_relatedObjectType" === "EDInstrument" && $"RelationObjectId_relatedObjectType" === "FundamentalSeries", "Instrument2FundamentalSeries")
        .otherwise(
          when($"RelationObjectId_relatedObjectType" === "Organization" && $"RelationObjectId_relatedObjectType" === "Fundamental", "Organization2Fundamental")
            .otherwise(when($"RelationObjectId_relatedObjectType" === "Organization" && $"RelationObjectId_relatedObjectType" === "FundamentalSeries", "Organization2FundamentalSeries")
            )
        )
    )
)
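The nested form can usually be flattened by chaining .when calls on the column that the first when returns, with at most one .otherwise at the very end, which also avoids the "otherwise() can only be applied once" exception. The sketch below is only an illustration under stated assumptions: sourceType and targetType are hypothetical column names standing in for the two columns that the conditions are meant to compare, and ChainedWhenSketch and withPartition are made-up names.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object ChainedWhenSketch {
  // sourceType and targetType are hypothetical column names, not from the question.
  def withPartition(df: DataFrame): DataFrame =
    df.withColumn(
      "Partition",
      when(col("sourceType") === "EDInstrument" && col("targetType") === "Fundamental", "Instrument2Fundamental")
        .when(col("sourceType") === "EDInstrument" && col("targetType") === "FundamentalSeries", "Instrument2FundamentalSeries")
        .when(col("sourceType") === "Organization" && col("targetType") === "Fundamental", "Organization2Fundamental")
        .when(col("sourceType") === "Organization" && col("targetType") === "FundamentalSeries", "Organization2FundamentalSeries")
      // No otherwise(): rows matching none of the branches get null,
      // the same result as the innermost when() in the nested version.
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("chained-when-sketch")
      .getOrCreate()
    import spark.implicits._

    // Toy rows, assumed for illustration only.
    val df = Seq(
      ("EDInstrument", "Fundamental"),
      ("Organization", "FundamentalSeries"),
      ("Other", "Fundamental")
    ).toDF("sourceType", "targetType")

    withPartition(df).show(false)
    spark.stop()
  }
}
```

Branches are evaluated in order and the first match wins, so the chained and nested versions should produce the same column.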
Hope this helps.