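Hi, I have the DataFrame below, which contains a countries column along with several other columns and a small number of rows. I want to write a generic function (it will be used in several places) that I can call inside withColumn to create a new column.
Input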
| countries |
|------------|
| RFRA |
| BRES |
| EAST |
| RUSS |
| .... |
Output
| countries |
|-----------|
| FRA |
| BRA |
| POL |
| RUS |
| ... |
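Below is my code. When I pass the countries column to the function, the match against the string values does not work. How can I extract the value from the column, compare it with the given strings, and return the result as a column?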
val df = sample.withColumn("renamedcountries", replace($"countries"))

def replace(countries: Column): Column = {
  val Updated = countries match {
    case "RFRA" => "FRA"
    case "BRES" => "BRA"
    case "RESP" => "ESP"
    case "RBEL" => "BEL"
    case "RGRB" => "GBR"
    case "RALL" => "DEU"
    case "MARO" => "MAR"
    case "RPOR" => "PRT"
    case _ => "unknown"
  }
  Updated
}
Wrap the function logic you already have in a udf, then call that udf from the different places in your code.
import org.apache.spark.sql.functions._
import spark.implicits._ // for toDF and $"..."; assumes a SparkSession named spark, as in spark-shell

val df = Seq("RFRA", "BRES", "RUSS").toDF("countries")

// The udf receives each row's value as a plain String, so an ordinary match works here.
val mapCountries = udf[String, String](country =>
  country match {
    case "RFRA" => "FRA"
    case "BRES" => "BRA"
    case "RESP" => "ESP"
    case "RBEL" => "BEL"
    case "RGRB" => "GBR"
    case "RALL" => "DEU"
    case "MARO" => "MAR"
    case "RPOR" => "PRT"
    case _ => "unknown" // anything not listed above
  }
)

df.withColumn("renamedCountries", mapCountries($"countries")).show()
+---------+----------------+
|countries|renamedCountries|
+---------+----------------+
|     RFRA|             FRA|
|     BRES|             BRA|
|     RUSS|         unknown|
+---------+----------------+
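If you would rather keep the `def replace(countries: Column): Column` shape from the question, here is a rough sketch (not tested against your data) that builds the same lookup from Spark's built-in `when`/`otherwise` instead of a udf. The `match` in the question fails because a `Column` describes a computation over rows and is not the row's string value, so it can never equal a string literal. The names `RenameCountries` and `countryCodes` are made up for illustration, and a local SparkSession is assumed.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{lit, when}

// Hypothetical holder object; the name is only for illustration.
object RenameCountries {
  // Same code pairs as in the udf version above.
  val countryCodes: Map[String, String] = Map(
    "RFRA" -> "FRA", "BRES" -> "BRA", "RESP" -> "ESP", "RBEL" -> "BEL",
    "RGRB" -> "GBR", "RALL" -> "DEU", "MARO" -> "MAR", "RPOR" -> "PRT"
  )

  // Column in, Column out, so it can be passed straight to withColumn.
  // Each map entry wraps the previous expression in one more
  // when(...).otherwise(...); lit("unknown") is the innermost fallback.
  def replace(countries: Column): Column =
    countryCodes.foldLeft(lit("unknown")) { case (fallback, (from, to)) =>
      when(countries === from, to).otherwise(fallback)
    }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, only for trying the function out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("rename-countries")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("RFRA", "BRES", "RUSS").toDF("countries")
    df.withColumn("renamedCountries", replace($"countries")).show()

    spark.stop()
  }
}
```

Since this stays within Spark's own column expressions, the optimizer can see through it, which a udf does not allow. For a handful of codes either approach should be fine, so treat this as an option and not a required change.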