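I want to get all the columns of a DataFrame. If the DataFrame has a flat structure (no nested StructTypes), df.columns
produces the correct result. I would also like to return all nested column names. For example,
given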
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val schema = StructType(
  StructField("name", StringType) ::
  StructField("nameSecond", StringType) ::
  StructField("nameDouble", StringType) ::
  StructField("someStruct", StructType(
    StructField("insideS", StringType) ::
    StructField("insideD", DoubleType) ::
    Nil
  )) ::
  Nil
)
// An empty DataFrame is enough here: only the schema matters
val rdd = spark.sparkContext.emptyRDD[Row]
val df = spark.createDataFrame(rdd, schema)
I would like to get
Seq("name", "nameSecond", "nameDouble", "someStruct", "insideS", "insideD")
You can use this recursive function to traverse the schema:
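import org.apache.spark.sql.types._

// Returns the name of every field, descending into nested StructTypes.
// A struct column's own name comes before the names of its children.
def flattenSchema(schema: StructType): Seq[String] = {
  schema.fields.toSeq.flatMap {
    case StructField(name, inner: StructType, _, _) => Seq(name) ++ flattenSchema(inner)
    case StructField(name, _, _, _) => Seq(name)
  }
}

println(flattenSchema(df.schema))
// prints: ArraySeq(name, nameSecond, nameDouble, someStruct, insideS, insideD)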
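If nested fields in different structs can share a name, the bare names above become ambiguous. The following is a minimal sketch of a variant (not part of the original answer) that records the full dotted path of each field instead. The names FlattenPaths, flattenPaths and exampleSchema are hypothetical, chosen for illustration; it assumes only Spark's org.apache.spark.sql.types on the classpath.

```scala
import org.apache.spark.sql.types._

// Hypothetical variant of flattenSchema: emits dot-qualified paths
// (e.g. "someStruct.insideS") instead of bare field names.
object FlattenPaths {
  // Walk the schema recursively; `prefix` is the path of the enclosing struct.
  // A struct column is emitted before its children, as in the original answer.
  def flattenPaths(schema: StructType, prefix: String = ""): Seq[String] =
    schema.fields.toSeq.flatMap { field =>
      val path = if (prefix.isEmpty) field.name else s"$prefix.${field.name}"
      field.dataType match {
        case inner: StructType => path +: flattenPaths(inner, path)
        case _                 => Seq(path)
      }
    }

  // Same schema as in the question, reproduced so the example is self-contained.
  val exampleSchema: StructType = StructType(
    StructField("name", StringType) ::
    StructField("nameSecond", StringType) ::
    StructField("nameDouble", StringType) ::
    StructField("someStruct", StructType(
      StructField("insideS", StringType) ::
      StructField("insideD", DoubleType) ::
      Nil
    )) ::
    Nil
  )

  def main(args: Array[String]): Unit =
    println(flattenPaths(exampleSchema))
}
```

For the question's schema this yields name, nameSecond, nameDouble, someStruct, someStruct.insideS and someStruct.insideD. Dotted paths like these can usually be passed to df.select to reach nested fields, although field names that themselves contain dots would need backtick quoting.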