PySpark: transpose a DataFrame

玛雅人

I have a DataFrame like the one below:

ID, Code_Num, Code,              Code1,  Code2,  Code3
10, 1,        A1005*B1003,       A1005,  B1003,  null
12, 2,        A1007*D1008*C1004, A1007,  D1008,  C1004

I need help transposing the dataset above. The output should look like this:

ID, Code_Num, Code,              Code_T
10, 1,        A1005*B1003,       A1005
10, 1,        A1005*B1003,       B1003
12, 2,        A1007*D1008*C1004, A1007
12, 2,        A1007*D1008*C1004, D1008
12, 2,        A1007*D1008*C1004, C1004

cph_sto

Step 1: Create the DataFrame.

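from pyspark.sql import SparkSession

# SparkSession replaces the legacy sqlContext entry point
spark = SparkSession.builder.getOrCreate()
values = [(10, 'A1005*B1003', 'A1005', 'B1003', None), (12, 'A1007*D1008*C1004', 'A1007', 'D1008', 'C1004')]
df = spark.createDataFrame(values, ['ID', 'Code', 'Code1', 'Code2', 'Code3'])
df.show()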
+---+-----------------+-----+-----+-----+
| ID|             Code|Code1|Code2|Code3|
+---+-----------------+-----+-----+-----+
| 10|      A1005*B1003|A1005|B1003| null|
| 12|A1007*D1008*C1004|A1007|D1008|C1004|
+---+-----------------+-----+-----+-----+

Step 2: Explode the DataFrame.

from pyspark.sql.functions import array, col, explode, lit, struct

def to_transpose(df, by):
    # Keep every column not in `by`, split into column names and type strings
    cols, dtypes = zip(*((c, t) for (c, t) in df.dtypes if c not in by))
    # array() needs elements of a single type, so all value columns must match
    assert len(set(dtypes)) == 1, "All columns have to be of the same type"

    # Create and explode an array of (column_name, column_value) structs
    kvs = explode(array([
      struct(lit(c).alias("key"), col(c).alias("val")) for c in cols
    ])).alias("kvs")

    return df.select(by + [kvs]).select(by + ["kvs.key", "kvs.val"])

df = to_transpose(df, ["ID","Code"]).drop('key').withColumnRenamed("val","Code_T")
df.show()
+---+-----------------+------+
| ID|             Code|Code_T|
+---+-----------------+------+
| 10|      A1005*B1003| A1005|
| 10|      A1005*B1003| B1003|
| 10|      A1005*B1003|  null|
| 12|A1007*D1008*C1004| A1007|
| 12|A1007*D1008*C1004| D1008|
| 12|A1007*D1008*C1004| C1004|
+---+-----------------+------+
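To see where the row with null comes from, here is a plain-Python sketch (no Spark required) that mimics what to_transpose does on a list of tuples. The helper name to_transpose_rows and its arguments are hypothetical and only illustrate the idea: every column not in `by` becomes a (key, val) pair, and each input row is exploded into one output row per pair, nulls included.

```python
# Plain-Python illustration of to_transpose (hypothetical helper, not a Spark API).

def to_transpose_rows(rows, columns, by):
    """Mimic to_transpose on plain tuples: one output row per non-`by` column."""
    # Positions of the columns kept as-is and of the columns being transposed
    by_idx = [columns.index(c) for c in by]
    value_cols = [(c, columns.index(c)) for c in columns if c not in by]
    out = []
    for row in rows:
        kept = tuple(row[i] for i in by_idx)
        # One output row per (column name, column value), None included,
        # like explode() over the array of (key, val) structs
        for name, i in value_cols:
            out.append(kept + (name, row[i]))
    return out


columns = ["ID", "Code", "Code1", "Code2", "Code3"]
values = [
    (10, "A1005*B1003", "A1005", "B1003", None),
    (12, "A1007*D1008*C1004", "A1007", "D1008", "C1004"),
]

for r in to_transpose_rows(values, columns, ["ID", "Code"]):
    print(r)
```

The third tuple, (10, 'A1005*B1003', 'Code3', None), corresponds to the null row in the Spark output; the Spark version then drops the key column and renames val to Code_T.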

If you only want the non-null values in column Code_T, just run the statement below:

df = df.where(col('Code_T').isNotNull())
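As an alternative not used in the answer above: if Code always holds the same values as Code1..Code3 joined by `*` (true for the two sample rows, but an assumption in general), Code_T can be derived by splitting Code instead of transposing. That produces no null rows and keeps Code_Num, as in the requested output. The plain-Python sketch below shows the idea; the helper split_codes is hypothetical. In PySpark the analogous expression would be explode(split(col("Code"), r"\*")), where split takes a regular expression, so the `*` has to be escaped.

```python
# Alternative sketch (plain Python, no Spark): derive Code_T by splitting
# the "*"-delimited Code column instead of transposing Code1..Code3.

def split_codes(rows):
    """rows: (ID, Code_Num, Code) tuples -> (ID, Code_Num, Code, Code_T) tuples."""
    out = []
    for id_, code_num, code in rows:
        for part in code.split("*"):
            if part:  # skip empty pieces, e.g. from a trailing "*"
                out.append((id_, code_num, code, part))
    return out


rows = [(10, 1, "A1005*B1003"), (12, 2, "A1007*D1008*C1004")]
for r in split_codes(rows):
    print(r)
```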
