I have two array columns (names, score). I need to explode both of them and use the values in names as the column names for the matching score values (similar to a pivot).
+----+-------------------------+-----------------------------------------+
|id  |names                    |score                                    |
+----+-------------------------+-----------------------------------------+
|ab01|[F1 , F2, F3, F4, F5]    |[00123, 000.001, 00127, 00.0123, 111]    |
|ab02|[F1 , F2, F3, F4, F5, F6]|[00124, 000.003, 00156, 00.067, 156, 254]|
|ab03|[F1 , F2, F3, F4, F5]    |[00234, 000.078, 00188, 00.0144, 188]    |
|ab04|[F1 , F2, F3, F4, F5]    |[00345, 000.01112, 001567, 00.0186, 555] |
+----+-------------------------+-----------------------------------------+
Expected output:
id    F1     F2         F3      F4       F5   F6
ab01  00123  000.001    00127   00.0123  111  null
ab02  00124  000.003    00156   00.067   156  254
ab03  00234  000.078    00188   00.0144  188  null
ab04  00345  000.01112  001567  00.0186  555  null
I tried zipping names and score together and then exploding the result:
from pyspark.sql import functions as F
from pyspark.sql.types import ArrayType, StringType, StructField, StructType

combine = F.udf(
    lambda x, y: list(zip(x, y)),
    ArrayType(
        StructType([
            StructField("names", StringType()),
            StructField("score", StringType()),
        ])
    ),
)

df2 = (
    df.withColumn("new", combine("score", "names"))
    .withColumn("new", F.explode("new"))
    .select(
        "id",
        F.col("new.names").alias("names"),
        F.col("new.score").alias("score"),
    )
)
I'm getting an error:
TypeError: zip argument #1 must support iteration
I also tried exploding with rdd flatMap(), and I still get the same error.
Is there an alternate way to achieve this?
Thanks in advance.
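A likely cause of the TypeError, assuming at least one row holds a null in one of the array columns: Spark passes a null array to a Python UDF as None, and zip(None, ...) raises TypeError (the exact wording of the message differs between Python versions). Note also that the call above is combine("score", "names"), so argument #1 is the score column and the pairs come out in (score, name) order, while the struct declares names first. The sketch below reproduces the failure and shows a null-safe variant in plain Python; safe_pairs is a hypothetical helper name, not part of any library.

```python
from typing import List, Optional, Tuple


def safe_pairs(names: Optional[List[str]],
               score: Optional[List[str]]) -> List[Tuple[str, str]]:
    """Zip names with scores, treating a missing (None) array as empty."""
    return list(zip(names or [], score or []))


# A None argument reproduces the kind of failure seen inside the UDF.
try:
    list(zip(None, ["F1", "F2"]))
    raised = False
except TypeError:
    raised = True

print(raised)                                          # True
print(safe_pairs(["F1", "F2"], ["00123", "000.001"]))  # [('F1', '00123'), ('F2', '000.001')]
print(safe_pairs(None, ["00123"]))                     # []
```

The same body could serve as the function passed to F.udf, with the columns given in (names, score) order. Returning an empty list means explode drops that row, so whether that is acceptable depends on the data.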
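Try this with pandas (df must be a pandas DataFrame here; a Spark DataFrame can be converted with df.toPandas(), provided it fits in driver memory):
import pandas as pd

df2 = df.set_index('id').apply(pd.Series.explode).reset_index()
df3 = df2.pivot(columns='names', values='score', index='id')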
df3:
names  F1     F2         F3      F4       F5   F6
id
ab01   00123  000.001    00127   00.0123  111  NaN
ab02   00124  000.003    00156   00.067   156  254
ab03   00234  000.078    00188   00.0144  188  NaN
ab04   00345  000.01112  001567  00.0186  555  NaN
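For reference, here is a self-contained sketch of the same explode-then-pivot idea, with the sample data from the question rebuilt by hand as a pandas DataFrame. It assumes pandas 1.3 or later, where DataFrame.explode accepts a list of columns and explodes them in lockstep; the variable names long_df and wide are only illustrative.

```python
import pandas as pd

# Sample data from the question, typed in by hand (names without padding).
df = pd.DataFrame({
    "id": ["ab01", "ab02", "ab03", "ab04"],
    "names": [
        ["F1", "F2", "F3", "F4", "F5"],
        ["F1", "F2", "F3", "F4", "F5", "F6"],
        ["F1", "F2", "F3", "F4", "F5"],
        ["F1", "F2", "F3", "F4", "F5"],
    ],
    "score": [
        ["00123", "000.001", "00127", "00.0123", "111"],
        ["00124", "000.003", "00156", "00.067", "156", "254"],
        ["00234", "000.078", "00188", "00.0144", "188"],
        ["00345", "000.01112", "001567", "00.0186", "555"],
    ],
})

# One row per (id, name, score) triple; both arrays are exploded together,
# which requires the two lists in each row to have the same length.
long_df = df.explode(["names", "score"])

# Spread the names out into columns; ids without an F6 entry get NaN there.
wide = long_df.pivot(index="id", columns="names", values="score")

print(wide)
```

Keeping the scores as strings preserves the leading zeros shown in the question; converting them to numbers would drop that formatting.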
edit:
x = (df.apply(lambda x: dict(zip(x['names'], x['score'])), axis=1))
y = pd.DataFrame(x.values.tolist(), index=x.index).fillna("null").join(df.id)
or
x = (df.apply(lambda x: dict(zip(x['names'], x['score'])), axis=1))
z = pd.DataFrame(x.values.tolist(), index=x.index).fillna("null")
y = pd.concat([df.id , z], axis=1)
y:
   F1     F2         F3      F4       F5   F6    id
0  00123  000.001    00127   00.0123  111  null  ab01
1  00124  000.003    00156   00.067   156  254   ab02
2  00234  000.078    00188   00.0144  188  null  ab03
3  00345  000.01112  001567  00.0186  555  null  ab04
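The per-row dict(zip(names, score)) step does not depend on pandas. As an illustration only, the same reshaping can be sketched in plain Python: each row becomes a dict keyed by name, and any name missing from a row is filled with "null". The helper to_wide and the rows list (a shortened subset of the question's data) are hypothetical names introduced for this example.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, List[str], List[str]]  # (id, names, score)


def to_wide(rows: List[Row], fill: str = "null") -> List[Dict[str, str]]:
    """Turn (id, names, score) rows into one dict per id with a key per name."""
    # Pair each name with its score, row by row.
    mapped = [(row_id, dict(zip(names, score))) for row_id, names, score in rows]
    # Union of all names seen, sorted so the column order is stable.
    columns = sorted({name for _, m in mapped for name in m})
    return [
        {"id": row_id, **{c: m.get(c, fill) for c in columns}}
        for row_id, m in mapped
    ]


# Shortened subset of the question's rows, for illustration.
rows = [
    ("ab01", ["F1", "F2", "F3"], ["00123", "000.001", "00127"]),
    ("ab02", ["F1", "F2", "F3", "F6"], ["00124", "000.003", "00156", "254"]),
]

for record in to_wide(rows):
    print(record)
```

Sorting the names keeps the columns in a predictable order even when the longest row is not the first one.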