I've seen similar questions, but I haven't found exactly what I need, and I'm struggling to figure out whether I can do what I want without using a UDF.
Say I start with this dataframe:
+---+---+---+
| pk| a| b|
+---+---+---+
| 1| 2| 1|
| 2| 4| 2|
+---+---+---+
I want the resulting dataframe to look like
+----------------+---+
| ab| pk|
+----------------+---+
|[A -> 2, B -> 1]| 1|
|[A -> 4, B -> 2]| 2|
+----------------+---+
where A and B are names that correspond to columns a and b. (I guess I can fix the names with an alias, but currently I'm using a UDF that returns a map of {'A': column a value, 'B': column b value}.)
Is there any way to accomplish this using create_map or otherwise without a UDF?
create_map takes its arguments as alternating keys and values: key1, value1, key2, value2, .... For your case:
import pyspark.sql.functions as f

df.select(
    # Literal keys paired with column values: 'A' -> a, 'B' -> b
    f.create_map(f.lit('A'), f.col('a'), f.lit('B'), f.col('b')).alias('ab'),
    f.col('pk')
).show()
+----------------+---+
| ab| pk|
+----------------+---+
|[A -> 2, B -> 1]| 1|
|[A -> 4, B -> 2]| 2|
+----------------+---+