I have a PySpark DataFrame and I want to attach a list to it as a new column. In pandas this is very easy: df['new_column'] = mylist. I tried the following:

df.withColumn("Normalized", sparlist).show(False)

but it raises this error:

AssertionError: col should be Column
Here is my list and the desired output:

mylist = ['fg','af','ab','df','cd']

+---+------+
| id|mylist|
+---+------+
|  0|    fg|
|  1|    af|
|  2|    ab|
|  3|    df|
|  4|    cd|
+---+------+
You can use F.array to create an array column from a Python list:
import pyspark.sql.functions as F

mylist = [0, 1, 2]
# wrap each list element in F.lit and combine them into a single array column
df2 = df.withColumn('list', F.array(*[F.lit(i) for i in mylist]))
df2.show()
+---+---------+
| id| list|
+---+---------+
| 0|[0, 1, 2]|
| 1|[0, 1, 2]|
| 2|[0, 1, 2]|
| 3|[0, 1, 2]|
| 4|[0, 1, 2]|
+---+---------+
For your modified question, you can index the array with the id column, so each row gets the element at its own position:

mylist = ['fg','af','ab','df','cd']
# build the array, then pick the element at position `id` for each row
df2 = df.withColumn('list', F.array(*[F.lit(i) for i in mylist])[F.col('id')])
df2.show()
+---+----+
| id|list|
+---+----+
| 0| fg|
| 1| af|
| 2| ab|
| 3| df|
| 4| cd|
+---+----+