Adding multiple columns in pyspark dataframe using a loop

renjith

I need to add a large number of columns (4000) to a PySpark DataFrame. I am using the withColumn function, but I get an assertion error.

df3 = df2.withColumn("['ftr' + str(i) for i in range(0, 4000)]", [expr('ftr[' + str(x) + ']') for x in range(0, 4000)])

Error

I am not sure what is wrong. Any help is appreciated. Thank you.

BICube

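withColumn accepts one column name and one Column expression per call, so passing a list as the second argument triggers the error. Add the columns one at a time in a loop instead: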

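from pyspark.sql.functions import expr

df3 = df2  # start from the source DataFrame used in the question
for i in range(0, 4000):
  # one name and one Column per call: ftr0 = ftr[0], ftr1 = ftr[1], ...
  df3 = df3.withColumn(f"ftr{i}", expr(f"ftr[{i}]"))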
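
Chaining thousands of withColumn calls can produce a very large query plan, so it may be worth building all the columns in a single projection instead. The following is only a sketch, assuming that ftr is an array column as the question's expr('ftr[...]') suggests. The helper names feature_exprs and add_feature_columns are hypothetical and not part of PySpark; the only DataFrame method used is selectExpr.

```python
def feature_exprs(n, source="ftr"):
    """Build SQL expressions such as 'ftr[0] AS ftr0' for indexes 0..n-1."""
    return [f"{source}[{i}] AS {source}{i}" for i in range(n)]


def add_feature_columns(df, n, source="ftr"):
    """Keep every existing column and add n extracted columns in one projection.

    Assumes df is a PySpark DataFrame with an array column named `source`.
    """
    return df.selectExpr("*", *feature_exprs(n, source))
```

With the names from the question, the call would look like df3 = add_feature_columns(df2, 4000).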

