Is it possible to cast multiple columns of a dataframe in pyspark?

Carlos Eduardo Bilar Rodrigues

I have a multi-column PySpark DataFrame, and I need to convert the string columns to the correct types (float, int, date, and so on).

This is how I'm doing it currently:

df = df.withColumn(col_name, col(col_name).cast('float')) \
    .withColumn(col_id, col(col_id).cast('int')) \
    .withColumn(col_city, col(col_city).cast('string')) \
    .withColumn(col_date, col(col_date).cast('date')) \
    .withColumn(col_code, col(col_code).cast('bigint'))

Is it possible to create a list with the types and apply it to all the columns at once?

Alex Ott

You just need a mapping as a dictionary (or something similar), and then generate the correct select statement. (You could use withColumn, but chaining many withColumn calls can lead to performance problems.) Something like this:

import pyspark.sql.functions as F

mapping = {'col1': 'float', ....}
df = ....  # your input data

# columns not in the mapping are passed through unchanged
rest_cols = [F.col(cl) for cl in df.columns if cl not in mapping]
# cast each mapped column to its target type, keeping the original name
conv_cols = [F.col(cl_name).cast(cl_type).alias(cl_name)
             for cl_name, cl_type in mapping.items()
             if cl_name in df.columns]
conv_df = df.select(*rest_cols, *conv_cols)
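
To make the idea concrete, here is a minimal, self-contained sketch of the same approach. The sample data and the mapping values are assumptions for illustration; the column names are just reused from the question.

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# a one-row, all-string DataFrame (made-up sample data)
df = spark.createDataFrame(
    [("1.5", "42", "Porto Alegre", "2021-01-01", "900719925474")],
    ["col_name", "col_id", "col_city", "col_date", "col_code"],
)

# assumed target types for the columns we want to convert
mapping = {
    "col_name": "float",
    "col_id": "int",
    "col_date": "date",
    "col_code": "bigint",
}

rest_cols = [F.col(cl) for cl in df.columns if cl not in mapping]
conv_cols = [F.col(cl_name).cast(cl_type).alias(cl_name)
             for cl_name, cl_type in mapping.items()
             if cl_name in df.columns]
conv_df = df.select(*rest_cols, *conv_cols)

conv_df.printSchema()
# col_city stays a string; the mapped columns come back as float, int, date, bigint

Note that select(*rest_cols, *conv_cols) also reorders the output (unmapped columns first, then the converted ones). If the original column order matters, build the projection by iterating over df.columns instead and casting only the columns found in the mapping.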
