How to explode multiple columns of a dataframe in pyspark

Visualisation App :

I have a dataframe whose columns contain lists, similar to the following. The lists in different columns are not all the same length.

Name  Age  Subjects                  Grades
[Bob] [16] [Maths,Physics,Chemistry] [A,B,C]

I want to explode the dataframe so that I get the following output:

Name  Age  Subjects   Grades
Bob   16   Maths      A
Bob   16   Physics    B
Bob   16   Chemistry  C

How can I achieve this?

mayank agrawal :

This works:

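import pyspark.sql.functions as F
from pyspark.sql import SparkSession
from pyspark.sql.types import ArrayType, StructType, StructField, StringType

# Create (or reuse) a SparkSession; `sql` in the original snippet was undefined
spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [(['Bob'], [16], ['Maths','Physics','Chemistry'], ['A','B','C'])],
    ['Name','Age','Subjects', 'Grades'])
df.show()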

+-----+----+--------------------+---------+
| Name| Age|            Subjects|   Grades|
+-----+----+--------------------+---------+
|[Bob]|[16]|[Maths, Physics, ...|[A, B, C]|
+-----+----+--------------------+---------+

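Use a UDF with zip. The columns you want to explode have to be merged into a single array column first, because exploding each column separately would produce a cross product instead of aligned pairs.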
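To see what the merge step produces for a single row, here is the same `zip` call the UDF's lambda makes, run in plain Python on the question's example lists. It also shows one consequence worth knowing given that the lists can differ in length: `zip` stops at the shortest input, so unmatched elements are silently dropped.

```python
# Plain-Python sketch of what the UDF's lambda computes for one row.
subjects = ["Maths", "Physics", "Chemistry"]
grades = ["A", "B", "C"]

# Each tuple becomes one struct (subs, grades) in the merged array column.
pairs = list(zip(subjects, grades))
print(pairs)

# zip truncates to the shortest list: "Chemistry" has no grade here and is dropped.
short = list(zip(subjects, ["A", "B"]))
print(short)
```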

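# Zip the two arrays into one array of (subject, grade) structs.
# Python's zip stops at the shorter list, so unmatched elements are dropped.
combine = F.udf(lambda x, y: list(zip(x, y)),
              ArrayType(StructType([StructField("subs", StringType()),
                                    StructField("grades", StringType())])))

# Explode the zipped array (one row per struct), then pull the struct fields back out.
df = df.withColumn("new", combine("Subjects", "Grades"))\
       .withColumn("new", F.explode("new"))\
       .select("Name", "Age", F.col("new.subs").alias("Subjects"), F.col("new.grades").alias("Grades"))
df.show()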


+-----+----+---------+------+
| Name| Age| Subjects|Grades|
+-----+----+---------+------+
|[Bob]|[16]|    Maths|     A|
|[Bob]|[16]|  Physics|     B|
|[Bob]|[16]|Chemistry|     C|
+-----+----+---------+------+

Related

How to select and order multiple columns in a Pyspark Dataframe after a join

Pyspark: explode json in column to multiple columns

How to explode an array into multiple columns in Spark

PySpark explode list into multiple columns based on name

pyspark: Explode struct into columns

How can I sum multiple columns in a spark dataframe in pyspark?

pyspark dataframe filtering on multiple columns

How to pivot a DataFrame in PySpark on multiple columns?

How to drop columns based on multiple filters in a dataframe using PySpark?

Zip and Explode multiple Columns in Spark SQL Dataframe

Pyspark: explode columns to new dataframe

repartitioning by multiple columns for Pyspark dataframe

How to explode an array into multiple columns in Spark Java

How to zip two columns, explode them and finally pivot in Pyspark

Pyspark: How to impute multiple columns in DataFrame in the same action?

How do I use flatmap with multiple columns in Dataframe using Pyspark

pySpark join dataframe on multiple columns

How to explode structs with pyspark explode()

PySpark: How to explode two columns of arrays

Pyspark explode multiple columns with sliding window

PySpark Explode JSON String into Multiple Columns

Explode multiple columns to rows in pyspark

How to add multiple new columns with when condition in pyspark dataframe?

Dataframe explode list columns in multiple rows

How to explode week period into multiple rows in a dataframe

Explode column values into multiple columns in pyspark

Explode 2 columns into multiple columns in pyspark dataframe

pyspark dataframe limiting on multiple columns

How to join pandas dataframe with multiple columns and conditions like pyspark