Explode column values into multiple columns in pyspark

ben

I have the following pyspark dataframe.

+--------------------------------------------------------------------------------+
|substitutes                                                                     |
+--------------------------------------------------------------------------------+
|[[1, 30981733]]                                                                 |
|[[1, 598319049], [2, 38453298], [3, 2007569845]]                                |
|[[1, 10309216]]                                                                 |
|[[1, 730446111], [2, 617024811], [3, 665689309], [4, 883699488], [5, 159896736]]|
|[[1, 10290923], [2, 33282357]]                                                  |
|[[1, 102649381], [2, 10294853], [3, 10294854], [4, 44749181], [5, 35132896]]    |
|[[1, 10307642], [2, 10307636], [3, 15754215], [4, 45612359], [5, 10307635]]     |
|[[1, 43982130], [2, 15556050], [3, 15556051], [4, 11961012], [5, 16777263]]     |
|[[1, 849607426], [2, 185158834], [3, 11028011], [4, 10309801], [5, 11028010]]   |
|[[1, 21905160], [2, 21609422], [3, 21609417], [4, 20554612], [5, 20554601]]     |
+--------------------------------------------------------------------------------+

And I would like to explode the columns into multiple columns

| substitutes_1 | substitutes_2 | substitutes_3 | substitutes_4 | substitutes_5 |
|---------------|---------------|---------------|---------------|---------------|
| 30981733      |               |               |               |               |
| 598319049     | 38453298      | 2007569845    |               |               |
| 10309216      |               |               |               |               |
| 730446111     | 617024811     | 665689309     | 883699488     | 159896736     |
| 10290923      | 33282357      |               |               |               |
| 102649381     | 10294853      | 10294854      | 44749181      | 35132896      |
| 10307642      | 10307636      | 15754215      | 45612359      | 10307635      |
| 43982130      | 15556050      | 15556051      | 11961012      | 16777263      |
| 849607426     | 185158834     | 11028011      | 10309801      | 11028010      |
| 21905160      | 21609422      | 21609417      | 20554612      | 20554601      |

How can i unpivot and explode the array?

Shubham Sharma

Create a temporary id column for uniquely identifying each row. Explode the substitutes then extract the first item as col and second item as val. Then group the dataframe by id, pivot on col and aggregate val with first

(
    df
    .withColumn('id', F.monotonically_increasing_id())
    .selectExpr('id', "explode(substitutes) AS S")
    .selectExpr('id', "'substitutes_' || S[0] AS col", "S[1] as val")
    .groupby('id').pivot('col').agg(F.first('val'))
    .drop('id')
)

Result

+-------------+-------------+-------------+-------------+-------------+
|substitutes_1|substitutes_2|substitutes_3|substitutes_4|substitutes_5|
+-------------+-------------+-------------+-------------+-------------+
|    598319049|     38453298|   2007569845|         null|         null|
|     30981733|         null|         null|         null|         null|
|    102649381|     10294853|     10294854|     44749181|     35132896|
|    849607426|    185158834|     11028011|     10309801|     11028010|
|     10307642|     10307636|     15754215|     45612359|     10307635|
|    730446111|    617024811|    665689309|    883699488|    159896736|
|     10290923|     33282357|         null|         null|         null|
|     43982130|     15556050|     15556051|     11961012|     16777263|
|     10309216|         null|         null|         null|         null|
|     21905160|     21609422|     21609417|     20554612|     20554601|
+-------------+-------------+-------------+-------------+-------------+

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

Pyspark: explode json in column to multiple columns

Explode multiple columns to rows in pyspark

Explode 2 columns into multiple columns in pyspark dataframe

Explode column of lists into multiple columns

PySpark - Explode columns into rows based on the type of the column

Pyspark explode multiple columns with sliding window

How to explode multiple columns of a dataframe in pyspark

PySpark explode list into multiple columns based on name

PySpark Explode JSON String into Multiple Columns

Pyspark concat column to an array and explode values

explode an spark array column to multiple columns sparksql

Pyspark dataframe with XML column and multiple values inside: Extract columns out of it

Pyspark: Split a single column with multiple values into separate columns

pyspark create multiple columns from values of existing column

Explode a column with multiple values separated by comma

pyspark: Explode struct into columns

Explode Maptype column in pyspark

Parse columns and explode multiple columns in to one long column

PySpark: Create new rows (explode) based on a number in a column and multiple conditions

How to explode column with multiple records into multiple Columns in Spark

Explode column into columns

Explode multiple columns

Explode on multiple columns in Hive

Explode multiple columns in Pandas

Pyspark: explode columns to new dataframe

Rust Polars: Is it possible to explode a list column into multiple columns?

How to get a list column with values of multiple columns given in another column in Pyspark Dataframe?

Explode column with array of arrays - PySpark

Explode a column with a List of Jsons with Pyspark