PySpark DataFrame Iterate Array Columns

Eric Nguyen

In PySpark, I have a dataframe with several array columns that I'm trying to parse. The last two rows of the dataframe contain multiple values per array, and I would like to split those values into separate rows.

+-------------+---------------+-------------+--------------------+--------------+-------------+----------------------+--------------+
| WB-API-CNTY | WB-API-UNIQUE | WB-OIL-CODE | WB-OIL-LSE-NBR     | WB-OIL-DIST  | WB-GAS-CODE | WB-GAS-RRC-ID        | WB-GAS-DIS   |
+-------------+---------------+-------------+--------------------+--------------+-------------+----------------------+--------------+
| 449         | 80212         | []          | []                 | []           | []          | []                   | []           |
+-------------+---------------+-------------+--------------------+--------------+-------------+----------------------+--------------+
| 449         | 80214         | ["O"]       | ["05361"]          | ["06"]       | ["O"]       | ["060536"]           | ["00"]       |
+-------------+---------------+-------------+--------------------+--------------+-------------+----------------------+--------------+
| 449         | 80222         | ["O", "O"]  | ["01718", "05492"] | ["06", "06"] | ["O", "O"]  | ["060171", "060549"] | ["00", "00"] |
+-------------+---------------+-------------+--------------------+--------------+-------------+----------------------+--------------+
| 451         | 00005         | ["G", "O"]  | ["5568", "04351"]  | ["10", "09"] | ["G", "O"]  | ["105568", "090435"] | ["09", "00"] |
+-------------+---------------+-------------+--------------------+--------------+-------------+----------------------+--------------+

Desired result:

+-------------+---------------+-------------+----------------+-------------+-------------+---------------+------------+
| WB-API-CNTY | WB-API-UNIQUE | WB-OIL-CODE | WB-OIL-LSE-NBR | WB-OIL-DIST | WB-GAS-CODE | WB-GAS-RRC-ID | WB-GAS-DIS |
+-------------+---------------+-------------+----------------+-------------+-------------+---------------+------------+
| 449         | 80212         |             |                |             |             |               |            |
+-------------+---------------+-------------+----------------+-------------+-------------+---------------+------------+
| 449         | 80214         | O           | 05361          | 06          | O           | 060536        | 00         |
+-------------+---------------+-------------+----------------+-------------+-------------+---------------+------------+
| 449         | 80222         | O           | 01718          | 06          | O           | 060171        | 00         |
+-------------+---------------+-------------+----------------+-------------+-------------+---------------+------------+
| 449         | 80222         | O           | 05492          | 06          | O           | 060549        | 00         |
+-------------+---------------+-------------+----------------+-------------+-------------+---------------+------------+
| 451         | 00005         | G           | 5568           | 10          | G           | 105568        | 09         |
+-------------+---------------+-------------+----------------+-------------+-------------+---------------+------------+
| 451         | 00005         | O           | 04351          | 09          | O           | 090435        | 00         |
+-------------+---------------+-------------+----------------+-------------+-------------+---------------+------------+
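ZygD
from pyspark.sql import functions as F

array_cols = ['WB-OIL-CODE', 'WB-OIL-LSE-NBR', 'WB-OIL-DIST', 'WB-GAS-CODE', 'WB-GAS-RRC-ID', 'WB-GAS-DIS']
other_cols = [c for c in df.columns if c not in array_cols]
# Backtick-quote the names, since they contain hyphens
quoted = ', '.join(f'`{c}`' for c in array_cols)
df = df.select(
    *other_cols,
    # arrays_zip pairs the arrays element by element; inline_outer emits one row per pair.
    # Plain inline would drop rows whose arrays are empty (e.g. 80212).
    F.expr(f"inline_outer(arrays_zip({quoted}))")
)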
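
To see what the arrays_zip plus inline combination does to a single row, here is a plain-Python sketch that needs no Spark session. The helper name zip_and_inline and its outer flag are hypothetical, made up for illustration only. With outer=True it mimics Spark's inline_outer, which keeps a row whose arrays are empty and fills it with nulls; with outer=False it mimics plain inline, which drops such a row.

```python
from itertools import zip_longest


def zip_and_inline(row, array_cols, outer=True):
    """Illustrative stand-in for inline(arrays_zip(...)) applied to one row.

    `row` is a dict of column name -> value; the columns named in
    `array_cols` hold lists. Returns a list of output rows (dicts).
    """
    # Columns that are carried through unchanged
    other = {k: v for k, v in row.items() if k not in array_cols}

    # arrays_zip: pair elements by position, padding shorter arrays with None
    zipped = list(zip_longest(*(row[c] for c in array_cols)))

    if not zipped:
        # All arrays empty: the "outer" variant keeps the row with nulls,
        # the plain variant produces no output rows at all
        return [{**other, **{c: None for c in array_cols}}] if outer else []

    # inline: one output row per zipped tuple
    return [{**other, **dict(zip(array_cols, values))} for values in zipped]


# Usage with a subset of the columns from the question
cols = ["WB-OIL-CODE", "WB-OIL-LSE-NBR"]
multi = {"WB-API-UNIQUE": "80222", "WB-OIL-CODE": ["O", "O"], "WB-OIL-LSE-NBR": ["01718", "05492"]}
empty = {"WB-API-UNIQUE": "80212", "WB-OIL-CODE": [], "WB-OIL-LSE-NBR": []}

for out_row in zip_and_inline(multi, cols) + zip_and_inline(empty, cols):
    print(out_row)
```

This only models the per-row logic; in Spark the same work is done by the SQL expression inside F.expr, distributed across the cluster.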
