How to add multiple new columns with when condition in pyspark dataframe?

user
I need to add two new columns to my existing pyspark dataframe.
Below is my sample data:

Section   Grade     Promotion_grade Section_team
Admin       C       
Account     B       
IT          B   

Conditions:

If Section = Admin then Promotion_grade = B
If Section = Account then Promotion_grade = A
If Section = IT then
             If Grade = C then Promotion_grade = B and Section_team = team1
             If Grade = D then Promotion_grade = C and Section_team = team2
             If Grade = A then Promotion_grade = A+ and Section_team = team3

I can add one column for the first two conditions, but I don't know how to handle the remaining ones.

from pyspark.sql import functions as F

def addCols(data):
    # Covers only the Admin and Account rules so far
    data = data.withColumn('Promotion_grade',
                           F.when(data.Section == 'Admin', 'B')
                            .when(data.Section == 'Account', 'A')
                            .otherwise('Not applicable'))
    return data

Could someone please help me with this? Maybe my approach is wrong. Thank you.

snithish

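You can nest when expressions: the value returned by one when branch can itself be another when chain. Define the IT-specific chains first, then use them as the result of the Section == "IT" branch.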

Working Example

from pyspark.sql import functions as F

data = [("Admin", "C", ), 
        ("Account", "B", ), 
        ("IT", "B", ),
        ("IT", "C", ),
        ("IT", "D", ),
        ("IT", "A", ),]

# Assumes an active SparkSession named spark (as in the pyspark shell)
df = spark.createDataFrame(data, ("Section", "Grade", ))

# Define Promotion Grade conditions for IT Section
it_promotion_grade = (F.when(F.col("Grade") == "C", "B")
                       .when(F.col("Grade") == "D", "C")
                       .when(F.col("Grade") == "A", "A+")
                       .otherwise("Not applicable"))

# Define Section Team conditions for IT Section
it_section_team = (F.when(F.col("Grade") == "C", "team1")
                    .when(F.col("Grade") == "D", "team2")
                    .when(F.col("Grade") == "A", "team3")
                    .otherwise("Not applicable"))

(df.withColumn("Promotion_grade", F.when(F.col("Section") == "Admin", "B")
                                  .when(F.col("Section") == "Account", "A")
                                  .when(F.col("Section") == "IT", it_promotion_grade)
                                  .otherwise("Not applicable"))
    .withColumn("Section_team", F.when(F.col("Section") == "IT", it_section_team)
                     .otherwise("Not applicable"))
    .show())

Output

+-------+-----+---------------+--------------+
|Section|Grade|Promotion_grade|  Section_team|
+-------+-----+---------------+--------------+
|  Admin|    C|              B|Not applicable|
|Account|    B|              A|Not applicable|
|     IT|    B| Not applicable|Not applicable|
|     IT|    C|              B|         team1|
|     IT|    D|              C|         team2|
|     IT|    A|             A+|         team3|
+-------+-----+---------------+--------------+
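
An alternative to nesting, for cases where the rule list grows, is to flatten the rules into a single SQL CASE expression and hand it to F.expr. The sketch below only builds the expression strings, so it runs without Spark. The helper name case_when is hypothetical (not part of PySpark), and it assumes the values contain no single quotes.

```python
# Hypothetical helper (not a PySpark API): build a Spark SQL CASE expression
# string from (condition, value) pairs. Assumes values contain no single quotes.
def case_when(branches, default="Not applicable"):
    whens = " ".join(f"WHEN {cond} THEN '{value}'" for cond, value in branches)
    return f"CASE {whens} ELSE '{default}' END"

# Flattened version of the nested rules: the IT branches carry both tests.
promotion_sql = case_when([
    ("Section = 'Admin'", "B"),
    ("Section = 'Account'", "A"),
    ("Section = 'IT' AND Grade = 'C'", "B"),
    ("Section = 'IT' AND Grade = 'D'", "C"),
    ("Section = 'IT' AND Grade = 'A'", "A+"),
])

team_sql = case_when([
    ("Section = 'IT' AND Grade = 'C'", "team1"),
    ("Section = 'IT' AND Grade = 'D'", "team2"),
    ("Section = 'IT' AND Grade = 'A'", "team3"),
])

print(promotion_sql)
print(team_sql)
```

With the df from the example above, the strings would be applied as df.withColumn("Promotion_grade", F.expr(promotion_sql)).withColumn("Section_team", F.expr(team_sql)). Rows that match no branch, such as IT with Grade B, fall through to the ELSE value, the same as the otherwise calls in the nested version.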
