How do I create a new column for my dataframe whose values are maps made up of values from different columns?

ak2

I've seen similar questions but haven't found exactly what I need, and I've been struggling to figure out whether I can do what I want without a UDF.

Say I start with this dataframe:

+---+---+---+
| pk|  a|  b|
+---+---+---+
|  1|  2|  1|
|  2|  4|  2|
+---+---+---+ 

I want the resulting dataframe to look like

+----------------+---+
|              ab| pk|
+----------------+---+
|[A -> 2, B -> 1]|  1|
|[A -> 4, B -> 2]|  2|
+----------------+---+

where A and B are names that correspond to a and b. (I guess I can fix this with an alias, but right now I'm using a UDF that returns a map of {'A': column a value, 'B': column b value}.)

Is there any way to accomplish this using create_map or otherwise without a UDF?

Psidom

create_map takes its arguments as alternating key, value, key, value ... pairs. For your case:

import pyspark.sql.functions as f

# Keys are literals (f.lit); values come from the existing columns (f.col).
df.select(
  f.create_map(f.lit('A'), f.col('a'), f.lit('B'), f.col('b')).alias('ab'),
  f.col('pk')
).show()
+----------------+---+
|              ab| pk|
+----------------+---+
|[A -> 2, B -> 1]|  1|
|[A -> 4, B -> 2]|  2|
+----------------+---+
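
If there are more than a couple of columns, writing out the key, value, key, value ... list by hand gets tedious. A minimal sketch of one way to generate it from a dict follows; the helper names flat_map_args and map_column are made up for illustration and are not part of PySpark, and df is assumed to be the dataframe from the question.

```python
from itertools import chain


def flat_map_args(key_to_column):
    """Flatten {'A': 'a', 'B': 'b'} into ['A', 'a', 'B', 'b'] (key, column, key, column ...)."""
    return list(chain.from_iterable(key_to_column.items()))


def map_column(key_to_column, alias):
    """Build a create_map column from a dict of map key -> source column name."""
    # Imported here so the flattening helper above can be tried without Spark installed.
    import pyspark.sql.functions as f

    flat = flat_map_args(key_to_column)
    # Even positions are literal map keys, odd positions are column references.
    args = [f.lit(x) if i % 2 == 0 else f.col(x) for i, x in enumerate(flat)]
    return f.create_map(*args).alias(alias)


# Usage, assuming `df` is the dataframe from the question:
# df.select(map_column({'A': 'a', 'B': 'b'}, 'ab'), 'pk').show()
```

The dict keeps insertion order, so the keys reach create_map in the order they are written. As with the hand-written call, all the value columns need a common type, since every value in a map column shares one type.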
