Add a new column to a PySpark DataFrame based on matching values from a list

Ank Published at Dev


ANK

I am looking for help with PySpark: I want to add a new column based on values that match entries in a list.

I have a list of values stored in the variable unique_ids:

[Row(card_id=1), Row(card_id=2)]

For each value in the list, if it matches the column value, I want to count the number of rows that match and then add a new column holding that count.

This is how I am getting the list:

unique_ids = data.select('card_id').distinct().collect()

Example DataFrame:

card_id
1
1
2
1
2
1

Required DataFrame:

card_id	Count
1	4
1	4
2	2
1	4
2	2
1	4

Thanks

AdibP

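Use count as a window function partitioned by card_id. This attaches the per-id row count to every row, so there is no need to collect the distinct IDs first.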

import pyspark.sql.functions as F
from pyspark.sql.window import Window

# Count the rows in each card_id partition and attach that count to every row
result = data.withColumn('count', F.count('card_id').over(Window.partitionBy('card_id')))
result.show()

+-------+-----+
|card_id|count|
+-------+-----+
|      1|    4|
|      1|    4|
|      1|    4|
|      1|    4|
|      2|    2|
|      2|    2|
+-------+-----+


Edited at 2021-07-18
