PySpark: create a new data frame by updating a few columns of an old data frame

Spark user

I want to create a new data frame in PySpark by updating the data in a few columns of an existing data frame.

I have the data frame below, stored in Parquet format, with the columns uid, name, start_dt, addr, and extid:

df = spark.read.parquet("s3a://testdata?src=ggl")
df1 = df.select("uid")

I need to create a new data frame, written as Parquet, in which uid and extid are hashed and the remaining columns are included unchanged. Please suggest how to do this. I am new :(

Sample input:

uid, name, start_dt, addr, extid
1124569-2, abc, 12/02/2018, 343 Beach Dr Newyork NY, 889

Sample output:

uid, name, start_dt, addr, extid
a8ghshd345698cd, abc, 12/02/2018, 343 Beach Dr Newyork NY, shhj676ssdhghje

Here uid and extid are SHA-256 hashed.

Thanks in advance.

Manoj Singh

You can create a UDF that calls hashlib.sha256() on each column value, then use withColumn to replace the column with its hashed version.

import hashlib

import pyspark.sql.functions as F
import pyspark.sql.types as T

df = spark.read.parquet("s3a://testdata?src=ggl")

# UDF that returns the SHA-256 hex digest of a value's string form
sha256_udf = F.udf(lambda x: hashlib.sha256(str(x).encode('utf-8')).hexdigest(), T.StringType())
# Overwrite uid and extid in place; every other column is carried over unchanged
df1 = df.withColumn('uid', sha256_udf('uid')).withColumn('extid', sha256_udf('extid'))
df1.show()
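
To see what the UDF does to a single value, here is a minimal plain-Python sketch of the same hashing step, runnable without Spark. The helper name sha256_hex is hypothetical, and returning None for a missing value is an assumption on my part: the lambda above would instead hash the text "None" for a null.

```python
import hashlib
from typing import Any, Optional


def sha256_hex(value: Any) -> Optional[str]:
    """Return the SHA-256 hex digest of str(value), or None for a missing value."""
    if value is None:
        # Assumption: nulls should stay null rather than be hashed as the text "None".
        return None
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


# Values taken from the sample input row in the question.
uid_hash = sha256_hex("1124569-2")
extid_hash = sha256_hex(889)

print(len(uid_hash))  # length of the hex digest string
print(extid_hash == sha256_hex("889"))  # compares hashing the int 889 with the string "889"
```

If that null handling suits your data, the named function could be wrapped as F.udf(sha256_hex, T.StringType()) in place of the lambda. Spark also ships a built-in sha2 function in pyspark.sql.functions, which may be worth checking against your Spark version and column types since it avoids a Python UDF. To persist the result as Parquet, df1.write.parquet(...) with an output path of your choosing should do it.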


Edited at 2020-12-2
