How to concatenate null columns in spark dataframe in java?

Prateek Gautam

I am working with spark in java and I want to create a column which is a concatenation of all other column values separated by comma. I have tried few ways to do this but couldn't find a solution.

For example-

col1 | col2 | col3 | result
 1   | john | 2.3  | 1,john,2.3

The problem that I'm facing is if a column value is null then simply nothing should be there but so far I have not achieved this.

For example-

col1 | col2 | col3 | result
 1   | null | 2.3  | 1,,2.3  -> this is what I'm trying to achieve

What I have tried-

1.) I tried concat function of spark sql but if a column value is null then entire value of concat function will be null, which is not something i want.

2.) concat_ws function simply ignore the null values hence can't be used. Hence in my 2nd example the result would be something like 1,2.3

3.) using coalesce require matching data types so if I do something like coalesce(col_name,'') under concat function and col_name is number data type then it throws data types mismatch error.

4.) using case when statement also require same data type in THEN & ELSE conditions. So if I say case when column_1 is null THEN '' ELSE column1 END will throw error if column1 is number because '' is empty string and column_1 is number.

one way it can be achieved is creating map_function. But is there a way to do it in spark sql way?

philantrovert

You can cast all your fields to String so that you can run NVL on them and set them to empty string '' if they're null.

To generate an expression for all the columns in your dataframe automatically, you can use map function:

df.show()
//+----+----+----+
//|col1|col2|col3|
//+----+----+----+
//|   1|John| 2.3|
//|   2|null| 2.3|
//+----+----+----+


List<String> columns = df.columns() ; 
// List("col1", "col2", "col3")

String nvlExpr = columns.stream().map(i -> "nvl(cast ("+i+" as string),'')").collect(joining(", ", "concat_ws(','," , ")"));

//concat_ws(',',nvl(cast (col1 as string), ''), nvl(cast (col2 as string), ''),nvl(cast (col3 as string), ''))

df.withColumn("result", expr(nvlExpr)).show()

//+----+----+----+----------+
//|col1|col2|col3|    result|
//+----+----+----+----------+
//|   1|John| 2.3|1,John,2.3|
//|   2|null| 2.3|    2,,2.3|
//+----+----+----+----------+

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

Concatenate two columns of spark dataframe with null values

How to concatenate two columns of spark dataframe with null values but get one value

How to concatenate spark dataframe columns using Spark sql in databricks

Concatenate columns to list of columns in Apache Spark DataFrame

Concatenate distinct values from several columns to one column in Java spark dataframe

How to concatenate transformations on a spark scala dataframe?

How do you concatenate multiple columns in a DataFrame into a another column when some values are null?

How to concatenate characters of certain columns of a dataframe?

How to concatenate pandas dataframe columns into iterable lists?

How to concatenate columns of dataframes in a dictionary to a new dataframe

How to concatenate columns in previous row in dataframe?

How to concatenate columns in a Dataframe based on the date?

How to Join Multiple Columns in Spark SQL using Java for filtering in DataFrame

how to concat all columns in a spark dataframe, using java?

How to construct a column based on other columns using dataframe in Spark Java?

How to work with Java Apache Spark MLlib when DataFrame has columns?

subtract two columns with null in spark dataframe

Spark dataframe not adding columns with null values

Spark DataFrame Get Null Count For All Columns

Spark: How to group the dataframe in columns

Concatenate multiple columns of dataframe with a seperating character for Non-null values

Spark Dataframe concatenate strings

How to replace null values with a specific value in Dataframe using spark in Java?

Concatenate columns of a dataframe with a separator

Dataframe concatenate columns

Pyspark Dataframe - How to concatenate columns based on array of columns as input

how can concatenate two string columns in one column in spark python

How to find the list of columns which are all having null or NA values in spark scala Dataframe?

Dropping multiple columns of Spark DataFrame in Java

TOP Ranking

HotTag

Archive