I have a Dataset that contains channel information. I want to aggregate, for example, all channels starting with X_: if any of their status values is "not okay", the value in the new column should also be "not okay", otherwise "okay".
+-----------------+-----------------+-----------------+-----------------+----------------+
|X_ChannelA_status|Y_ChannelB_status|X_ChannelC_status|X_ChannelD_status|X_channel_status|
+-----------------+-----------------+-----------------+-----------------+----------------+
| not okay| okay| okay| not okay| true|
| not okay| not okay| not okay| not okay| true|
+-----------------+-----------------+-----------------+-----------------+----------------+
I already achieved something like this by mapping "okay" and "not okay" to zeros and ones, where "not okay" = 1 and "okay" = 0. Then I summed all the columns into a new one, and if the value in the new column was > 0, at least one of the columns had to contain a "not okay".
val df_grouped = df_filtered.select(list_groupX.map(col).reduce((c1, c2) => c1 + c2) as "sum")
I would love to get rid of the string-to-int mapping, since I think it slows down the calculation.
You can fulfil your requirement with the built-in array_contains and array functions, together with withColumn. Before that, you need to find the column names starting with X so you can check the condition on them:
val xStartingCols = df.columns.filter(_.startsWith("X"))
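Note that startsWith("X") matches any name beginning with X, which would include a result column such as X_channel_status if it already exists in the DataFrame. A minimal sketch of a narrower filter follows, assuming the column names from the question's table; the columns array stands in for df.columns, and targetCol and xStatusCols are illustrative names that do not appear in the original post.

```scala
// Stand-in for df.columns, using the column names from the question's table.
val columns: Array[String] = Array(
  "X_ChannelA_status", "Y_ChannelB_status", "X_ChannelC_status",
  "X_ChannelD_status", "X_channel_status"
)

// Hypothetical name of the aggregated column that should not feed into itself.
val targetCol = "X_channel_status"

// Keep only names with the "X_" prefix, excluding the target column.
val xStatusCols: Array[String] =
  columns.filter(c => c.startsWith("X_") && c != targetCol)
```

With the names above, this keeps the three X_Channel*_status columns and leaves out both Y_ChannelB_status and X_channel_status.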
Then use those column names to check the condition with when and otherwise:
import org.apache.spark.sql.functions._
// array_contains already yields a boolean column, so no comparison with lit(true) is needed
df.withColumn("new_col", when(array_contains(array(xStartingCols.map(col): _*), "not okay"), "not okay").otherwise("okay"))
You should then get your desired output DataFrame:
+-----------------+-----------------+-----------------+-----------------+----------------+--------+
|X_ChannelA_status|Y_ChannelB_status|X_ChannelC_status|X_ChannelD_status|X_channel_status|new_col |
+-----------------+-----------------+-----------------+-----------------+----------------+--------+
|okay |okay |okay |okay |true |okay |
|not okay |not okay |not okay |not okay |true |not okay|
+-----------------+-----------------+-----------------+-----------------+----------------+--------+
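To try the approach end to end, here is a self-contained sketch, assuming a local SparkSession and a small hand-made DataFrame. The object name ChannelStatusSketch, the helper withGroupStatus, and the sample rows are illustrative assumptions and not part of the original post.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, array_contains, col, when}

object ChannelStatusSketch {
  // Hypothetical helper: adds (or overwrites) `target`, set to "not okay" when any
  // column whose name starts with `prefix` holds "not okay", and to "okay" otherwise.
  def withGroupStatus(df: DataFrame, prefix: String, target: String): DataFrame = {
    // Exclude the target itself in case it already exists and shares the prefix.
    val groupCols = df.columns.filter(c => c.startsWith(prefix) && c != target)
    require(groupCols.nonEmpty, s"no columns start with '$prefix'")

    // Pack the status columns into one array column and test it for "not okay".
    val anyNotOkay = array_contains(array(groupCols.map(col): _*), "not okay")
    df.withColumn(target, when(anyNotOkay, "not okay").otherwise("okay"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("channel-status-sketch")
      .getOrCreate()
    import spark.implicits._

    // Sample data (made up): the second row is "not okay" only in the Y_ column.
    val df = Seq(
      ("not okay", "okay", "okay", "not okay"),
      ("okay", "not okay", "okay", "okay")
    ).toDF("X_ChannelA_status", "Y_ChannelB_status", "X_ChannelC_status", "X_ChannelD_status")

    withGroupStatus(df, "X_", "X_channel_status").show(false)
    spark.stop()
  }
}
```

The first sample row should come out as "not okay" and the second as "okay", since its only "not okay" sits in the Y_ column. No string-to-int mapping is involved; the comparison happens directly on the string values.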