I have a PySpark DataFrame that looks like the one below:

df
num11 num21
10    10
20    30
5     25
I am filtering the above DataFrame on all of its columns, selecting rows where every value is greater than or equal to 10 (the number of columns can be more than two):
from pyspark.sql.functions import col
col_list = df.schema.names
df_fltered = df.where(col(c) >= 10 for c in col_list)
The desired output is:
num11 num21
10    10
20    30
How can we achieve filtering on multiple columns by iterating over the column list as above? (All efforts are appreciated.)
The error I receive is: condition should be string or Column.
A generator expression is not a valid condition for where, which is why you get that error. You can use functools.reduce to combine the per-column conditions into a single Column expression: to simulate an "all" condition, fold the conditions together with the & operator, for instance reduce(lambda x, y: x & y, ...):
import pyspark.sql.functions as F
from functools import reduce
df.where(reduce(lambda x, y: x & y, (F.col(x) >= 10 for x in df.columns))).show()
+-----+-----+
|num11|num21|
+-----+-----+
| 10| 10|
| 20| 30|
+-----+-----+
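To see what the reduce call is doing without spinning up Spark, here is a plain-Python sketch of the same fold: each row's per-column conditions are combined pairwise with &, and a row is kept only when every condition holds. (The data and column threshold mirror the example above; in PySpark the identical fold combines Column objects instead of booleans.)

```python
from functools import reduce

# Rows of the example DataFrame: (num11, num21)
rows = [(10, 10), (20, 30), (5, 25)]

# Emulate df.where(reduce(lambda x, y: x & y, (F.col(c) >= 10 for c in df.columns))):
# for each row, fold "value >= 10" conditions together with `&`.
kept = [row for row in rows
        if reduce(lambda x, y: x & y, (v >= 10 for v in row))]

print(kept)  # [(10, 10), (20, 30)]
```

The same pattern generalizes to any number of columns, since reduce simply chains one more & per condition.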