Dynamic comparison of values of n multiple Pandas columns

juanman

Let's say a user can supply the columns of a DataFrame and the values to compare them against, so we can have:

column_list = ['col1', 'col2', 'col3']
value_list = [val1, val2, val3]

To select the rows where col1 >= val1 AND col2 >= val2 AND col3 >= val3, we would write:

selection = (df['col1'] >= val1) & (df['col2'] >= val2) & (df['col3'] >= val3)

or it can be in the form:

selection = df.loc[(df['col1'] >= val1) & (df['col2'] >= val2) & (df['col3'] >= val3)]

The number of columns is not known in advance, so we can have n columns. We can try this approach:

if n == 1:
    selection = (df['col1'] >= val1)
elif n == 2:
    selection = (df['col1'] >= val1) & (df['col2'] >= val2)
elif n == 3:
    selection = (df['col1'] >= val1) & (df['col2'] >= val2) & (df['col3'] >= val3)

But this is neither scalable nor efficient. I tried generating strings of the form "(df['col<>'] > val<>)" with a for loop over the input lists, but that didn't work with Pandas.

What would be the most Pythonic approach for this, one that avoids enumerating every case with if and else statements?

Thank you in advance!

mozway

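To perform a comparison with the same operator for all columns, create a Series with the values as data and the column names as index, then use it to perform an aligned comparison with the DataFrame. The example below uses gt (strictly greater than); ge gives the >= from the question.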

df[df.gt(pd.Series(value_list, index=column_list)).all(1)]
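A self-contained sketch of the same idea, rebuilding the example frame used in this answer so it can be run as-is. The names thresholds, strict and inclusive are only illustrative. Selecting df[column_list] first is an added precaution, not part of the one-liner above: it assumes df may carry extra columns that should not take part in the comparison.

```python
import numpy as np
import pandas as pd

# Sample frame matching the example input shown in this answer.
column_list = ["col1", "col2", "col3"]
value_list = [3, 7, 11]
df = pd.DataFrame(np.arange(15).reshape(5, 3), columns=column_list)

# Thresholds indexed by column name, so each value lines up with its column.
thresholds = pd.Series(value_list, index=column_list)

# Restrict to column_list so unrelated columns cannot affect the row test.
strict = df[df[column_list].gt(thresholds).all(axis=1)]      # col > val
inclusive = df[df[column_list].ge(thresholds).all(axis=1)]   # col >= val

print(strict)
print(inclusive)
```

With this data, strict keeps only row 4, while inclusive also keeps row 3, because 11 >= 11 holds but 11 > 11 does not.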

Example input:

>>> value_list
[3, 7, 11]
>>> df
   col1  col2  col3
0     0     1     2
1     3     4     5
2     6     7     8
3     9    10    11
4    12    13    14

Output:

   col1  col2  col3
4    12    13    14

Intermediates:

>>> pd.Series(value_list, index=column_list)
col1     3
col2     7
col3    11

>>> df.gt(pd.Series(value_list, index=column_list))
    col1   col2   col3
0  False  False  False
1  False  False  False
2   True  False  False
3   True   True  False
4   True   True   True

>>> df.gt(pd.Series(value_list, index=column_list)).all(1)
0    False
1    False
2    False
3    False
4     True
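If the operator has to differ per column, the aligned comparison above no longer applies directly. One possible sketch, not part of the original answer, builds one boolean mask per column and combines them with functools.reduce. The op_list input is hypothetical (the question only mentions columns and values), and the operators chosen here are arbitrary.

```python
import operator
from functools import reduce

import numpy as np
import pandas as pd

column_list = ["col1", "col2", "col3"]
value_list = [3, 7, 11]
# Hypothetical third user input: one comparison function per column.
op_list = [operator.ge, operator.lt, operator.ne]

df = pd.DataFrame(np.arange(15).reshape(5, 3), columns=column_list)

# One boolean Series per (column, operator, value) triple.
masks = [op(df[col], val)
         for col, op, val in zip(column_list, op_list, value_list)]

# AND all masks together; the all-True start value means an empty
# column_list selects every row instead of raising an error.
all_true = pd.Series(True, index=df.index)
selection = reduce(operator.and_, masks, all_true)

result = df[selection]
print(result)
```

Here only row 1 satisfies col1 >= 3, col2 < 7 and col3 != 11. Using operator.ge for every entry reproduces the chained >= expression from the question for any number of columns.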
