df.where necessary condition and a secondary condition

quinncasey22

The wording of the title may be confusing, but I will explain in the code. Say I have a dataframe df:

In [1]: import pandas as pd  
        df = pd.DataFrame([[20, 20], [20, 0], [0, 20], [0, 0]], columns=['a', 'b']) 
        df 
Out[1]: 
    a   b
0  20  20
1  20   0
2   0  20
3   0   0

Now I want to create a new dataframe "df_new" based on 2 conditions, for example:

  • If 'a' is greater than 10, then check 'b'. If 'b' is greater than 5, fill values with NaN or cut out data (doesn't matter). If 'b' is less than 5, return the data.

  • If 'a' is less than 10, return the data regardless of the value of 'b'.

Here's my attempt with df.where, which does not return what I would like.

In [2]: df_new = df.where((df['a'] < 10) & (df['b'] < 5))  
        df_new                                                                  
Out[2]: 
     a    b
0  NaN  NaN
1  NaN  NaN
2  NaN  NaN
3  0.0  0.0

This is what I would like df_new to look like:

Out[3]: 
      a     b
0   NaN   NaN
1  20.0   0.0
2   0.0  20.0
3   0.0   0.0

I know df.where is doing exactly what I told it to do, but I am not sure how to check the 'b' value depending on the 'a' value with df.where -- I am trying to avoid a loop since my actual dataframe is quite large.

Psidom

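Combine the two checks with OR instead of AND. A row should be kept when 'a' is less than 10 or 'b' is less than 5, so just use the condition (df.a < 10) | (df.b < 5). Boolean indexing with it drops the unwanted rows: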

df[(df.a < 10) | (df.b < 5)]

    a   b
1  20   0
2   0  20
3   0   0
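
If the NaN-filled layout from the question is preferred over dropping rows, the same mask can be passed to df.where. The following is a minimal sketch, not part of the original answer. The variable name `keep` is only illustrative. The mask is the negation of "a > 10 and b > 5"; the question does not say what should happen when a value is exactly 10 or 5, so the strict comparisons below are an assumption carried over from the answer.

```python
import pandas as pd

df = pd.DataFrame([[20, 20], [20, 0], [0, 20], [0, 0]], columns=['a', 'b'])

# Rows to keep: 'a' is small, OR 'a' is large but 'b' is small.
# (Assumption: strict comparisons, as in the answer above.)
keep = (df['a'] < 10) | (df['b'] < 5)

# df.where keeps rows where the mask is True and fills the rest with NaN.
df_new = df.where(keep)

# Dropping the NaN rows afterwards gives the same rows as df[keep].
df_filtered = df_new.dropna()
```

Because NaN is a float, the columns of df_new become float64, which matches the 20.0 / 0.0 values in the desired output shown in the question.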
