The wording of the title may be confusing, but I will explain in the code. Say I have a dataframe df:
In [1]: import pandas as pd
df = pd.DataFrame([[20, 20], [20, 0], [0, 20], [0, 0]], columns=['a', 'b'])
df
Out[1]:
a b
0 20 20
1 20 0
2 0 20
3 0 0
Now I want to create a new dataframe "df_new" based on 2 conditions, for example:
If 'a' is greater than 10, then check 'b'. If 'b' is greater than 5, fill the row with NaN or drop it (either is fine). If 'b' is less than 5, keep the row.
If 'a' is less than 10, keep the row regardless of the value of 'b'.
Here's my attempt with df.where -- it does not return what I want.
In [2]: df_new = df.where((df['a'] < 10) & (df['b'] < 5))
df_new
Out[2]:
a b
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 0.0 0.0
This is what I would like df_new to look like:
Out[3]:
a b
0 NaN NaN
1 20.0 0.0
2 0.0 20.0
3 0.0 0.0
I know df.where is doing exactly what I told it to do, but I am not sure how to make the check on 'b' depend on the value of 'a' with df.where. I am trying to avoid a loop since my actual dataframe is quite large.
Just use the condition (df.a < 10) | (df.b < 5):
df[(df.a < 10) | (df.b < 5)]
a b
1 20 0
2 0 20
3 0 0
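Boolean indexing as above drops the rejected row. If you would rather keep the original shape and get NaN in that row, as in the desired Out[3], the same mask can be passed to df.where. The following is a minimal sketch using the example frame from the question; the variable name keep is just an illustrative choice. The attempt with & only kept rows where both columns were small, while the rule described in the question keeps a row when either 'a' is small or 'b' is small, hence the |.

```python
import pandas as pd

df = pd.DataFrame([[20, 20], [20, 0], [0, 20], [0, 0]], columns=['a', 'b'])

# Keep a row when 'a' is small, or when 'a' is large but 'b' is small.
# That is the same as: 'a' < 10 OR 'b' < 5.
keep = (df['a'] < 10) | (df['b'] < 5)

# where() keeps rows where the mask is True and fills the others with NaN,
# so the frame keeps its original index and shape.
df_new = df.where(keep)
print(df_new)
```

Only row 0 (a=20, b=20) fails the mask, so it is the only row filled with NaN; the columns become float because NaN is introduced. Note that the question does not say what should happen when 'a' is exactly 10 or 'b' is exactly 5; with this mask such values count as "not small", so adjust the comparisons if your data needs a different boundary.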