I have the below Muti index data frame,
| shape | colour | data |
| | d1 | d2 | d3 |
-------------------------------------------
| circle | green | 2 | 4 | 9 |
| circle | red | -9 | 3 | 1 |
| square | orange | 5 | -6 | 2 |
| square | yellow | 9 | 8 | 2 |
created with the following,
header = [ "shape", "colour", "label", "data" ]
data = [
[ "circle", "green", "d1", 2 ],
[ "circle", "green", "d2", 4 ],
[ "circle", "green", "d3", 9 ],
[ "square", "orange", "d1", 5 ],
[ "square", "orange", "d2", -6 ],
[ "square", "orange", "d3", 2 ],
[ "circle", "red", "d1", -9 ],
[ "circle", "red", "d2", 3 ],
[ "circle", "red", "d3", 1 ],
[ "square", "yellow", "d1", 9 ],
[ "square", "yellow", "d2", 8 ],
[ "square", "yellow", "d3", 2 ],
]
raw = pd.DataFrame(data, columns=header)
df = raw.pivot(index=["shape", "colour"], columns=["label"], values=["data"])
and I want to filter the data values based on the colour. It is easy with a set value, I can do, df[(abs(df["data"]) > 2)]
but I do not know/cannot find how to do it when the threshold depends on the colour. With the following filter,
filter = {
"red": {"threshold": 5},
"green": {"threshold": 5},
"yellow": {"threshold": 7},
"orange": {"threshold": 1},
}
The desired output would be,
| shape | colour | data |
| | d1 | d2 | d3 |
-------------------------------------------
| circle | green | Nan | Nan | 9 |
| circle | red | -9 | Nan | Nan |
| square | orange | 5 | -6 | 2 |
| square | yellow | 9 | 8 | Nan |
I tried,
df.apply(lambda r: r["data"] if abs(r["data"]) > filter[r.index.get_level_values(1)].get("threshold", 2) else 0)
and
df[(abs(df["data"]) > filter[df.index.get_level_values(1)].get("threshold", 0))]
to no avail.
From the filter
dictionary create DataFrame
with colours as index. Then reindex the latter with the df
index's 'colour'
level. Compare df
to the obtained values (an individual threshold for each row depending on colour):
threshold = pd.DataFrame.from_dict(filter, orient='index')
mask = df.abs() > threshold.reindex(df.index.get_level_values('colour')).values
print(df[mask])
Output:
data
label d1 d2 d3
shape colour
circle green NaN NaN 9.0
red -9.0 NaN NaN
square orange 5.0 -6.0 2.0
yellow 9.0 8.0 NaN
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments