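Consider computing the inline aggregates with transform, then using merge to join the result against a data frame of all possible Year/Item combinations. Finally, clean up the missing values with fillna: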
from itertools import product
import pandas as pd
...
# All (Year, Item) combinations that should appear in the result
years_items_df = pd.DataFrame(product(["2015", "2016"], list("ABCD")),
                              columns=["Year", "Item"])

df = (df.assign(Count=lambda x: x.groupby(["Year", "Item"])["Year"].transform("count"),
                AnnualCount=lambda x: x.groupby(["Year"])["Year"].transform("count"))
        .drop_duplicates()
        # Right join adds the combinations that never occur (their counts are NaN)
        .merge(years_items_df, on=["Year", "Item"], how="right")
        .sort_values(["Year", "Item"])
        # ffill relies on the sort above: a missing row takes AnnualCount from the row before it
        .assign(Count=lambda x: x["Count"].fillna(0),
                AnnualCount=lambda x: x["AnnualCount"].ffill(),
                Percent=lambda x: x["Count"].div(x["AnnualCount"]))
        .reset_index(drop=True)
     )
df
#    Year Item  Count  AnnualCount   Percent
# 0  2015    A    1.0          3.0  0.333333
# 1  2015    B    1.0          3.0  0.333333
# 2  2015    C    1.0          3.0  0.333333
# 3  2015    D    0.0          3.0  0.000000
# 4  2016    A    2.0          4.0  0.500000
# 5  2016    B    1.0          4.0  0.250000
# 6  2016    C    0.0          4.0  0.000000
# 7  2016    D    1.0          4.0  0.250000
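The input frame is elided above, so here is a self-contained sketch of the same pipeline. The sample rows are an assumption, picked only so that they are consistent with the output shown; the real data may look different. The sketch also swaps the plain ffill for a per-year groupby(...).transform("max"). That is a variation, not the original answer's method, and it is meant to cover the case where the first item of a year is the missing one, so there is no earlier row in that year to fill from.

```python
from itertools import product

import pandas as pd

# Hypothetical input: one row per observed (Year, Item) occurrence.
df = pd.DataFrame({
    "Year": ["2015", "2015", "2015", "2016", "2016", "2016", "2016"],
    "Item": ["A", "B", "C", "A", "A", "B", "D"],
})

# Every (Year, Item) pair that should appear in the result.
years_items_df = pd.DataFrame(
    product(["2015", "2016"], list("ABCD")), columns=["Year", "Item"]
)

result = (
    df.assign(
        Count=lambda x: x.groupby(["Year", "Item"])["Year"].transform("count"),
        AnnualCount=lambda x: x.groupby("Year")["Year"].transform("count"),
    )
    .drop_duplicates()
    .merge(years_items_df, on=["Year", "Item"], how="right")
    .sort_values(["Year", "Item"])
    .assign(
        Count=lambda x: x["Count"].fillna(0),
        # Variation on the answer: fill AnnualCount within each year,
        # so the result does not depend on which item is missing.
        AnnualCount=lambda x: x.groupby("Year")["AnnualCount"].transform("max"),
        Percent=lambda x: x["Count"].div(x["AnnualCount"]),
    )
    .reset_index(drop=True)
)

print(result)
```

A year that has no rows at all in the input would still end up with NaN in AnnualCount and Percent under either approach, since there is nothing to fill from.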