I have a df like this:
import pandas as pd
df = pd.DataFrame([["coffee","soda","coffee","water","soda","soda"],["paper","glass","glass","paper","paper","glass"], list('smlssm')]).T
df.columns = ['item','cup','size']
df:
item cup size
0 coffee paper s
1 soda glass m
2 coffee glass l
3 water paper s
4 soda paper s
5 soda glass m
I want to turn it into a df that looks like this:
item cup size freq
0 coffee paper s 1
1 coffee paper m 0
2 coffee paper l 0
3 coffee glass s 0
4 coffee glass m 0
5 coffee glass l 1
6 soda paper s 1
7 soda paper m 0
8 soda paper l 0
9 soda glass s 0
10 soda glass m 2
11 soda glass l 0
. . . . .
. . . . .
. . . . .
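So, for each item, I want one row for every possible combination of cup and size, plus an additional column holding the frequency.
What is the proper way to do this with pandas?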
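Let's try:
1. Add a freq column to the DataFrame, set to 1 for every row.
2. Use groupby and sum to get the current counts in the DataFrame.
3. Create a MultiIndex from the unique values of each column.
4. Use the new midx to reindex with fill_value=0, so that any combination introduced by the new index gets a freq of 0.
5. Use reset_index to convert the index back into columns.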
# Columns to reindex on
idx_cols = ['item', 'cup', 'size']
# Create a MultiIndex from the unique values of each column
midx = pd.MultiIndex.from_product(
    [df[c].unique() for c in idx_cols],
    names=idx_cols
)
df = (
    df.assign(freq=1)                  # add a freq column initialized to 1
    .groupby(idx_cols)['freq'].sum()   # group by and sum freq
    .reindex(midx, fill_value=0)       # add missing combinations with freq 0
    .reset_index()                     # turn the index back into columns
)
df:
item cup size freq
0 coffee paper s 1
1 coffee paper m 0
2 coffee paper l 0
3 coffee glass s 0
4 coffee glass m 0
5 coffee glass l 1
6 soda paper s 1
7 soda paper m 0
8 soda paper l 0
9 soda glass s 0
10 soda glass m 2
11 soda glass l 0
12 water paper s 1
13 water paper m 0
14 water paper l 0
15 water glass s 0
16 water glass m 0
17 water glass l 0
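As a self-contained variant of the steps above, the sketch below counts rows with groupby(...).size() instead of adding a helper freq column and summing it. This is a swapped-in alternative, not the answer's exact method, and the variable name result is chosen here for illustration so that the original df is not overwritten.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    "item": ["coffee", "soda", "coffee", "water", "soda", "soda"],
    "cup": ["paper", "glass", "glass", "paper", "paper", "glass"],
    "size": list("smlssm"),
})

idx_cols = ["item", "cup", "size"]

# Every combination of the unique values found in each column
midx = pd.MultiIndex.from_product(
    [df[c].unique() for c in idx_cols],
    names=idx_cols,
)

result = (
    df.groupby(idx_cols)
      .size()                          # count rows per observed combination
      .reindex(midx, fill_value=0)     # unobserved combinations get a count of 0
      .reset_index(name="freq")        # index levels become columns; counts go in "freq"
)
print(result)
```

Because the MultiIndex is built from the values that actually occur in each column, a cup or size that never appears in the data will not show up in the result.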