I have this dataframe:
            team       opponent  home_dummy  round  points
 0  Athlético-PR       Flamengo           0     13   22.91
 1  Athlético-PR    Atlético-GO           0     17   23.60
 2  Athlético-PR      Fortaleza           1     20   28.58
 3  Athlético-PR      Fortaleza           0      1   75.71
 4  Athlético-PR          Ceará           1     14   42.22
 5  Athlético-PR       Coritiba           1     10   52.91
 6  Athlético-PR          Goiás           1      2   39.82
 7  Athlético-PR          Goiás           0     21   65.13
 8  Athlético-PR  Internacional           0     15   43.09
 9  Athlético-PR         Grêmio           1     18   15.38
10  Athlético-PR          Sport           0     19   13.09
11  Athlético-PR         Santos           1     22   65.45
12  Athlético-PR         Santos           0      3   28.04
13  Athlético-PR      Palmeiras           1      4   -7.31
14  Athlético-PR      Palmeiras           0     23   11.02
15  Athlético-PR          Vasco           0      8   15.93
16  Athlético-PR     Fluminense           1      5    9.16
17  Athlético-PR          Bahia           1     12   59.78
18  Athlético-PR    Corinthians           1     16   18.22
19  Athlético-PR       Botafogo           1      9   29.35
20  Athlético-PR     Bragantino           1      7   20.07
...
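Besides "Athlético-PR", the dataframe contains 19 more teams.
How can I group the dataframe so that, for each team, I get:
the mean points of the last 6 rounds overall (rounds 23, 22, 21, 20, 19, 18);
the mean points of the last 6 rounds with home_dummy = 0 (rounds 23, 21, 19, 17, 15, 13);
the mean points of the last 6 rounds with home_dummy = 1 (rounds 22, 20, 18, 16, 14, 12)?
The result should end up as:
           team  mean_total  mean_home_0  mean_home_1
0  Athlético-PR      mean x       mean y       mean z
...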
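I think you can do this with two separate groupby passes, one per (team, home_dummy) pair and one per team:
    # Sort so that tail(6) picks the six most recent rounds in each group
    df = df.sort_values(['team','round'])
    # Keep the last 6 rows per (team, home_dummy), then average and pivot to columns
    out = (df.groupby(['team','home_dummy']).tail(6)
             .groupby(['team','home_dummy'])['points'].mean()
             .unstack('home_dummy')
             .add_prefix('mean_home_')
          )
    # Last 6 rows per team regardless of venue; assignment aligns on the team index
    out['mean_total'] = df.groupby('team').tail(6).groupby('team')['points'].mean()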
Output:
home_dummy    mean_home_0  mean_home_1  mean_total
team
Athlético-PR    29.806667    38.271667   33.108333
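To check the numbers above, here is a self-contained sketch that rebuilds only the 21 Athlético-PR rows shown in the question and runs the same two groupby passes. The `rows` list and the use of `df.insert` to add the team column are just conveniences for this example; the other 19 teams are not included.

```python
import pandas as pd

# Athlético-PR rows copied from the question: (opponent, home_dummy, round, points)
rows = [
    ("Flamengo", 0, 13, 22.91), ("Atlético-GO", 0, 17, 23.6),
    ("Fortaleza", 1, 20, 28.58), ("Fortaleza", 0, 1, 75.71),
    ("Ceará", 1, 14, 42.22), ("Coritiba", 1, 10, 52.91),
    ("Goiás", 1, 2, 39.82), ("Goiás", 0, 21, 65.13),
    ("Internacional", 0, 15, 43.09), ("Grêmio", 1, 18, 15.38),
    ("Sport", 0, 19, 13.09), ("Santos", 1, 22, 65.45),
    ("Santos", 0, 3, 28.04), ("Palmeiras", 1, 4, -7.31),
    ("Palmeiras", 0, 23, 11.02), ("Vasco", 0, 8, 15.93),
    ("Fluminense", 1, 5, 9.16), ("Bahia", 1, 12, 59.78),
    ("Corinthians", 1, 16, 18.22), ("Botafogo", 1, 9, 29.35),
    ("Bragantino", 1, 7, 20.07),
]
df = pd.DataFrame(rows, columns=["opponent", "home_dummy", "round", "points"])
df.insert(0, "team", "Athlético-PR")  # same team on every row

# Sort by round so tail(6) means "six most recent rounds"
df = df.sort_values(["team", "round"])

out = (
    df.groupby(["team", "home_dummy"]).tail(6)
      .groupby(["team", "home_dummy"])["points"].mean()
      .unstack("home_dummy")
      .add_prefix("mean_home_")
)
out["mean_total"] = df.groupby("team").tail(6).groupby("team")["points"].mean()

print(out.round(6))
# mean_home_0 = 29.806667, mean_home_1 = 38.271667, mean_total = 33.108333
```

Without the `sort_values` call, `tail(6)` would take the last six rows in storage order, which in the question's data is not round order.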
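Another option is to write a UDF, so that each statistic needs a single groupby instead of a tail-then-groupby pair (df must still be sorted by team and round first):
    def last6mean(x):
        # Mean of the last 6 values in the group
        return x.tail(6).mean()
    out = (df.groupby(['team','home_dummy'])['points']
             .apply(last6mean)
             .unstack('home_dummy')
             .add_prefix('mean_home_')
          )
    out['mean_total'] = df.groupby('team')['points'].apply(last6mean)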
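As a possible variation, the window size can be made a parameter, and the result can be reshaped into the flat `team, mean_total, mean_home_0, mean_home_1` layout the question asks for. The sketch below uses made-up toy data with two teams and a window of 2; the helper name `last_n_mean` and the variable `result` are hypothetical and not part of the original answer.

```python
import pandas as pd

# Hypothetical toy data (not from the question): two teams, rows out of round order
df = pd.DataFrame({
    "team": ["A"] * 6 + ["B"] * 4,
    "home_dummy": [0, 1, 0, 1, 0, 1, 1, 0, 1, 0],
    "round": [3, 2, 1, 6, 5, 4, 2, 1, 4, 3],
    "points": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 1.0, 2.0, 3.0, 4.0],
})

def last_n_mean(s, n):
    """Mean of the last n values of s (hypothetical helper)."""
    return s.tail(n).mean()

n = 2  # the question uses 6
df = df.sort_values(["team", "round"])  # tail() depends on this ordering

out = (
    df.groupby(["team", "home_dummy"])["points"]
      .apply(lambda s: last_n_mean(s, n))
      .unstack("home_dummy")
      .add_prefix("mean_home_")
)
out["mean_total"] = df.groupby("team")["points"].apply(lambda s: last_n_mean(s, n))

# Reorder columns, drop the leftover "home_dummy" columns name, make team a column
result = (
    out[["mean_total", "mean_home_0", "mean_home_1"]]
      .rename_axis(None, axis=1)
      .reset_index()
)
print(result)
# Team A: mean_total 45.0, mean_home_0 30.0, mean_home_1 50.0
# Team B: mean_total 3.5,  mean_home_0 3.0,  mean_home_1 2.0
```

Calling a Python function through `apply` is usually slower than the built-in `tail` and `mean` on large frames, so the two-pass version may be preferable when performance matters.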