This is my pandas DataFrame:
import pandas as pd

df = pd.DataFrame({
    'location': ['USA', 'USA', 'USA', 'USA', 'France', 'France', 'France', 'France'],
    'date': ['2020-11-20', '2020-11-21', '2020-11-22', '2020-11-23', '2020-11-20', '2020-11-21', '2020-11-22', '2020-11-23'],
    'dm': [5., 4., 2., 2., 17., 3., 3., 7.]
})
For each location (hence the groupby), I want the 2-day rolling mean of dm. If I use this:
df['rolling']=df.groupby('location').dm.rolling(2).mean().values
I get this incorrect DataFrame:
  location        date    dm  rolling
0      USA  2020-11-20   5.0      NaN
1      USA  2020-11-21   4.0     10.0
2      USA  2020-11-22   2.0      3.0
3      USA  2020-11-23   2.0      5.0
4   France  2020-11-20  17.0      NaN
5   France  2020-11-21   3.0      4.5
6   France  2020-11-22   3.0      3.0
7   France  2020-11-23   7.0      2.0
whereas it should be:
  location        date    dm  rolling
0      USA  2020-11-20   5.0      NaN
1      USA  2020-11-21   4.0      4.5
2      USA  2020-11-22   2.0      3.0
3      USA  2020-11-23   2.0      2.0
4   France  2020-11-20  17.0      NaN
5   France  2020-11-21   3.0     10.0
6   France  2020-11-22   3.0      3.0
7   France  2020-11-23   7.0      5.0
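There are two things going on here:
The problem is that groupby adds a new level to the result's index, turning it into a MultiIndex. To match the original index values, that level has to be removed with Series.reset_index and drop=True. If you use .values instead, there is no alignment by index: the values are assigned purely by position, and because the groups are sorted by key (France before USA), they land in the wrong rows. Use this instead: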
df['rolling']=df.groupby('location').dm.rolling(2).mean().reset_index(level=0, drop=True)
print(df)
  location        date    dm  rolling
0      USA  2020-11-20   5.0      NaN
1      USA  2020-11-21   4.0      4.5
2      USA  2020-11-22   2.0      3.0
3      USA  2020-11-23   2.0      2.0
4   France  2020-11-20  17.0      NaN
5   France  2020-11-21   3.0     10.0
6   France  2020-11-22   3.0      3.0
7   France  2020-11-23   7.0      5.0
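As an alternative to dropping the group level afterwards, the same result can probably be obtained with GroupBy.transform, which returns a Series indexed like the original frame. This is only a sketch of that swapped-in technique, not the approach used in the answer above; the lambda is an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    'location': ['USA', 'USA', 'USA', 'USA', 'France', 'France', 'France', 'France'],
    'date': ['2020-11-20', '2020-11-21', '2020-11-22', '2020-11-23',
             '2020-11-20', '2020-11-21', '2020-11-22', '2020-11-23'],
    'dm': [5., 4., 2., 2., 17., 3., 3., 7.],
})

# transform applies the function to each group and returns a result
# that keeps the original row index, so no reset_index is needed.
df['rolling'] = df.groupby('location')['dm'].transform(
    lambda s: s.rolling(2).mean()
)
print(df)
```

Because the result already carries the original index, the assignment aligns row by row regardless of the order in which the groups are processed.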
Details:
print(df.groupby('location').dm.rolling(2).mean())
location   
France    4     NaN
          5    10.0
          6     3.0
          7     5.0
USA       0     NaN
          1     4.5
          2     3.0
          3     2.0
Name: dm, dtype: float64
print(df.groupby('location').dm.rolling(2).mean().reset_index(level=0, drop=True))
4     NaN
5    10.0
6     3.0
7     5.0
0     NaN
1     4.5
2     3.0
3     2.0
Name: dm, dtype: float64
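To see the difference between the two assignments side by side, the following sketch compares assigning by position (what .values does) with assigning by index label. The variable names rolled, aligned and by_position are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'location': ['USA', 'USA', 'USA', 'USA', 'France', 'France', 'France', 'France'],
    'dm': [5., 4., 2., 2., 17., 3., 3., 7.],
})

rolled = df.groupby('location')['dm'].rolling(2).mean()

# The second index level holds the original row labels, in group order
# (France rows first, because the groups are sorted by key).
print(rolled.index.get_level_values(1).tolist())

# Dropping the group level leaves only the original labels, so pandas
# can align each value with its row when the Series is assigned to df.
aligned = rolled.reset_index(level=0, drop=True)

# .values discards the labels; values are then placed purely by position.
by_position = pd.Series(rolled.values, index=df.index)

print(pd.DataFrame({'aligned': aligned, 'by_position': by_position}))
```

The aligned column matches the expected result, while by_position reproduces the incorrect column from the question.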