鉴于此数据框:
DriverId time SPEED
0 2021-04-16 21:40:00+00:00 58.500000
2021-04-16 21:41:00+00:00 32.850000
2021-04-16 21:42:00+00:00 89.633333
2021-04-16 21:43:00+00:00 88.166667
2021-04-16 21:44:00+00:00 118.016667
... ... ...
88 2021-04-27 07:30:00+00:00 79.566667
2021-04-27 07:31:00+00:00 59.383333
2021-04-27 07:32:00+00:00 89.133333
2021-04-27 07:33:00+00:00 59.966667
2021-04-27 07:34:00+00:00 25.72413
我想添加列来计算每个司机低于 40 公里/小时的速度数,所以我试过这个:
y[y.SPEED<40].count()
它显示了这一点:
SPEED 4721
dtype: int64
这不是我想要的,expexted 结果必须是这样的:
DriverId SPEED count
0 15.20 2
32.850000
89.633333
88.166667
118.016667
... ... ...
88 79.566667 1
59.383333
89.133333
59.966667
25.72413
我的数据框是一个系列,我将其转换为数据框
y.info()
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 15082 entries, (0, Timestamp('2021-04-16 21:40:00+0000', tz='UTC')) to (88, Timestamp('2021-04-27 07:34:00+0000', tz='UTC'))
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 SPEED 15082 non-null float64
dtypes: float64(1)
memory usage: 922.5 KB
df = pd.DataFrame([['0','2021-04-16 21:40:00+00:00',58.500000],
['0','2021-04-16 21:41:00+00:00', 32.850000],#FIRST ONE
['0','2021-04-16 21:42:00+00:00', 15.633333],#SECOND ONE
['0','2021-04-16 21:43:00+00:00', 88.166667],
['0','2021-04-16 21:44:00+00:00',118.016667],
['88','[2021-04-27 07:30:00+00:00',79.566667],
['88','2021-04-27 07:31:00+00:00',59.383333],
['88','2021-04-27 07:32:00+00:00',89.133333],
['88','2021-04-27 07:33:00+00:00',59.966667],
['88','2021-04-27 07:34:00+00:00',25.72413] # THIRD ONE
],columns=['driver_id','time','speed'])
df = df.set_index("driver_id")
counts = df[df['speed'] < 40].groupby(["driver_id",],as_index=False).agg(
count_col=pd.NamedAgg(column="speed", aggfunc="count")
)
merged_Frame = pd.merge(df, counts, on = 'driver_id', how='inner')
输出
driver_id time speed count_col
0 0 2021-04-16 21:40:00+00:00 58.500000 2
1 0 2021-04-16 21:41:00+00:00 32.850000 2
2 0 2021-04-16 21:42:00+00:00 15.633333 2
3 0 2021-04-16 21:43:00+00:00 88.166667 2
4 0 2021-04-16 21:44:00+00:00 118.016667 2
5 88 [2021-04-27 07:30:00+00:00 79.566667 1
6 88 2021-04-27 07:31:00+00:00 59.383333 1
7 88 2021-04-27 07:32:00+00:00 89.133333 1
8 88 2021-04-27 07:33:00+00:00 59.966667 1
9 88 2021-04-27 07:34:00+00:00 25.724130 1
参考
编辑
import pandas as pd
df = pd.DataFrame([['0','2021-04-16 21:40:00+00:00',58.500000],
['0','2021-04-16 21:41:00+00:00', 32.850000],#FIRST ONE
['0','2021-04-16 21:42:00+00:00', 15.633333],#SECOND ONE
['0','2021-04-16 21:43:00+00:00', 88.166667],
['0','2021-04-16 21:44:00+00:00',118.016667],
['88','[2021-04-27 07:30:00+00:00',79.566667],
['88','2021-04-27 07:31:00+00:00',59.383333],
['88','2021-04-27 07:32:00+00:00',89.133333],
['88','2021-04-27 07:33:00+00:00',59.966667],
['88','2021-04-27 07:34:00+00:00',25.72413] # THIRD ONE
],columns=['driver_id','time','speed'])
df = df.set_index(['driver_id', 'time'])
df['count'] = df[df['speed'] < 40].groupby('driver_id')['speed'].transform('count')
本文收集自互联网,转载请注明来源。
如有侵权,请联系 [email protected] 删除。
我来说两句