I have a time series that I want to resample into 5-second windows, for example:
INDEX size price
2018-05-07 21:53:13.731 0.365127 9391.800000
2018-05-07 21:53:16.201 0.666127 9391.800000
2018-05-07 21:53:18.038 0.143104 9391.800000
2018-05-07 21:53:18.243 0.025643 9391.800000
2018-05-07 21:53:18.265 0.640484 9391.800000
2018-05-07 21:53:18.906 -0.100000 9391.793421
2018-05-07 21:53:19.829 0.559516 9391.800000
2018-05-07 21:53:19.846 0.100000 9391.800000
2018-05-07 21:53:19.870 0.006560 9391.800000
2018-05-07 21:53:20.734 0.666076 9391.800000
2018-05-07 21:53:20.775 0.666076 9391.800000
2018-05-07 21:53:28.607 0.100000 9391.800000
2018-05-07 21:53:28.610 0.041991 9391.800000
2018-05-07 21:53:29.283 -0.053518 9391.793421
2018-05-07 21:53:47.322 -0.046302 9391.793421
2018-05-07 21:53:49.182 0.100000 9391.800000
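To try the answers below locally, you can rebuild the sample as a DataFrame with a DatetimeIndex named INDEX. This is a minimal sketch. It assumes the rows above are the whole sample and that `size` and `price` are plain float columns. The `rows` list is just the sample typed in by hand.

```python
import pandas as pd

# The 16 sample rows from above: (timestamp, size, price)
rows = [
    ("2018-05-07 21:53:13.731", 0.365127, 9391.800000),
    ("2018-05-07 21:53:16.201", 0.666127, 9391.800000),
    ("2018-05-07 21:53:18.038", 0.143104, 9391.800000),
    ("2018-05-07 21:53:18.243", 0.025643, 9391.800000),
    ("2018-05-07 21:53:18.265", 0.640484, 9391.800000),
    ("2018-05-07 21:53:18.906", -0.100000, 9391.793421),
    ("2018-05-07 21:53:19.829", 0.559516, 9391.800000),
    ("2018-05-07 21:53:19.846", 0.100000, 9391.800000),
    ("2018-05-07 21:53:19.870", 0.006560, 9391.800000),
    ("2018-05-07 21:53:20.734", 0.666076, 9391.800000),
    ("2018-05-07 21:53:20.775", 0.666076, 9391.800000),
    ("2018-05-07 21:53:28.607", 0.100000, 9391.800000),
    ("2018-05-07 21:53:28.610", 0.041991, 9391.800000),
    ("2018-05-07 21:53:29.283", -0.053518, 9391.793421),
    ("2018-05-07 21:53:47.322", -0.046302, 9391.793421),
    ("2018-05-07 21:53:49.182", 0.100000, 9391.800000),
]

tick = pd.DataFrame(rows, columns=["INDEX", "size", "price"])
# pd.Grouper(freq=...) groups on the index, so it must be a DatetimeIndex
tick["INDEX"] = pd.to_datetime(tick["INDEX"])
tick = tick.set_index("INDEX")
print(tick.dtypes)
```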
import numpy as np
import pandas as pd

def tick_features(x):
    # Total absolute traded size and number of trades in the window
    volume = np.abs(x['size']).sum()
    num_trades = x['size'].count()
    return pd.Series([volume, num_trades], index=['volume', 'num_trades'])

tick = tick.groupby(pd.Grouper(freq='5S')).apply(tick_features)
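How can I also get the first and last element of each 5S window using pd.Grouper() and .apply()?

I can do something similar with .resample().agg() and {'price': 'first'}, but for other reasons I would like to stick with pd.Grouper() as much as possible.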
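For a DataFrame indexed by time, pd.Grouper(freq=...) builds the same 5-second bins as .resample(). That means the dict passed to .agg() can be reused unchanged. The sketch below assumes the sample is indexed by INDEX and inlines only a few of its rows so it stays self-contained. It uses the '5s' alias, which newer pandas versions prefer over the deprecated '5S'.

```python
import pandas as pd

# A subset of the sample rows, indexed by timestamp
tick = pd.DataFrame(
    {
        "size": [0.365127, 0.666127, -0.100000, 0.006560, -0.053518, -0.046302],
        "price": [9391.8, 9391.8, 9391.793421, 9391.8, 9391.793421, 9391.793421],
    },
    index=pd.DatetimeIndex(
        [
            "2018-05-07 21:53:13.731",
            "2018-05-07 21:53:16.201",
            "2018-05-07 21:53:18.906",
            "2018-05-07 21:53:19.870",
            "2018-05-07 21:53:29.283",
            "2018-05-07 21:53:47.322",
        ],
        name="INDEX",
    ),
)

how = {"price": "first", "size": "last"}
by_resample = tick.resample("5s").agg(how)
by_grouper = tick.groupby(pd.Grouper(freq="5s")).agg(how)

# Same bin labels and values, including empty bins filled with NaN
print(by_grouper.equals(by_resample))
print(by_grouper)
```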
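I suggest using SeriesGroupBy.agg (the size column is selected after groupby) with a list of (name, function) tuples, including the functions first and last: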
tick_features = [('volume', lambda x: x.abs().sum()),
                 ('num_trades', 'count'),
                 ('first_trade', 'first'),
                 ('last_trade', 'last')]
tick = tick.groupby(pd.Grouper(freq='5S'))['size'].agg(tick_features)
print(tick)
volume num_trades first_trade last_trade
INDEX
2018-05-07 21:53:10 0.365127 1 0.365127 0.365127
2018-05-07 21:53:15 2.241434 8 0.666127 0.006560
2018-05-07 21:53:20 1.332152 2 0.666076 0.666076
2018-05-07 21:53:25 0.195509 3 0.100000 -0.053518
2018-05-07 21:53:30 0.000000 0 NaN NaN
2018-05-07 21:53:35 0.000000 0 NaN NaN
2018-05-07 21:53:40 0.000000 0 NaN NaN
2018-05-07 21:53:45 0.146302 2 -0.046302 0.100000
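Since pandas 0.25, the same aggregation can also be written with named aggregation, where each keyword becomes an output column. This is only an alternative spelling, not something the answer above requires. The sketch inlines a subset of the sample and assumes a pandas version whose SeriesGroupBy.agg accepts keyword arguments.

```python
import pandas as pd

# A subset of the sample rows, indexed by timestamp
tick = pd.DataFrame(
    {
        "size": [0.365127, 0.666127, -0.100000, 0.006560, -0.053518, -0.046302],
        "price": [9391.8, 9391.8, 9391.793421, 9391.8, 9391.793421, 9391.793421],
    },
    index=pd.DatetimeIndex(
        [
            "2018-05-07 21:53:13.731",
            "2018-05-07 21:53:16.201",
            "2018-05-07 21:53:18.906",
            "2018-05-07 21:53:19.870",
            "2018-05-07 21:53:29.283",
            "2018-05-07 21:53:47.322",
        ],
        name="INDEX",
    ),
)

out = tick.groupby(pd.Grouper(freq="5s"))["size"].agg(
    volume=lambda s: s.abs().sum(),  # total absolute size in the window
    num_trades="count",              # number of trades in the window
    first_trade="first",             # size of the first trade
    last_trade="last",               # size of the last trade
)
print(out)
```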
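A solution with apply is also possible, but it needs an if-else statement. Windows without any trades (for example 21:53:30) are passed to the function as empty DataFrames, and .iloc[0] would raise an IndexError on them: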
import numpy as np
import pandas as pd

def tick_features(x):
    volume = np.abs(x['size']).sum()
    num_trades = x['size'].count()
    # Empty 5-second windows have no first/last trade, so fall back to NaN
    if not x.empty:
        f = x['size'].iloc[0]
        l = x['size'].iloc[-1]
    else:
        f = np.nan
        l = np.nan
    return pd.Series([volume, num_trades, f, l],
                     index=['volume', 'num_trades', 'first_trade', 'last_trade'])

tick = tick.groupby(pd.Grouper(freq='5S')).apply(tick_features)
print(tick)
volume num_trades first_trade last_trade
INDEX
2018-05-07 21:53:10 0.365127 1.0 0.365127 0.365127
2018-05-07 21:53:15 2.241434 8.0 0.666127 0.006560
2018-05-07 21:53:20 1.332152 2.0 0.666076 0.666076
2018-05-07 21:53:25 0.195509 3.0 0.100000 -0.053518
2018-05-07 21:53:30 0.000000 0.0 NaN NaN
2018-05-07 21:53:35 0.000000 0.0 NaN NaN
2018-05-07 21:53:40 0.000000 0.0 NaN NaN
2018-05-07 21:53:45 0.146302 2.0 -0.046302 0.100000
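If you need the first and last values of every column, not only size, DataFrameGroupBy.agg accepts a list of function names. It returns MultiIndex columns of (column, function) pairs. The sketch below assumes the same indexed sample, with a subset of its rows inlined. Flattening the column names and dropping the empty windows with dropna are illustrative choices, not part of the answers above.

```python
import pandas as pd

# A subset of the sample rows, indexed by timestamp
tick = pd.DataFrame(
    {
        "size": [0.365127, 0.666127, -0.100000, 0.006560, -0.053518, -0.046302],
        "price": [9391.8, 9391.8, 9391.793421, 9391.8, 9391.793421, 9391.793421],
    },
    index=pd.DatetimeIndex(
        [
            "2018-05-07 21:53:13.731",
            "2018-05-07 21:53:16.201",
            "2018-05-07 21:53:18.906",
            "2018-05-07 21:53:19.870",
            "2018-05-07 21:53:29.283",
            "2018-05-07 21:53:47.322",
        ],
        name="INDEX",
    ),
)

first_last = tick.groupby(pd.Grouper(freq="5s")).agg(["first", "last"])
# Flatten ('size', 'first') -> 'size_first' and so on
first_last.columns = [f"{col}_{how}" for col, how in first_last.columns]
# Windows without trades are all-NaN; drop them if they are not needed
first_last = first_last.dropna(how="all")
print(first_last)
```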