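I want to resample a DataFrame onto a regular five-second grid; the timestamps of the original data are irregular. Apologies if this looks like a duplicate question, but I am having trouble getting an interpolation that respects the data's timestamps, which is why I have included my DataFrame in this question. The graph in this answer shows the result I am after, but I cannot use the traces package suggested there. I am using pandas 0.19.0.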
Consider the following climb path of an aircraft (also available as a pastebin):
Altitude Time
1 0.00 0.00000
2 1000.00 16.45350
3 2000.00 33.19584
4 3000.00 50.25330
5 4000.00 67.64580
6 5000.00 85.38720
7 6000.00 103.56720
8 7000.00 122.29260
9 8000.00 141.61440
10 9000.00 161.59140
11 9999.67 182.27940
12 10000.30 182.33940
13 10000.30 199.76880
14 10000.30 199.82880
15 11000.00 221.67660
16 12000.00 244.36260
17 13000.00 267.93900
18 14000.00 292.46940
19 15000.00 318.01080
20 16000.00 344.36820
21 17000.00 371.32200
22 18000.00 398.91420
23 19000.00 427.19100
24 20000.00 456.24900
25 21000.00 486.38940
26 22000.00 517.91640
27 23000.00 550.96140
28 24000.00 585.65460
29 25000.00 622.12800
30 26000.00 660.35400
31 27000.00 700.37400
32 28000.00 742.39200
33 29000.00 786.57600
34 30000.00 833.13000
35 31000.00 882.09000
36 32000.00 933.46200
37 33000.00 987.40800
38 34000.00 1044.06000
39 35000.00 1103.85000
40 36000.00 1167.52200
41 36088.90 1173.39000
42 36089.60 1173.45000
43 36671.70 1216.60200
44 36672.40 1216.66200
45 38000.00 1295.80200
46 39000.00 1368.45000
47 40000.00 1458.00000
48 41000.00 1574.08200
49 42000.00 1730.97000
50 42231.00 1775.19600
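First, I tried upsampling while keeping the original index, as shown in this question, so that I could interpolate linearly. However, I found no interpolation method that produces the correct results (note that the Time column matches the timestamps only at 16.45 s):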
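import numpy as np
import pandas as pd
# Index by the Time column (seconds), keeping the column itself
df = df.set_index(pd.to_datetime(df['Time'], unit='s'), drop=False)
resample_index = pd.date_range(start=df.index[0], end=df.index[-1], freq='5s')
# Frame of NaNs on the regular 5 s grid, to be filled by interpolation
dummy_frame = pd.DataFrame(np.nan, index=resample_index, columns=df.columns)
df.combine_first(dummy_frame).interpolate().iloc[:6]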
Time Altitude
1970-01-01 00:00:00.000000 0.000000 0.0
1970-01-01 00:00:05.000000 4.113375 250.0
1970-01-01 00:00:10.000000 8.226750 500.0
1970-01-01 00:00:15.000000 12.340125 750.0
1970-01-01 00:00:16.453500 16.453500 1000.0
1970-01-01 00:00:20.000000 20.639085 1250.0
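Second, as shown in this answer, I tried resampling without keeping the original index, first upsampling to 1 s and then downsampling to 5 s. The interpolated values do not line up at the end of the data, and the altitude values are off as well (the altitude should reach 1000 ft between 15 s and 20 s). Resampling to 1 s alone already produces wrong results.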
df.resample('1s').interpolate(method='linear').resample('5s').asfreq()
Time Altitude
1970-01-01 00:00:00 0.0 0.000000
1970-01-01 00:00:05 5.0 137.174211
1970-01-01 00:00:10 10.0 274.348422
1970-01-01 00:00:15 15.0 411.522634
1970-01-01 00:00:20 20.0 548.696845
1970-01-01 00:00:25 25.0 685.871056
1970-01-01 00:00:30 30.0 823.045267
1970-01-01 00:00:35 35.0 960.219479
1970-01-01 00:00:40 40.0 1097.393690
1970-01-01 00:00:45 45.0 1234.567901
1970-01-01 00:00:50 50.0 1371.742112
1970-01-01 00:00:55 55.0 1508.916324
1970-01-01 00:01:00 60.0 1646.090535
1970-01-01 00:01:05 65.0 1783.264746
1970-01-01 00:01:10 70.0 1920.438957
1970-01-01 00:01:15 75.0 2057.613169
1970-01-01 00:01:20 80.0 2194.787380
1970-01-01 00:01:25 85.0 2331.961591
1970-01-01 00:01:30 90.0 2469.135802
1970-01-01 00:01:35 95.0 2606.310014
1970-01-01 00:01:40 100.0 2743.484225
1970-01-01 00:01:45 105.0 2880.658436
1970-01-01 00:01:50 110.0 3017.832647
1970-01-01 00:01:55 115.0 3155.006859
1970-01-01 00:02:00 120.0 3292.181070
1970-01-01 00:02:05 125.0 3429.355281
1970-01-01 00:02:10 130.0 3566.529492
1970-01-01 00:02:15 135.0 3703.703704
1970-01-01 00:02:20 140.0 3840.877915
1970-01-01 00:02:25 145.0 3978.052126
... ... ...
1970-01-01 00:27:10 1458.0 40000.000000
1970-01-01 00:27:15 1458.0 40000.000000
1970-01-01 00:27:20 1458.0 40000.000000
1970-01-01 00:27:25 1458.0 40000.000000
1970-01-01 00:27:30 1458.0 40000.000000
1970-01-01 00:27:35 1458.0 40000.000000
1970-01-01 00:27:40 1458.0 40000.000000
1970-01-01 00:27:45 1458.0 40000.000000
1970-01-01 00:27:50 1458.0 40000.000000
1970-01-01 00:27:55 1458.0 40000.000000
1970-01-01 00:28:00 1458.0 40000.000000
1970-01-01 00:28:05 1458.0 40000.000000
1970-01-01 00:28:10 1458.0 40000.000000
1970-01-01 00:28:15 1458.0 40000.000000
1970-01-01 00:28:20 1458.0 40000.000000
1970-01-01 00:28:25 1458.0 40000.000000
1970-01-01 00:28:30 1458.0 40000.000000
1970-01-01 00:28:35 1458.0 40000.000000
1970-01-01 00:28:40 1458.0 40000.000000
1970-01-01 00:28:45 1458.0 40000.000000
1970-01-01 00:28:50 1458.0 40000.000000
1970-01-01 00:28:55 1458.0 40000.000000
1970-01-01 00:29:00 1458.0 40000.000000
1970-01-01 00:29:05 1458.0 40000.000000
1970-01-01 00:29:10 1458.0 40000.000000
1970-01-01 00:29:15 1458.0 40000.000000
1970-01-01 00:29:20 1458.0 40000.000000
1970-01-01 00:29:25 1458.0 40000.000000
1970-01-01 00:29:30 1458.0 40000.000000
1970-01-01 00:29:35 1458.0 40000.000000
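A likely explanation for the output above, offered as a sketch rather than a verified diagnosis of pandas 0.19.0: `resample('1s').interpolate()` appears to reindex onto the 1 s grid first (as `asfreq()` does) and only then interpolate, so every sample whose timestamp is not a whole second is dropped before any interpolation happens. In the data above only 0 s and 1458 s are whole seconds, and a straight line between them gives 40000 / 1458 * 5 ≈ 137.17 ft per 5 s step, which matches the printed values. The snippet below uses only the first four rows of the table to show which samples survive `asfreq()`.

```python
import pandas as pd

# First four samples of the climb profile from the question.
altitude = pd.Series(
    [0.0, 1000.0, 2000.0, 3000.0],
    index=pd.to_datetime([0.0, 16.4535, 33.19584, 50.2533], unit="s"),
)

# asfreq() keeps only values whose timestamps sit exactly on the 1 s grid;
# every other grid point becomes NaN and the off-grid samples are discarded.
on_grid = altitude.resample("1s").asfreq()
survivors = on_grid.dropna()
print(survivors)
```

Here only the sample at 0 s remains, so any interpolation applied afterwards no longer sees the 1000 ft, 2000 ft, or 3000 ft points.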
How can I resample the original data to 5 s while interpolating correctly? Am I simply using the wrong interpolation method?
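With the help of @Martin Schmelzer (thanks!), I found that the first approach suggested in the question does work when time is passed as the method argument of pandas' interpolate: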
resample_index = pd.date_range(start=df.index[0], end=df.index[-1], freq='5s')
dummy_frame = pd.DataFrame(np.nan, index=resample_index, columns=df.columns)
# method='time' weights each gap by the actual time difference in the index
df.combine_first(dummy_frame).interpolate(method='time').iloc[:6]
Altitude Time
1970-01-01 00:00:00.000000 0.000000 0.0000
1970-01-01 00:00:05.000000 303.886711 5.0000
1970-01-01 00:00:10.000000 607.773422 10.0000
1970-01-01 00:00:15.000000 911.660133 15.0000
1970-01-01 00:00:16.453500 1000.000000 16.4535
1970-01-01 00:00:20.000000 1211.828215 20.0000
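The difference between the two methods comes from how they treat the index: the pandas documentation describes method='linear' as ignoring the index and treating the values as equally spaced, whereas method='time' interpolates according to the length of each time interval. The following is a minimal, self-contained sketch of that comparison using only the first four rows of the table. The names `grid`, `combined`, and `result` are illustrative, and `reindex` on the union of both indexes is used here in place of `combine_first` with a dummy frame.

```python
import pandas as pd

# First four rows of the climb profile from the question.
df = pd.DataFrame({
    "Altitude": [0.0, 1000.0, 2000.0, 3000.0],
    "Time": [0.0, 16.4535, 33.19584, 50.2533],
})
df.index = pd.to_datetime(df["Time"].to_numpy(), unit="s")

# Regular 5 s grid spanning the data.
grid = pd.date_range(start=df.index[0], end=df.index[-1], freq="5s")

# Union of original and grid timestamps; the new grid rows start out as NaN.
combined = df.reindex(df.index.union(grid))

# 'linear' spaces the NaNs evenly by position; 'time' uses the timestamps.
linear = combined["Altitude"].interpolate(method="linear")
by_time = combined["Altitude"].interpolate(method="time")

# Keep only the rows on the 5 s grid for a side-by-side comparison.
result = pd.DataFrame({"linear": linear, "time": by_time}).loc[grid]
print(result)
```

At 5 s the linear column gives 250 ft (one quarter of the way by row position), while the time column gives about 303.89 ft, in line with the outputs shown in the question and in this answer.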
I can then resample this to 5 s, or to any other frequency, and the results are accurate.
df.combine_first(dummy_frame).interpolate('time').resample('5s').asfreq().head()
Altitude Time
1970-01-01 00:00:00 0.000000 0.0
1970-01-01 00:00:05 303.886711 5.0
1970-01-01 00:00:10 607.773422 10.0
1970-01-01 00:00:15 911.660133 15.0
1970-01-01 00:00:20 1211.828215 20.0
In the end, it turned out that I was simply using the wrong interpolation method after all.
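As a cross-check that is not part of the approach above: because the Time column is already in numeric seconds, plain linear interpolation over those seconds with `numpy.interp` should give the same altitudes without building a datetime index. This is a different technique from the pandas one used in this answer, shown only as a sketch on the first four rows; the array names are illustrative.

```python
import numpy as np

# First four rows of the climb profile from the question.
time_s = np.array([0.0, 16.4535, 33.19584, 50.2533])
altitude_ft = np.array([0.0, 1000.0, 2000.0, 3000.0])

# Regular 5 s grid that stays within the sampled time range.
grid_s = np.arange(0.0, time_s[-1], 5.0)

# Piecewise-linear interpolation of altitude against time in seconds.
altitude_on_grid = np.interp(grid_s, time_s, altitude_ft)

for t, alt in zip(grid_s, altitude_on_grid):
    print(f"{t:6.1f} s  {alt:12.6f} ft")
```

The values at 5 s, 10 s, 15 s, and 20 s should agree with the Altitude column of the method='time' output above.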