Resampling with start date and end date columns

Posted by Same in Dev


I have a dataframe that looks like this:

 START_TIME   END_TIME     TRIAL_No        itemnr
 2403950      2413067      Trial: 1        P14
 2413378      2422499      Trial: 2        P03
 2422814      2431931      Trial: 3        P13
 2432246      2441363      Trial: 4        P02
 2523540      2541257      Trial: 5        P11
 2541864      2560297      Trial: 6        P10
 2560916      2577249      Trial: 7        P05

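The table continues like this. START_TIME and END_TIME are both in milliseconds and mark the start and end of each trial. What I want to do is resample START_TIME into 100 ms time bins and fill in the variables (TRIAL_No and itemnr) between each START_TIME and END_TIME. Outside these ranges, the variables should be "NA". For example, in the first row START_TIME is 2403950 and END_TIME is 2413067. The difference between them is 9117 ms. So "Trial: 1" lasts 9117 ms, which is about 91 time bins, since the bins are 100 ms apart. I therefore want to repeat "Trial: 1" and "P14" 91 times in the resulting dataframe, and likewise for the other rows. It would look like this: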

Bin_time     TRIAL_No    itemnr
2403950      Trial: 1    P14
2404050      Trial: 1    P14
2404150      Trial: 1    P14
            ...
2413050      Trial: 1    P14
2413150      Trial: 2    P03
2413250      Trial: 2    P03

And so on. I'm not sure whether this can be done directly in pandas or whether some preprocessing is needed.
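To make the bin arithmetic in the question concrete, here is a minimal sketch that computes each trial's duration and the number of whole 100 ms steps that fit into it. The names `trials`, `BIN_MS`, `duration_ms` and `full_steps` are only illustrative and do not come from the question.

```python
import pandas as pd

# First two rows of the example table from the question.
trials = pd.DataFrame({
    "START_TIME": [2403950, 2413378],
    "END_TIME": [2413067, 2422499],
    "TRIAL_No": ["Trial: 1", "Trial: 2"],
    "itemnr": ["P14", "P03"],
})

BIN_MS = 100  # bin width in milliseconds (hypothetical constant name)

# Duration of each trial and the number of whole 100 ms steps inside it.
trials["duration_ms"] = trials["END_TIME"] - trials["START_TIME"]
trials["full_steps"] = trials["duration_ms"] // BIN_MS

print(trials[["TRIAL_No", "duration_ms", "full_steps"]])
```

For the first row this gives 9117 ms and 91 full steps. If the bin that starts exactly at START_TIME is counted as well, the grid for that trial has 92 points (2403950 through 2413050), which matches the rows shown in the desired output.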

jezrael

After building a new DataFrame with concat, you can group it by the original row and apply resample to each of these groups, using ffill to forward-fill the values.

import numpy as np
import pandas as pd

# df is the DataFrame from the question
print(df)
#   START_TIME  END_TIME  TRIAL_No itemnr
#0     2403950   2413067  Trial: 1    P14
#1     2413378   2422499  Trial: 2    P03
#2     2422814   2431931  Trial: 3    P13
#3     2432246   2441363  Trial: 4    P02
#4     2523540   2541257  Trial: 5    P11
#5     2541864   2560297  Trial: 6    P10
#6     2560916   2577249  Trial: 7    P05

# PREPROCESSING
#helper column for matching start and end rows
df['row'] = range(len(df))

# reshape: each original row appears twice, once with its START_TIME and once with its END_TIME
starts = df[['START_TIME','TRIAL_No','itemnr','row']].rename(columns={'START_TIME':'Bin_time'})
ends = df[['END_TIME','TRIAL_No','itemnr','row']].rename(columns={'END_TIME':'Bin_time'})
df = pd.concat([starts, ends])
df = df.set_index('row', drop=True)
df = df.sort_index()

# convert milliseconds to timedelta so the data can be resampled in 100 ms steps
df['Bin_time'] = pd.to_timedelta(df['Bin_time'], unit='ms')

print(df)
# (exact timedelta formatting depends on the pandas version)
#           Bin_time  TRIAL_No itemnr
#row                                 
#0   00:40:03.950000  Trial: 1    P14
#0   00:40:13.067000  Trial: 1    P14
#1   00:40:13.378000  Trial: 2    P03
#1   00:40:22.499000  Trial: 2    P03
#2   00:40:22.814000  Trial: 3    P13
#2   00:40:31.931000  Trial: 3    P13
#3   00:40:32.246000  Trial: 4    P02
#3   00:40:41.363000  Trial: 4    P02
#4   00:42:03.540000  Trial: 5    P11
#4   00:42:21.257000  Trial: 5    P11
#5   00:42:21.864000  Trial: 6    P10
#5   00:42:40.297000  Trial: 6    P10
#6   00:42:40.916000  Trial: 7    P05
#6   00:42:57.249000  Trial: 7    P05

print(df.dtypes)
#Bin_time    timedelta64[ns]
#TRIAL_No             object
#itemnr               object
#dtype: object

# resample each group into 100 ms bins and forward-fill the empty bins
# (the old how= / fill_method= arguments were removed from pandas)
df = df.groupby(df.index).apply(lambda x: x.set_index('Bin_time').resample('100ms').first().ffill())

df = df.reset_index()
df = df.drop(['row'], axis=1)

# convert timedelta back to integer milliseconds
df['Bin_time'] = (df['Bin_time'] / np.timedelta64(1, 'ms')).astype(int)

print(df.head())
#  Bin_time  TRIAL_No itemnr
#0  2403950  Trial: 1    P14
#1  2404050  Trial: 1    P14
#2  2404150  Trial: 1    P14
#3  2404250  Trial: 1    P14
#4  2404350  Trial: 1    P14
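As an alternative that avoids resample altogether, the same per-trial expansion can be sketched with np.arange and DataFrame.explode. This is not the method from the answer above, only a minimal sketch under the assumption that every bin grid is anchored at its own trial's START_TIME; the helper name `expand_to_bins` and the variables `trials` and `expanded` are hypothetical.

```python
import numpy as np
import pandas as pd

# First two rows of the table from the question.
trials = pd.DataFrame({
    "START_TIME": [2403950, 2413378],
    "END_TIME": [2413067, 2422499],
    "TRIAL_No": ["Trial: 1", "Trial: 2"],
    "itemnr": ["P14", "P03"],
})

def expand_to_bins(frame, bin_ms=100):
    """Repeat each row once per bin start between START_TIME and END_TIME."""
    out = frame.copy()
    # One array of bin start times per row, anchored at that row's START_TIME.
    # The stop value end + 1 keeps a bin that starts exactly at END_TIME.
    out["Bin_time"] = [
        np.arange(start, end + 1, bin_ms)
        for start, end in zip(out["START_TIME"], out["END_TIME"])
    ]
    # Turn every array element into its own row.
    out = out.explode("Bin_time", ignore_index=True)
    out["Bin_time"] = out["Bin_time"].astype("int64")
    return out[["Bin_time", "TRIAL_No", "itemnr"]]

expanded = expand_to_bins(trials)
print(expanded.head())
```

Because the times stay plain integers, no conversion to timedelta and back is needed. The trade-off is that bins of different trials are not aligned to one common grid.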

Edit:

If you want NaN outside the groups, change the code after the groupby as follows:

# resample each group into 100 ms bins and forward-fill the empty bins
df = df.groupby(df.index).apply(lambda x: x.set_index('Bin_time').resample('100ms').first().ffill())

# reset only the first index level - drops the helper index 'row'
df = df.reset_index(level=0, drop=True)
# resample the whole frame by 100 ms; bins outside the trials become NaN
df = df.resample('100ms').first()
df = df.reset_index()
# convert timedelta back to integer milliseconds
df['Bin_time'] = (df['Bin_time'] / np.timedelta64(1, 'ms')).astype(int)

print(df)
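The "NaN outside the trials" idea can also be sketched on one global integer grid with np.searchsorted. This is a different technique from the resample code above and uses a slightly different rule: a bin is labelled by whether its start time lies inside a trial, not by whether any trial timestamp falls into the bin. The sketch assumes the trials are sorted by START_TIME and do not overlap; the names `trials`, `grid`, `idx`, `inside` and `binned` are only illustrative.

```python
import numpy as np
import pandas as pd

# First two rows of the table from the question.
trials = pd.DataFrame({
    "START_TIME": [2403950, 2413378],
    "END_TIME": [2413067, 2422499],
    "TRIAL_No": ["Trial: 1", "Trial: 2"],
    "itemnr": ["P14", "P03"],
})

BIN_MS = 100
starts = trials["START_TIME"].to_numpy()
ends = trials["END_TIME"].to_numpy()

# One global grid anchored at the first START_TIME.
grid = np.arange(starts.min(), ends.max() + 1, BIN_MS)

# Position of the last trial that starts at or before each grid point (-1 if none).
idx = np.searchsorted(starts, grid, side="right") - 1
safe_idx = idx.clip(min=0)
# A grid point is inside a trial only if it is also at or before that trial's END_TIME.
inside = (idx >= 0) & (grid <= ends[safe_idx])

binned = pd.DataFrame({"Bin_time": grid})
for col in ["TRIAL_No", "itemnr"]:
    values = trials[col].to_numpy(dtype=object)[safe_idx]
    # Keep the label where the grid point is inside a trial, NaN elsewhere.
    binned[col] = pd.Series(values).where(inside)

print(binned.iloc[90:97])
```

With these two rows, the grid points 2413150, 2413250 and 2413350 fall in the gap between the trials and therefore get NaN in both columns.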


Edited on 2021-04-7
