I have a DataFrame like the one below, with the columns created_time, subject, and on_time.
created_time subject on_time
2020-02-26 21:01:40 A 2020-02-26 21:08:40
2020-02-26 21:01:40 A 2020-02-26 21:01:43
2020-02-26 21:01:40 A 2020-02-26 20:50:55
2020-02-26 21:01:40 A 2020-02-26 21:44:40
2020-02-26 01:01:50 B 2020-02-26 01:01:52
2020-02-26 01:01:50 B 2020-02-26 00:08:40
2020-02-26 01:01:50 B 2020-02-26 01:08:40
2020-02-26 01:01:50 B 2020-02-26 00:59:15
I need an output DataFrame that shows created_time, subject, and the on_time values immediately before and immediately after created_time:
created_time subject on_time_preceding on_time_following
2020-02-26 21:01:40 A 2020-02-26 20:50:55 2020-02-26 21:01:43
2020-02-26 01:01:50 B 2020-02-26 00:59:15 2020-02-26 01:01:52
Here on_time_preceding is the latest on_time before created_time, and on_time_following is the earliest on_time after created_time.
import io

import pandas as pd

table = """
created_time|subject|on_time
2020-02-26 21:01:40|A|2020-02-26 21:08:40
2020-02-26 21:01:40|A|2020-02-26 21:01:43
2020-02-26 21:01:40|A|2020-02-26 20:50:55
2020-02-26 21:01:40|A|2020-02-26 21:44:40
2020-02-26 01:01:50|B|2020-02-26 01:01:52
2020-02-26 01:01:50|B|2020-02-26 00:08:40
2020-02-26 01:01:50|B|2020-02-26 01:08:40
2020-02-26 01:01:50|B|2020-02-26 00:59:15
"""

# Parse both timestamp columns as datetimes.
df = pd.read_table(io.StringIO(table), sep='|', parse_dates=['created_time', 'on_time'])

# Earliest on_time after created_time, per subject.
following = df[df['created_time'] < df['on_time']].groupby('subject')['on_time'].min()

# Latest on_time before created_time: sort ascending, keep the last row per group.
preceding = (df[df['created_time'] > df['on_time']]
             .sort_values('on_time')
             .drop_duplicates(['created_time', 'subject'], keep='last'))

# The shared on_time column is renamed via the _preceding / _following suffixes.
print(preceding
      .merge(following, left_on='subject', right_index=True,
             suffixes=('_preceding', '_following'))
      .sort_values('subject')
      .reset_index(drop=True))
# Output:
#          created_time subject   on_time_preceding   on_time_following
# 0 2020-02-26 21:01:40       A 2020-02-26 20:50:55 2020-02-26 21:01:43
# 1 2020-02-26 01:01:50       B 2020-02-26 00:59:15 2020-02-26 01:01:52
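One caveat with the snippet above: the following values are grouped by subject only, and merge defaults to an inner join. So a subject with more than one created_time, or a group with no on_time on one side, may give wrong or missing rows. Below is a minimal sketch of a variant that groups on both keys and keeps such groups with NaT. It assumes the same column names as the question; the helper name nearest_on_times is hypothetical and not part of pandas.

```python
import io

import pandas as pd

table = """created_time|subject|on_time
2020-02-26 21:01:40|A|2020-02-26 21:08:40
2020-02-26 21:01:40|A|2020-02-26 21:01:43
2020-02-26 21:01:40|A|2020-02-26 20:50:55
2020-02-26 21:01:40|A|2020-02-26 21:44:40
2020-02-26 01:01:50|B|2020-02-26 01:01:52
2020-02-26 01:01:50|B|2020-02-26 00:08:40
2020-02-26 01:01:50|B|2020-02-26 01:08:40
2020-02-26 01:01:50|B|2020-02-26 00:59:15
"""


def nearest_on_times(df):
    """Hypothetical helper: nearest on_time before and after created_time."""
    keys = ["created_time", "subject"]

    # Latest on_time strictly before created_time, per (created_time, subject).
    preceding = (
        df[df["on_time"] < df["created_time"]]
        .groupby(keys)["on_time"]
        .max()
        .rename("on_time_preceding")
    )

    # Earliest on_time strictly after created_time, per (created_time, subject).
    following = (
        df[df["on_time"] > df["created_time"]]
        .groupby(keys)["on_time"]
        .min()
        .rename("on_time_following")
    )

    # concat aligns on the union of both indexes, so a group that lacks
    # one side keeps NaT there instead of being dropped.
    return (
        pd.concat([preceding, following], axis=1)
        .reset_index()
        .sort_values("subject")
        .reset_index(drop=True)
    )


df = pd.read_csv(io.StringIO(table), sep="|",
                 parse_dates=["created_time", "on_time"])
result = nearest_on_times(df)
print(result)
```

For the sample data this should give the same two rows as the output above. If the input is large and already sorted, pd.merge_asof with direction="backward" and direction="forward" may be worth considering instead.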