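I start with the DataFrame below, then run a groupby and aggregation to merge overlapping time ranges. I want to add another column to the final DataFrame, built by concatenating the TEXT values of the rows that overlap.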
df['newid']=(df['START']-df['END'].shift()).dt.total_seconds().gt(0).cumsum()
print (df.to_string(index=False))
ELEMENT TEXT START END newid
OLT2227-LT3-PON0-ONT03 USECASE1 - ALARM1 -NO OVERLAP 2021-01-19 18:00:00 2021-01-19 19:00:00 0
OLT2227-LT3-PON0-ONT03 USECASE1 - ALARM2 - NO OVERLAP 2021-01-19 19:10:00 2021-01-19 20:00:12 1
OLT2227-LT3-PON0-ONT05 USECASE2 - ALARM1 - Fully Contained 2021-01-19 18:00:00 2021-01-19 23:00:00 1
OLT2227-LT3-PON0-ONT05 USECASE2 - ALARM2 - Fully Contained 2021-01-19 19:00:00 2021-01-19 20:00:12 1
OLT2227-LT3-PON0-ONT10 USECASE3 - ALARM1 - START-END-RELATION 2021-01-19 22:00:00 2021-01-19 22:30:00 2
OLT2227-LT3-PON0-ONT10 USECASE3 - ALARM2 - START-END-RELATION 2021-01-19 22:30:00 2021-01-19 23:00:12 2
OLT2227-LT3-PON0-ONT21 USECASE3-ALARM1 2021-01-19 22:00:00 2021-01-19 22:10:00 2
OLT2227-LT3-PON0-ONT21 USECASE3-ALARM2-NO-END 2021-01-19 22:15:00 2042-01-19 20:00:12 3
OLT2227-LT3-PON0-ONT4 USECASE-4 2021-01-19 17:30:00 2042-01-19 20:00:12 3
OLT2227-LT3-PON0-ONT4 USECASE-4 2021-01-19 20:00:00 2021-01-19 23:00:00 3
OLT2227-LT3-PON0-ONT99 USECASE-5 2021-01-19 17:30:00 2021-01-19 22:00:00 3
OLT2227-LT3-PON0-ONT99 USECASE-5 2021-01-19 20:00:00 2042-01-19 20:00:12 3
newdf=df.groupby(['newid','ELEMENT']).agg({'START':'min','END':'max'}).reset_index(level=1)
print (newdf.to_string(index=False))
ELEMENT START END
OLT2227-LT3-PON0-ONT03 2021-01-19 18:00:00 2021-01-19 19:00:00
OLT2227-LT3-PON0-ONT03 2021-01-19 19:10:00 2021-01-19 20:00:12
OLT2227-LT3-PON0-ONT05 2021-01-19 18:00:00 2021-01-19 23:00:00
OLT2227-LT3-PON0-ONT10 2021-01-19 22:00:00 2021-01-19 23:00:12
OLT2227-LT3-PON0-ONT21 2021-01-19 22:00:00 2021-01-19 22:10:00
OLT2227-LT3-PON0-ONT21 2021-01-19 22:15:00 2042-01-19 20:00:12
OLT2227-LT3-PON0-ONT4 2021-01-19 17:30:00 2042-01-19 20:00:12
OLT2227-LT3-PON0-ONT99 2021-01-19 17:30:00 2042-01-19 20:00:12
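As you can see, the final DataFrame only contains the ELEMENT, START and END columns. What I want instead is a DataFrame in which the TEXT column is also concatenated during the aggregation: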
ELEMENT START END TEXT
OLT2227-LT3-PON0-ONT03 2021-01-19 18:00:00 2021-01-19 19:00:00 USECASE1 - ALARM1 -NO OVERLAP
OLT2227-LT3-PON0-ONT03 2021-01-19 19:10:00 2021-01-19 20:00:12 USECASE1 - ALARM2 - NO OVERLAP
OLT2227-LT3-PON0-ONT05 2021-01-19 18:00:00 2021-01-19 23:00:00 USECASE2 - ALARM1 - Fully Contained; USECASE2 - ALARM2 - Fully Contained
OLT2227-LT3-PON0-ONT10 2021-01-19 22:00:00 2021-01-19 23:00:12 USECASE3 - ALARM1 - START-END-RELATION; USECASE3 - ALARM2 - START-END-RELATION
OLT2227-LT3-PON0-ONT21 2021-01-19 22:00:00 2021-01-19 22:10:00 USECASE3-ALARM1
OLT2227-LT3-PON0-ONT21 2021-01-19 22:15:00 2042-01-19 20:00:12 USECASE3-ALARM2-NO-END
OLT2227-LT3-PON0-ONT4 2021-01-19 17:30:00 2042-01-19 20:00:12 USECASE-4 ; USECASE-4
OLT2227-LT3-PON0-ONT99 2021-01-19 17:30:00 2042-01-19 20:00:12 USECASE-5 ; USECASE-5
Can anyone help?
You can aggregate the TEXT column with str.join by passing the bound method ' ; '.join as its aggregation function:
(df.groupby(['newid','ELEMENT'])
.agg({'START': 'min', 'END':'max', 'TEXT': ' ; '.join})
.reset_index(1))
Output (TEXT column only):
USECASE1 - ALARM1 -NO OVERLAP
USECASE1 - ALARM2 - NO OVERLAP
USECASE2 - ALARM1 - Fully Contained ; USECASE2 - ALARM2 - Fully Contained
USECASE3 - ALARM1 - START-END-RELATION ; USECASE3 - ALARM2 - START-END-RELATION
USECASE3-ALARM1
USECASE3-ALARM2-NO-END
USECASE-4 ; USECASE-4
USECASE-5 ; USECASE-5
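For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same approach. It assumes only the first four rows of the question's data (rebuilt by hand here) and that every TEXT value is a string; a missing value in TEXT would make str.join raise a TypeError, so such rows would need to be filled or dropped first.

```python
import pandas as pd

# Hand-built subset of the question's data (first four rows only).
df = pd.DataFrame({
    "ELEMENT": [
        "OLT2227-LT3-PON0-ONT03",
        "OLT2227-LT3-PON0-ONT03",
        "OLT2227-LT3-PON0-ONT05",
        "OLT2227-LT3-PON0-ONT05",
    ],
    "TEXT": [
        "USECASE1 - ALARM1 -NO OVERLAP",
        "USECASE1 - ALARM2 - NO OVERLAP",
        "USECASE2 - ALARM1 - Fully Contained",
        "USECASE2 - ALARM2 - Fully Contained",
    ],
    "START": pd.to_datetime([
        "2021-01-19 18:00:00",
        "2021-01-19 19:10:00",
        "2021-01-19 18:00:00",
        "2021-01-19 19:00:00",
    ]),
    "END": pd.to_datetime([
        "2021-01-19 19:00:00",
        "2021-01-19 20:00:12",
        "2021-01-19 23:00:00",
        "2021-01-19 20:00:12",
    ]),
})

# Same grouping key as in the question: start a new id whenever a row
# begins strictly after the previous row's END.
df["newid"] = (df["START"] - df["END"].shift()).dt.total_seconds().gt(0).cumsum()

# Aggregate each group: earliest START, latest END, and TEXT joined with ' ; '.
result = (
    df.groupby(["newid", "ELEMENT"])
      .agg({"START": "min", "END": "max", "TEXT": " ; ".join})
      .reset_index(level=1)
)

print(result.to_string(index=False))
```

The two ONT03 rows stay separate because they do not overlap, while the two ONT05 rows collapse into one row whose TEXT holds both alarm descriptions.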