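I have duplicate customer records with repeated statuses, because there is one row per customer subscription/product. I want to generate
a new_status of "canceled" for a customer, which requires every one of that customer's subscription statuses to be "canceled".
I used: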
df['duplicated'] = df.groupby('Customer').cumcount()
to number each duplicate row within its customer group, so that repeated values are flagged:
Customer | Status | new_status | duplicated
X |canceled| | 0
X |canceled| | 1
X |active | | 2
Y |canceled| | 0
A |canceled| | 0
A |canceled| | 1
B |active | | 0
B |canceled| | 1
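For anyone who wants to reproduce this, the table above can be rebuilt roughly as follows. This is only a sketch: it assumes `new_status` starts out as an empty string (it could just as well be NaN), and it uses the capitalized column names shown in the table.

```python
import pandas as pd

# Sample data copied from the table above.
df = pd.DataFrame({
    "Customer": ["X", "X", "X", "Y", "A", "A", "B", "B"],
    "Status": ["canceled", "canceled", "active", "canceled",
               "canceled", "canceled", "active", "canceled"],
})

# Assumption: new_status is initially an empty string.
df["new_status"] = ""

# Number the rows within each customer group: 0 for the first row, 1 for the next, and so on.
df["duplicated"] = df.groupby("Customer").cumcount()

print(df)
```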
So I would like to use .apply and/or .loc to produce:
Customer | Status | new_status | duplicated
X |canceled| | 0
X |canceled| | 1
X |active | | 2
Y |canceled| | 0
A |canceled| canceled | 0
A |canceled| canceled | 1
B |active | | 0
B |canceled| | 1
As I understand it, you can try:
# Mark every row of a customer whose statuses are all 'canceled';
# rows that are not marked keep their existing new_status.
df['new_status'] = (df.groupby('Customer')['Status']
                    .transform(lambda x: x.eq('canceled').all())
                    .map({True: 'canceled'})
                    .fillna(df['new_status']))
print(df)
  Customer    Status new_status  duplicated
0        X  canceled                      0
1        X  canceled                      1
2        X    active                      2
3        Y  canceled   canceled           0
4        A  canceled   canceled           0
5        A  canceled   canceled           1
6        B    active                      0
7        B  canceled                      1
Edited, since the expected output changed:
# Mark a row only if its status is repeated within the customer group
# and every status in that group is 'canceled'.
df['new_status'] = (df.groupby('Customer')['Status']
                    .transform(lambda x: x.duplicated(keep=False) & x.eq('canceled').all())
                    .map({True: 'canceled', False: ''}))
print(df)
  Customer    Status new_status  duplicated
0        X  canceled                      0
1        X  canceled                      1
2        X    active                      2
3        Y  canceled                      0
4        A  canceled   canceled           0
5        A  canceled   canceled           1
6        B    active                      0
7        B  canceled                      1
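Since the question asked about .loc, the same rule can also be sketched without a lambda. This is an alternative formulation, not the code from the answer above: it assumes `new_status` starts as an empty string, writes `canceled` (the spelling used in the Status column), and treats "has duplicates" as "the customer has more than one row". That matches the `duplicated(keep=False)` check whenever all of a customer's statuses are canceled.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Customer": ["X", "X", "X", "Y", "A", "A", "B", "B"],
    "Status": ["canceled", "canceled", "active", "canceled",
               "canceled", "canceled", "active", "canceled"],
})
df["new_status"] = ""  # assumption: starts out empty

is_canceled = df["Status"].eq("canceled")

# True on every row of a customer whose subscriptions are all canceled.
all_canceled = is_canceled.groupby(df["Customer"]).transform("all")

# True on every row of a customer that has more than one subscription row.
has_duplicates = df.groupby("Customer")["Status"].transform("size").gt(1)

# Fill new_status only where both conditions hold.
df.loc[all_canceled & has_duplicates, "new_status"] = "canceled"

print(df)
```

Dropping the `has_duplicates` mask gives the behavior of the first version, where a single-row customer such as Y is also marked.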