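My dataframe has 2 columns: i) Customer ID and ii) Status. For each customer, if any row belonging to that customer has the status "Confirmed", a third column should return "Active"; otherwise it should return "Inactive". Any help is appreciated, thanks!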
**Current Dataframe**
Customer ID Status
A Confirmed
A Transferred
A Confirmed
A Withdrawn
B Transferred
B Withdrawn
B Transferred
C Confirmed
D Withdrawn
---
**Expected Output**
Customer ID Status Customer Status
A Confirmed Active
A Transferred Active
A Confirmed Active
A Withdrawn Active
B Transferred Inactive
B Withdrawn Inactive
B Transferred Inactive
C Confirmed Active
D Withdrawn Inactive
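You can test whether each group contains at least one matching value with `Series.any`. Because the mask needs to be the same size as the original column, apply it through `GroupBy.transform`, and finally pass the result to `numpy.where`: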
import numpy as np

# True on every row whose customer has at least one 'Confirmed' row
m = df['Status'].eq('Confirmed').groupby(df['Customer ID']).transform('any')
df['Customer Status'] = np.where(m, 'Active', 'Inactive')
print(df)
Customer ID Status Customer Status
0 A Confirmed Active
1 A Transferred Active
2 A Confirmed Active
3 A Withdrawn Active
4 B Transferred Inactive
5 B Withdrawn Inactive
6 B Transferred Inactive
7 C Confirmed Active
8 D Withdrawn Inactive
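For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame construction is my reconstruction of the sample data in the question, and the intermediate name `is_confirmed` is only there for illustration (it is not part of the original answer). It splits the one-liner into the row-level test and the group-level broadcast so each step can be inspected.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (an assumption about how df was built)
df = pd.DataFrame({
    'Customer ID': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'D'],
    'Status': ['Confirmed', 'Transferred', 'Confirmed', 'Withdrawn',
               'Transferred', 'Withdrawn', 'Transferred',
               'Confirmed', 'Withdrawn'],
})

# Row-level test: is this particular row 'Confirmed'?
is_confirmed = df['Status'].eq('Confirmed')

# Group-level test, broadcast back so every row of a customer gets the same flag
m = is_confirmed.groupby(df['Customer ID']).transform('any')

# Map the boolean mask to the two labels
df['Customer Status'] = np.where(m, 'Active', 'Inactive')
print(df)
```

Using `transform` rather than a plain aggregation is what keeps the mask aligned with the original index, so it can be assigned or passed to `np.where` without a merge.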
Alternatively, collect every `Customer ID` that has at least one matching row and compare the original column against those values with `Series.isin`:
# IDs that appear on at least one 'Confirmed' row, tested against the full column
m = df['Customer ID'].isin(df.loc[df['Status'].eq('Confirmed'), 'Customer ID'])
df['Customer Status'] = np.where(m, 'Active', 'Inactive')
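The `isin` variant can be sketched the same way. Again the sample DataFrame is reconstructed from the question, and `confirmed_ids` is a hypothetical helper name introduced here to make the intermediate step visible.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (an assumption about how df was built)
df = pd.DataFrame({
    'Customer ID': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'D'],
    'Status': ['Confirmed', 'Transferred', 'Confirmed', 'Withdrawn',
               'Transferred', 'Withdrawn', 'Transferred',
               'Confirmed', 'Withdrawn'],
})

# Customer IDs taken from the rows whose status is 'Confirmed' (may contain repeats)
confirmed_ids = df.loc[df['Status'].eq('Confirmed'), 'Customer ID']

# True for every row whose Customer ID appears among the confirmed IDs
m = df['Customer ID'].isin(confirmed_ids)

df['Customer Status'] = np.where(m, 'Active', 'Inactive')
print(df)
```

This version avoids the groupby entirely, and on the sample data it should give the same labels as the `transform('any')` approach.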