I have a DataFrame:
import pandas as pd

df = pd.DataFrame({
    'First': ['Sam', 'Greg', 'Steve', 'Sam',
              'Jill', 'Bill', 'Nod', 'Mallory', 'Ping', 'Lamar'],
    'Last': ['Stevens', 'Hamcunning', 'Strange', 'Stevens',
             'Vargas', 'Simon', 'Purple', 'Green', 'Simon', 'Simon'],
    'Address': ['112 Fake St',
                '13 Crest St',
                '14 Main St',
                '112 Fake St',
                '2 Morningwood',
                '7 Cotton Dr',
                '14 Main St',
                '20 Main St',
                '7 Cotton Dr',
                '7 Cotton Dr'],
    'Status': ['Infected', '', 'Infected', '', '', '', '', '', '', 'Infected'],
})
I then apply the following groupby code:
df_index = df.groupby(['Address', 'Last']).filter(lambda x: (x['Status'] == 'Infected').any()).index
df.loc[df_index, 'Status'] = 'Infected'
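This marks every row in a matching group as 'Infected'. Instead of that, is there a way to select only the values that are about to be updated, so that they can be marked as something else? For example: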
df2 = df.copy(deep=True)
df2['Status'] = ['Infected', '', 'Infected', 'Infected2', '', 'Infected2', '', '', 'Infected2', 'Infected']
I think this gives the result you want, although it goes about it in a slightly different way:
def infect_new_people(group):
    if (group['Status'] == 'Infected').any():
        # Only affect people not already infected
        group.loc[group['Status'] != 'Infected', 'Status'] = 'Infected2'
    return group['Status']

# Need group_keys=False so that each group has the same index
# as the original dataframe
df['Status'] = df.groupby(['Address', 'Last'], group_keys=False).apply(infect_new_people)
df
Out[36]:
         Address    First        Last     Status
0    112 Fake St      Sam     Stevens   Infected
1    13 Crest St     Greg  Hamcunning
2     14 Main St    Steve     Strange   Infected
3    112 Fake St      Sam     Stevens  Infected2
4  2 Morningwood     Jill      Vargas
5    7 Cotton Dr     Bill       Simon  Infected2
6     14 Main St      Nod      Purple
7     20 Main St  Mallory       Green
8    7 Cotton Dr     Ping       Simon  Infected2
9    7 Cotton Dr    Lamar       Simon   Infected
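
If you would rather avoid apply, the same result can probably be reached with boolean masks. This is only a sketch, not the approach used above: the names is_infected and group_has_infected are made up for illustration, and it assumes the sample data from the question. It flags the rows that are already 'Infected', broadcasts "does my (Address, Last) group contain an infected row?" back to every row with transform('any'), and then updates only the rows that are in such a group but not yet infected.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'First': ['Sam', 'Greg', 'Steve', 'Sam',
              'Jill', 'Bill', 'Nod', 'Mallory', 'Ping', 'Lamar'],
    'Last': ['Stevens', 'Hamcunning', 'Strange', 'Stevens',
             'Vargas', 'Simon', 'Purple', 'Green', 'Simon', 'Simon'],
    'Address': ['112 Fake St', '13 Crest St', '14 Main St', '112 Fake St',
                '2 Morningwood', '7 Cotton Dr', '14 Main St', '20 Main St',
                '7 Cotton Dr', '7 Cotton Dr'],
    'Status': ['Infected', '', 'Infected', '', '', '', '', '', '', 'Infected'],
})

# True for each row that is already marked 'Infected' (illustrative name)
is_infected = df['Status'].eq('Infected')

# True for every row whose (Address, Last) group contains at least
# one infected row; transform keeps the original index (illustrative name)
group_has_infected = is_infected.groupby([df['Address'], df['Last']]).transform('any')

# Mark only the members of those groups that are not infected yet
df.loc[group_has_infected & ~is_infected, 'Status'] = 'Infected2'

print(df['Status'].tolist())
```

Because the update goes through a mask, the replacement label ('Infected2' here) is chosen in one place, and the rows that were already 'Infected' are left alone.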