I have two DataFrames. The first contains person names, each with a list of cells. The second contains the actual values to map onto those cells.
df1:
Name Group
Bob ['a', 'v']
April ['b', 'c']
Amy ['v']
Linda ['g', 'r']
df2:
Name cell value
Bob a 4
Bob g 6
Bob v 8
April a 6
April g 8
April b 9
April c 1
Amy v 2
Amy b 2
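For anyone who wants to reproduce this, the two frames can be built as below. This is a sketch that assumes the list column in df1 is named `Group` (matching the expected result and the answer's code) and that the name is spelled `April` in both frames.

```python
import pandas as pd

# df1: one row per person, with the list of cells to look up.
# Assumption: the list column is named "Group", as in the expected result.
df1 = pd.DataFrame({
    "Name": ["Bob", "April", "Amy", "Linda"],
    "Group": [["a", "v"], ["b", "c"], ["v"], ["g", "r"]],
})

# df2: one row per (Name, cell) pair, holding the value to map in.
df2 = pd.DataFrame({
    "Name": ["Bob", "Bob", "Bob", "April", "April", "April", "April", "Amy", "Amy"],
    "cell": ["a", "g", "v", "a", "g", "b", "c", "v", "b"],
    "value": [4, 6, 8, 6, 8, 9, 1, 2, 2],
})

print(df1)
print(df2)
```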
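This is what I want: for each cell in a person's list, look up its value in df2 (matching on both Name and cell) and add the resulting list of values back to df1.
Expected result (df1):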
Name Group Group_Name
Bob ['a', 'v'] [4, 8]
April ['b', 'c'] [9, 1]
Amy ['v'] [2]
Linda ['g', 'r'] [None, None]
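Can anyone help me solve this, or suggest a better approach?
IIUC, you only need to keep the rows whose cell matches an element of the person's list (or whose cell is NaN, for people missing from df2) and then collect those values into lists: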
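As an aside, a plain dictionary lookup also produces exactly the expected result. The sketch below is not part of the original answer; it assumes the list column is named `Group` and that each (Name, cell) pair occurs at most once in df2 (with duplicates, the last value would win). It writes to `Group_Name` as in the expected output, keeps the order of each person's list, and uses `None` for pairs that are missing.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Bob", "April", "Amy", "Linda"],
    "Group": [["a", "v"], ["b", "c"], ["v"], ["g", "r"]],
})
df2 = pd.DataFrame({
    "Name": ["Bob", "Bob", "Bob", "April", "April", "April", "April", "Amy", "Amy"],
    "cell": ["a", "g", "v", "a", "g", "b", "c", "v", "b"],
    "value": [4, 6, 8, 6, 8, 9, 1, 2, 2],
})

# Map each (Name, cell) pair to its value; later duplicates would overwrite earlier ones.
lookup = dict(zip(zip(df2["Name"], df2["cell"]), df2["value"]))

# Look up every cell in list order; dict.get returns None for missing pairs.
values = [
    [lookup.get((name, cell)) for cell in cells]
    for name, cells in zip(df1["Name"], df1["Group"])
]

# Wrap in an object Series so each row holds one Python list.
df1["Group_Name"] = pd.Series(values, index=df1.index, dtype=object)
print(df1)
```

Because the lookup is done per element, a cell that is missing for only one position in a person's list still yields a `None` in that position instead of being filtered out.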
# create DataFrame to check which values of cell are in Group
res = df2.merge(df1, on='Name', how='right').explode('Group')
# create mask
mask = res['Group'].eq(res['cell']) | res['cell'].isna()
# filter, group by, agg and rename
output = res[mask].groupby('Name', sort=False).agg({'Group': list, 'value': list}).rename(
columns={'value': 'Group Name'})
print(output)
Output
Group Group Name
Name
Bob [a, v] [4.0, 8.0]
April [b, c] [9.0, 1.0]
Amy [v] [2.0]
Linda [g, r] [nan, nan]
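A self-contained version of the code above, with the sample data included so the output can be checked, could look like this (the frames are built by hand, assuming the list column is called `Group`):

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Bob", "April", "Amy", "Linda"],
    "Group": [["a", "v"], ["b", "c"], ["v"], ["g", "r"]],
})
df2 = pd.DataFrame({
    "Name": ["Bob", "Bob", "Bob", "April", "April", "April", "April", "Amy", "Amy"],
    "cell": ["a", "g", "v", "a", "g", "b", "c", "v", "b"],
    "value": [4, 6, 8, 6, 8, 9, 1, 2, 2],
})

# Right merge keeps every person from df1 (Linda gets NaN cell and value),
# then explode puts one list element per row.
res = df2.merge(df1, on="Name", how="right").explode("Group")

# Keep rows where the list element equals df2's cell,
# or where the person has no rows in df2 at all.
mask = res["Group"].eq(res["cell"]) | res["cell"].isna()

# Collect the surviving cells and values back into one list per person.
output = (
    res[mask]
    .groupby("Name", sort=False)
    .agg({"Group": list, "value": list})
    .rename(columns={"value": "Group Name"})
)
print(output)
```

The values come out as floats (`4.0` rather than `4`) because the right merge introduces NaN for Linda, which forces the `value` column to float. Note also that the mask only keeps matched rows: if a person appears in df2 but one of their listed cells does not (say Amy's list were `['v', 'z']`), that cell is dropped rather than producing NaN.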
Update
To preserve the order of the original lists, you can add an extra step:
import numpy as np

# create DataFrame to check which values of cell are in Group
res = df1.merge(df2, on='Name', how='left').explode('Group', ignore_index=True)
# reorder DataFrame to keep original list order
res['ord'] = np.arange(len(res))
res['ord'] = res.groupby(['Name', 'Group'])['ord'].transform('first')
res = res.sort_values(by='ord').drop(columns='ord')
# create mask
mask = res['Group'].eq(res['cell']) | res['cell'].isna()
# filter, group by, agg and rename
output = res[mask].groupby('Name', sort=False).agg({'Group': list, 'value': list}).rename(
columns={'value': 'Group Name'})
print(output)
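The sketch below runs the updated version end to end on a reduced, made-up data set in which Bob's rows in df2 are deliberately ordered `v`, `g`, `a`, so the effect of the `ord` step is visible. It also uses `drop(columns=...)` and a stable sort (`kind='mergesort'`); both are small adjustments assumed here rather than part of the original answer. `explode(..., ignore_index=True)` requires pandas 1.1 or later.

```python
import numpy as np
import pandas as pd

# Hypothetical data: Bob's df2 rows are listed v, g, a, the opposite of his list order.
df1 = pd.DataFrame({
    "Name": ["Bob", "Amy", "Linda"],
    "Group": [["a", "v"], ["v"], ["g", "r"]],
})
df2 = pd.DataFrame({
    "Name": ["Bob", "Bob", "Bob", "Amy", "Amy"],
    "cell": ["v", "g", "a", "v", "b"],
    "value": [8, 6, 4, 2, 2],
})

res = df1.merge(df2, on="Name", how="left").explode("Group", ignore_index=True)

# Tag each row with the first position of its (Name, Group) pair;
# after the explode, that position follows the order of the df1 lists.
res["ord"] = np.arange(len(res))
res["ord"] = res.groupby(["Name", "Group"])["ord"].transform("first")
# Stable sort so rows with equal ord keep their merged order.
res = res.sort_values(by="ord", kind="mergesort").drop(columns="ord")

mask = res["Group"].eq(res["cell"]) | res["cell"].isna()
output = (
    res[mask]
    .groupby("Name", sort=False)
    .agg({"Group": list, "value": list})
    .rename(columns={"value": "Group Name"})
)
print(output)
```

Without the `ord` step, Bob's row here comes out as `[v, a]` with values `[8.0, 4.0]`, following the row order of df2; with it, the result follows his list: `[a, v]` and `[4.0, 8.0]`.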