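I have two columns of ingredients, and I want to check whether the new column is missing words or differs from the old column.
Column 1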
Index Old
0 Caramel Color, Color, Citric Acid, Treated Water, Caffeine, Flavour Enhancer
1 Natural Extracts, Glycol, Ethanol
Column 2
Index New
0 Caramel Color, Color, Citric Acid, Water, Flavour Reducer
1 Glycol, Ethanol
I have tried the following, but it does not seem to work:
L = df['Old']
values_not_in_array = df[~df.Old.isin(L)].Old
values_in_array = df[df.Old.isin(L)].Old
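The isin-based attempt fails because Series.isin tests whether each whole cell string appears in the given values; it never looks at the individual comma-separated ingredients. Comparing a column with itself therefore marks every row as found, so the ~ filter returns nothing. A minimal sketch that shows this, rebuilding the sample data by hand (the DataFrame construction is an assumption about how the data is loaded):

```python
import pandas as pd

# Sample data from the question, typed in by hand (assumed loading step)
df = pd.DataFrame({
    "Old": ["Caramel Color, Color, Citric Acid, Treated Water, Caffeine, Flavour Enhancer",
            "Natural Extracts, Glycol, Ethanol"],
    "New": ["Caramel Color, Color, Citric Acid, Water, Flavour Reducer",
            "Glycol, Ethanol"],
})

# isin checks each whole cell string against the collection of values.
# Every Old string is trivially a member of the Old column itself.
same = df["Old"].isin(df["Old"])
print(same.tolist())  # [True, True], so df[~same] is empty

# No Old string equals any whole New string, even though they share ingredients.
across = df["Old"].isin(df["New"])
print(across.tolist())  # [False, False]
```

Because isin works on whole values, the strings have to be split into individual ingredients before comparing.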
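What is the best way to create a column holding the values from the old column that are missing from, or different in, the corresponding row of the new column?
Convert the split values to sets and subtract them, then join the result back into a string if needed: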
df['diff'] = [', '.join(set(o.split(', ')) - set(n.split(', ')))
for o, n in zip(df.Old, df.New)]
print (df)
Old \
0 Caramel Color, Color, Citric Acid, Treated Wat...
1 Natural Extracts, Glycol, Ethanol
New \
0 Caramel Color, Color, Citric Acid, Water, Flav...
1 Glycol, Ethanol
diff
0 Treated Water, Flavour Enhancer, Caffeine
1 Natural Extracts
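Python sets are unordered, and string hashing is randomized per process by default, so the ingredient order inside diff can change between runs; the output above is just one possible order. If the original order matters, a sketch that keeps Old's order and uses a set only for the membership test (the helper name ordered_diff is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({
    "Old": ["Caramel Color, Color, Citric Acid, Treated Water, Caffeine, Flavour Enhancer",
            "Natural Extracts, Glycol, Ethanol"],
    "New": ["Caramel Color, Color, Citric Acid, Water, Flavour Reducer",
            "Glycol, Ethanol"],
})

def ordered_diff(a: str, b: str) -> str:
    """Return the items of comma-separated string a that are not in b, in a's order."""
    b_items = set(b.split(", "))  # set gives fast membership checks
    return ", ".join(x for x in a.split(", ") if x not in b_items)

df["diff"] = [ordered_diff(o, n) for o, n in zip(df.Old, df.New)]
print(df["diff"].tolist())
# ['Treated Water, Caffeine, Flavour Enhancer', 'Natural Extracts']
```

Unlike the set subtraction, this keeps duplicate items from Old if any exist.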
df['miss'] = [', '.join(set(n.split(', ')) - set(o.split(', ')))
for o, n in zip(df.Old, df.New)]
print (df)
Old \
0 Caramel Color, Color, Citric Acid, Treated Wat...
1 Natural Extracts, Glycol, Ethanol
New miss
0 Caramel Color, Color, Citric Acid, Water, Flav... Water, Flavour Reducer
1 Glycol, Ethanol
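Both list comprehensions split on the exact separator ', ', so input with irregular spacing or a trailing comma yields tokens such as 'Ethanol,' that never match. A more tolerant sketch that computes both columns, assuming the ingredient names themselves never contain commas; the helpers tokens and only_in and the messier sample strings are hypothetical:

```python
import pandas as pd

def tokens(s):
    """Split on commas, trim whitespace, and drop empty pieces (e.g. from a trailing comma)."""
    return [t.strip() for t in s.split(",") if t.strip()]

def only_in(a, b):
    """Ingredients of a that do not occur in b, kept in a's order."""
    b_set = set(tokens(b))
    return ", ".join(t for t in tokens(a) if t not in b_set)

# Hypothetical messier input: uneven spacing and a trailing comma
df = pd.DataFrame({
    "Old": ["Caramel Color,Color, Citric Acid, Treated Water,Caffeine, Flavour Enhancer",
            "Natural Extracts, Glycol, Ethanol,"],
    "New": ["Caramel Color, Color, Citric Acid, Water, Flavour Reducer",
            "Glycol,Ethanol"],
})

df["diff"] = [only_in(o, n) for o, n in zip(df.Old, df.New)]  # only in Old
df["miss"] = [only_in(n, o) for o, n in zip(df.Old, df.New)]  # only in New
print(df[["diff", "miss"]])
```

Here diff lists ingredients dropped or renamed in New, and miss lists the ones that appear only in New.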