Find missing words between two pandas columns

tomoc4

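I have two columns of ingredients, and I want to check whether the new column is missing words from the old column or differs from it.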

Column 1

Index     Old
0         Caramel Color, Color, Citric Acid, Treated Water, Caffeine, Flavour Enhancer
1         Natural Extracts, Glycol, Ethanol,

Column 2

Index     New
0         Caramel Color, Color, Citric Acid, Water, Flavour Reducer
1         Glycol, Ethanol

I tried this solution, but it doesn't seem to work:

L = df['Old']
values_not_in_array = df[~df.Old.isin(L)].Old
values_in_array = df[df.Old.isin(L)].Old
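Part of why an `isin`-based attempt cannot answer the question shows up in isolation: `Series.isin` checks whether each whole cell string occurs anywhere among the other Series' values, so it neither splits a cell into its comma-separated items nor compares row by row. A minimal sketch with two shortened, made-up rows (not the question's full data):

```python
import pandas as pd

# Hypothetical, shortened rows for illustration only.
df = pd.DataFrame({
    "Old": ["Caramel Color, Color, Water", "Glycol, Ethanol"],
    "New": ["Caramel Color, Water", "Glycol, Ethanol"],
})

# isin matches each whole Old cell against all New cells; one changed item
# makes the whole row False without saying which item changed.
whole_cell_match = df["Old"].isin(df["New"])
print(whole_cell_match.tolist())  # [False, True]
```

Row 0 only reports `False`; recovering that `Color` was dropped requires splitting each cell into its items first.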

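What is the best way to create a column that holds, for each row, the values of the old column that are missing from or different in the new column?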

jezrael

Convert the split values to sets and subtract them, then join the result back into a string if necessary:

# Items in Old that are not in New; the joined order is arbitrary because sets are unordered
df['diff'] = [', '.join(set(o.split(', ')) - set(n.split(', ')))
              for o, n in zip(df.Old, df.New)]
print(df)
                                                 Old  \
0  Caramel Color, Color, Citric Acid, Treated Wat...   
1                  Natural Extracts, Glycol, Ethanol   

                                                 New  \
0  Caramel Color, Color, Citric Acid, Water, Flav...   
1                                    Glycol, Ethanol   

                                        diff
0  Treated Water, Flavour Enhancer, Caffeine
1                           Natural Extracts
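One caveat: `split(', ')` relies on every item being separated by exactly a comma and one space. With the question's second row as posted, `Natural Extracts, Glycol, Ethanol,` (note the trailing comma), it yields the item `Ethanol,`, which would then also appear in `diff`. Below is a minimal sketch of a more forgiving split, assuming items are comma-separated with arbitrary surrounding whitespace; the helper names `items` and `removed_items` are hypothetical. It also keeps the items in their original order instead of the arbitrary set order:

```python
def items(cell):
    """Split a comma-separated cell into stripped, non-empty items."""
    return [part.strip() for part in cell.split(",") if part.strip()]


def removed_items(old, new):
    """Items of old that are absent from new, deduplicated, in original order."""
    new_set = set(items(new))
    # dict.fromkeys drops duplicates while preserving first-seen order
    return ", ".join(x for x in dict.fromkeys(items(old)) if x not in new_set)


print(removed_items("Natural Extracts, Glycol, Ethanol,", "Glycol, Ethanol"))
# Natural Extracts
```

It plugs into the same list comprehension as the answer: `df['diff'] = [removed_items(o, n) for o, n in zip(df.Old, df.New)]`.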

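For the opposite direction (values in New that are not in Old), swap the operands:

# Items in New that are not in Old
df['miss'] = [', '.join(set(n.split(', ')) - set(o.split(', ')))
              for o, n in zip(df.Old, df.New)]
print(df)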
                                                 Old  \
0  Caramel Color, Color, Citric Acid, Treated Wat...   
1                  Natural Extracts, Glycol, Ethanol   

                                                 New                    miss  
0  Caramel Color, Color, Citric Acid, Water, Flav...  Water, Flavour Reducer  
1                                    Glycol, Ethanol                          
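If a single column is enough, the set operator `^` (symmetric difference) collects the items that appear in only one of the two columns, and comparing the two sets gives a True/False flag for whether a row changed at all. A sketch using the question's data, assuming the same comma-and-whitespace splitting via a hypothetical `as_set` helper; the column names `changed` and `sym_diff` are illustrative:

```python
import pandas as pd

# Sample data from the question (row 1 keeps its trailing comma as posted).
df = pd.DataFrame({
    "Old": [
        "Caramel Color, Color, Citric Acid, Treated Water, Caffeine, Flavour Enhancer",
        "Natural Extracts, Glycol, Ethanol,",
    ],
    "New": [
        "Caramel Color, Color, Citric Acid, Water, Flavour Reducer",
        "Glycol, Ethanol",
    ],
})


def as_set(cell):
    """Hypothetical helper: comma-separated cell -> set of stripped, non-empty items."""
    return {part.strip() for part in cell.split(",") if part.strip()}


pairs = [(as_set(o), as_set(n)) for o, n in zip(df["Old"], df["New"])]

# True when the two cells do not contain exactly the same items.
df["changed"] = [o != n for o, n in pairs]
# Items present in only one of the two columns, sorted for stable output.
df["sym_diff"] = [", ".join(sorted(o ^ n)) for o, n in pairs]

print(df[["changed", "sym_diff"]])
```

The trade-off is that `sym_diff` no longer records which side an item came from; the separate `diff` and `miss` columns keep that information.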
