I have two dataframes of different sizes, each with a column of sentences, as shown below:
import numpy as np
import pandas as pd
data1 = {'text': ['the old man is here','the young girl is there', 'the old woman is here','the young boy is there','the young girl is here','the old girl is here']}
df1 = pd.DataFrame(data1, columns=['text'])
And the second dataframe:
data2 = {'text': ['the old man is here','the old girl is there', 'the young woman is here','the young boy is there']}
df2 = pd.DataFrame(data2, columns=['text'])
As you can see, some sentences appear in both dataframes. What I want as output is a column in df1 that is True if the sentence also appears in df2, and False otherwise:
desired output:
text result
'the old man is here' True
'the young girl is there' False
'the old woman is here' False
'the young boy is there' True
'the young girl is here' False
'the old girl is here' False
I tried this:
df1['result'] = np.where(df1['text'].str == df2['text'].str, 'True', 'False')
But when I check the result, it only ever contains False and never True.
Use Series.isin to test membership if you need boolean True/False values:
df1['result'] = df1['text'].isin(df2['text'])
print (df1)
text result
0 the old man is here True
1 the young girl is there False
2 the old woman is here False
3 the young boy is there True
4 the young girl is here False
5 the old girl is here False
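For reference, here is a minimal self-contained sketch of the above, using the data from the question. It also shows why the original attempt never produced True: comparing the two .str accessor objects does not compare strings row by row, it gives a single plain False. The name accessor_cmp is only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'text': ['the old man is here', 'the young girl is there',
                             'the old woman is here', 'the young boy is there',
                             'the young girl is here', 'the old girl is here']})
df2 = pd.DataFrame({'text': ['the old man is here', 'the old girl is there',
                             'the young woman is here', 'the young boy is there']})

# Comparing the .str accessors compares two helper objects, not the strings,
# so the result is one Python bool rather than one value per row.
accessor_cmp = df1['text'].str == df2['text'].str
print(accessor_cmp)  # False

# isin tests each value of df1['text'] for exact membership in df2['text'],
# regardless of row position or of the two frames having different lengths.
df1['result'] = df1['text'].isin(df2['text'])
print(df1['result'].tolist())  # [True, False, False, True, False, False]
```

Note that isin only checks for exact matches; it does no fuzzy or "similar string" matching.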
Your solution can be made to work like this:
#removed '' from 'True', 'False' for boolean
df1['result'] = np.where(df1['text'].isin(df2['text']), True, False)
Your solution creates strings rather than booleans, so it fails if you need to use the column for filtering:
df1['result'] = np.where(df1['text'].isin(df2['text']), 'True', 'False')
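A small sketch of the difference, assuming the same data as in the question. The names mask, as_bool, as_str, matched and matched_from_str are only for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'text': ['the old man is here', 'the young girl is there',
                             'the old woman is here', 'the young boy is there',
                             'the young girl is here', 'the old girl is here']})
df2 = pd.DataFrame({'text': ['the old man is here', 'the old girl is there',
                             'the young woman is here', 'the young boy is there']})

mask = df1['text'].isin(df2['text'])

as_bool = np.where(mask, True, False)      # boolean array
as_str = np.where(mask, 'True', 'False')   # array of strings

# A boolean array can be used directly as a row filter.
matched = df1[as_bool]

# The string version is not a mask; it has to be compared explicitly first.
matched_from_str = df1[as_str == 'True']

print(matched['text'].tolist())
```

Since mask is already boolean, np.where(mask, True, False) adds nothing over using the isin result directly.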