I have a DataFrame like this:
import numpy as np
import pandas as pd
df = pd.DataFrame({'text':['she is good', 'she is bad'], 'label':['she is good', 'she is good']})
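I want to compare the two columns row by row: wherever text and label hold the same value in a row, the value in the label column should be replaced with the word "same".
Expected output: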
          text        label
0  she is good         same
1   she is bad  she is good
So far I have tried the following, but it raises an error:
df['label'] = np.where(df.query("text == label"), df['label'] == ' ', df['label'] == df['label'])
ValueError: Length of values (1) does not match length of index (2)
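Your syntax is incorrect; have a look at the numpy.where documentation. It expects np.where(condition, x, y), where condition is a boolean array with one entry per row, whereas df.query returns the filtered rows themselves. Check the two columns for equality and replace the matching values in the label column: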
import numpy as np
df['label'] = np.where(df['text'].eq(df['label']),'same',df['label'])
Output:
          text        label
0  she is good         same
1   she is bad  she is good
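
As a side note, the same replacement can also be written without NumPy. The following is only a sketch of one alternative, not the method used in the answer above; the names matched_rows and mask are introduced here purely for illustration. It also shows why the original attempt fails: df.query gives back the matching rows, not a per-row True/False mask.

```python
import pandas as pd

df = pd.DataFrame({'text': ['she is good', 'she is bad'],
                   'label': ['she is good', 'she is good']})

# df.query returns the rows that satisfy the expression (a DataFrame),
# so it cannot serve as a row-by-row condition.
matched_rows = df.query("text == label")

# A boolean Series with one entry per row is what a condition needs to be.
mask = df['text'].eq(df['label'])

# Overwrite label only in the rows where the mask is True.
df.loc[mask, 'label'] = 'same'
print(df)
```

Unlike the np.where version, which rebuilds the whole label column, the .loc assignment only touches the rows where the mask is True.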