I am trying to compare 4 columns in a pandas DataFrame and populate a 5th column based on the result. In plain SQL it would look like this:
if speciality_new is null and location_new is null then 'No match found'
elif specialty <> specialty_new and location <> location_new then 'both are different'
elif specialty_new is null then 'specialty not found'
elif location_new is null then 'location not found'
else 'true'
I read that this can be done with np.where, but my code fails. Can someone tell me what I am doing wrong? Here is what I wrote:
masterDf['Match'] = np.where(
masterDf[speciality_new].isnull() & masterDf[location_new].isnull(), 'No match found',
masterDf[speciality] != masterDf[speciality_new] & masterDf[location] != masterDf[location_new], 'Both specialty and location didnt match',
masterDf[speciality] != masterDf[speciality_new], 'Specialty didnt match',
masterDf[location] != masterDf[location_new], 'Location didnt match',
True)
The error message is TypeError: unsupported operand type(s) for &: 'str' and 'str'
which makes no sense to me, because "&" is the syntax for "and".
dfsample is what I have, and dfFinal is what I want:
dfsample = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
                         'speciality': ['doctor', 'nurse', 'patient', 'driver', 'director'],
                         'location': ['texas', 'dc', 'georgia', '', 'florida'],
                         'speciality_new': ['doctor', 'nurse', 'director', 'nurse', ''],
                         'location_new': ['texas', 'alaska', 'georgia', 'maryland', 'florida']})
dfFinal = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
                        'speciality': ['doctor', 'nurse', 'patient', 'driver', 'director'],
                        'location': ['texas', 'dc', 'georgia', '', 'florida'],
                        'speciality_new': ['doctor', 'nurse', 'director', 'nurse', ''],
                        'location_new': ['texas', 'alaska', 'georgia', 'maryland', 'florida'],
                        'match': ['TRUE', 'location didn’t match', 'specialty didn’t match', 'both specialty and location didn’t match', 'specialty didn’t match']})
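To evaluate multiple conditions with numpy, it is better to use numpy.select, where you specify the conditions, the expected output for each condition, and a default output, just like an if-elif-else statement: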
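import numpy as np

# Conditions are checked in order; the first one that is True wins.
condlist = [
    dfsample['speciality_new'].isnull() & dfsample['location_new'].isnull(),
    dfsample['speciality'].ne(dfsample['speciality_new']) &
        dfsample['location'].ne(dfsample['location_new']),
    dfsample['speciality'].ne(dfsample['speciality_new']),
    dfsample['location'].ne(dfsample['location_new']),
]
# One output per condition, in the same order as condlist.
choicelist = [
    'No match found',
    'Both specialty and location didnt match',
    'Specialty didnt match',
    'Location didnt match',
]
# A string default keeps every output the same type; recent NumPy versions
# may refuse to mix a boolean default with string choices.
dfsample['match'] = np.select(condlist, choicelist, default='True')
print(dfsample)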
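Here, ne stands for "not equal" (you can simply use != instead, as long as each comparison is wrapped in parentheses, because & binds more tightly than !=).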
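The precedence point is also the likely cause of the TypeError in the question: without parentheses, Python reads `a != b & c != d` as `a != (b & c) != d`, so `&` ends up applied to two string columns. Below is a minimal sketch of the parenthesised form next to the `ne` form. The small `df` frame is made-up illustration data, not the asker's `masterDf`.

```python
import pandas as pd

# Hypothetical illustration data (not the asker's masterDf).
df = pd.DataFrame({
    'speciality': ['doctor', 'nurse', 'patient'],
    'speciality_new': ['doctor', 'nurse', 'director'],
    'location': ['texas', 'dc', 'georgia'],
    'location_new': ['texas', 'alaska', 'florida'],
})

# Parentheses force each comparison to be evaluated before `&` combines them.
both_differ = (df['speciality'] != df['speciality_new']) & (df['location'] != df['location_new'])

# The same mask written with Series.ne, which needs no extra parentheses.
both_differ_ne = df['speciality'].ne(df['speciality_new']) & df['location'].ne(df['location_new'])

print(both_differ.tolist())  # [False, False, True]
```

Both spellings build the same boolean mask, so the choice is mostly a matter of taste.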
Output:
   ID  speciality  location  speciality_new  location_new                                    match
0   1      doctor     texas          doctor         texas                                     True
1   2       nurse        dc           nurse        alaska                     Location didnt match
2   3     patient   georgia        director       georgia                    Specialty didnt match
3   4      driver                     nurse      maryland  Both specialty and location didnt match
4   5    director   florida                       florida                    Specialty didnt match
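For comparison, np.where takes exactly one condition and two outcomes, so it cannot accept a flat list of condition/value pairs the way the call in the question tries to. If you want to stay with np.where, the calls have to be nested. This is a rough sketch of that alternative using the sample data from the question. The helper names `spec_diff`, `loc_diff` and `both_null` are mine, and the fallback is the string 'True' so that every branch has the same type.

```python
import numpy as np
import pandas as pd

dfsample = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'speciality': ['doctor', 'nurse', 'patient', 'driver', 'director'],
    'location': ['texas', 'dc', 'georgia', '', 'florida'],
    'speciality_new': ['doctor', 'nurse', 'director', 'nurse', ''],
    'location_new': ['texas', 'alaska', 'georgia', 'maryland', 'florida'],
})

# Helper masks (names chosen for this sketch only).
spec_diff = dfsample['speciality'].ne(dfsample['speciality_new'])
loc_diff = dfsample['location'].ne(dfsample['location_new'])
both_null = dfsample['speciality_new'].isnull() & dfsample['location_new'].isnull()

# Each np.where handles one "if"; its third argument holds the rest of the chain.
dfsample['match'] = np.where(
    both_null, 'No match found',
    np.where(spec_diff & loc_diff, 'Both specialty and location didnt match',
    np.where(spec_diff, 'Specialty didnt match',
    np.where(loc_diff, 'Location didnt match', 'True'))))

print(dfsample['match'].tolist())
```

The nesting gets hard to read beyond two or three levels, which is the main reason to prefer np.select for this kind of if-elif-else chain.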