Populating a column with multiple ifs based on other column values

同伴编码器

I am trying to compare 4 columns in a pandas DataFrame and populate a 5th column based on the result. In plain SQL it would be something like this:

if speciality_new is null and location_new is null then 'No match found'
elif specialty <> specialty_new and location <> location_new then 'both are different'
elif specialty_new is null then 'specialty not found'
elif location_new is null then 'location not found'
else 'true'

I read that this can be done with np.where, but my code fails. Can someone tell me what I am doing wrong? This is what I wrote:

masterDf['Match'] = np.where(
    masterDf[speciality_new].isnull() & masterDf[location_new].isnull(), 'No match found',
    masterDf[speciality] != masterDf[speciality_new] & masterDf[location] != masterDf[location_new], 'Both specialty and location didnt match',
    masterDf[speciality] != masterDf[speciality_new], 'Specialty didnt match',
    masterDf[location] != masterDf[location_new], 'Location didnt match',
    True)

The error message is TypeError: unsupported operand type(s) for &: 'str' and 'str', which makes no sense to me, since "&" is the syntax for "and".

dfsample is what I have, and dfFinal is what I want:

dfsample = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
       'speciality': ['doctor', 'nurse', 'patient', 'driver', 'director'],
       'location': ['texas', 'dc', 'georgia', '', 'florida'],
       'speciality_new' : ['doctor', 'nurse', 'director', 'nurse', ''],
       'location_new': ['texas', 'alaska', 'georgia', 'maryland', 'florida']})

dfFinal = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
       'speciality': ['doctor', 'nurse', 'patient', 'driver', 'director'],
       'location': ['texas', 'dc', 'georgia', '', 'florida'],
       'speciality_new' : ['doctor', 'nurse', 'director', 'nurse', ''],
       'location_new': ['texas', 'alaska', 'georgia', 'maryland', 'florida'],
       'match': ['TRUE', 'location didn’t match', 'specialty didn’t match', 'both specialty and location didn’t match', 'specialty didn’t match']})
CainãMax Couto-Silva

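To evaluate multiple conditions with numpy, it is better to use numpy.select, where you specify the conditions, the expected output for each condition, and a default output, just like an if-elif-else statement: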

import numpy as np

condlist = [
    dfsample['speciality_new'].isnull() & dfsample['location_new'].isnull(),
    dfsample['speciality'].ne(dfsample['speciality_new']) & 
    dfsample['location'].ne(dfsample['location_new']),
    dfsample['speciality'].ne(dfsample['speciality_new']),
    dfsample['location'].ne(dfsample['location_new']),
]

choicelist = [
    'No match found',
    'Both specialty and location didnt match',
    'Specialty didnt match',
    'Location didnt match'
]

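# Use a string default so it shares a dtype with the string choices
# (newer NumPy versions may refuse to mix a bool default with string choices).
dfsample['match'] = np.select(condlist, choicelist, default='True')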
print(dfsample)

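Here, ne stands for "not equal". You can use != instead, but then each comparison must be wrapped in parentheses, because & binds more tightly than !=. That precedence issue is what raises the TypeError in the question: without parentheses, & is applied to the string columns themselves.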
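A minimal sketch of the != variant, assuming a small hypothetical frame modelled on dfsample (the names df, failed and both_differ are made up for illustration). It shows the unparenthesized expression failing and the parenthesized one working:

```python
import pandas as pd

# Hypothetical three-row frame modelled on dfsample, for illustration only.
df = pd.DataFrame({
    'speciality': ['doctor', 'nurse', 'driver'],
    'speciality_new': ['doctor', 'nurse', 'nurse'],
    'location': ['texas', 'dc', ''],
    'location_new': ['texas', 'alaska', 'maryland'],
})

# Without parentheses Python first evaluates
# df['speciality_new'] & df['location'], i.e. & between two string columns.
failed = False
try:
    df['speciality'] != df['speciality_new'] & df['location'] != df['location_new']
except Exception as exc:  # the question reports a TypeError at this point
    failed = True
    print(type(exc).__name__, exc)

# With parentheses each comparison yields a boolean Series before & combines them.
both_differ = (df['speciality'] != df['speciality_new']) & (df['location'] != df['location_new'])
print(both_differ.tolist())  # [False, False, True]
```

Using .ne() as in the answer avoids the issue altogether, because a method call needs no extra parentheses.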


Output:

   ID speciality location speciality_new location_new                                    match
0   1     doctor    texas         doctor        texas                                     True
1   2      nurse       dc          nurse       alaska                     Location didnt match
2   3    patient  georgia       director      georgia                    Specialty didnt match
3   4     driver                   nurse     maryland  Both specialty and location didnt match
4   5   director  florida                     florida                    Specialty didnt match
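For comparison, here is a rough sketch of how the same if-elif-else chain could be written with np.where, which is what the question attempted. np.where accepts only a condition and two alternatives, so the calls have to be nested rather than listed one after another. The helper names both_null, spec_differs and loc_differs are made up for this example; the data is dfsample from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'speciality': ['doctor', 'nurse', 'patient', 'driver', 'director'],
    'location': ['texas', 'dc', 'georgia', '', 'florida'],
    'speciality_new': ['doctor', 'nurse', 'director', 'nurse', ''],
    'location_new': ['texas', 'alaska', 'georgia', 'maryland', 'florida'],
})

# Name each condition once so the nesting stays readable.
both_null = df['speciality_new'].isnull() & df['location_new'].isnull()
spec_differs = df['speciality'] != df['speciality_new']
loc_differs = df['location'] != df['location_new']

# Each np.where is one "if ... else ..."; the else branch holds the next test.
df['match'] = np.where(
    both_null, 'No match found',
    np.where(
        spec_differs & loc_differs, 'Both specialty and location didnt match',
        np.where(
            spec_differs, 'Specialty didnt match',
            np.where(loc_differs, 'Location didnt match', 'True'))))

print(df['match'].tolist())
```

This gives the same match column as the np.select version, but the nesting becomes hard to follow as conditions are added, which is why np.select is usually the more readable choice. Note that the sample data uses empty strings rather than missing values, so isnull() never matches here and 'No match found' is never produced.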
