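I have a DataFrame containing employee name, employee e-mail, manager name, and manager e-mail. I need to filter this DataFrame using all unique values of the manager e-mail and confirm that each one also appears in the employee e-mail column, which ensures that the managers are also in the database as employees.
For example, I have this DataFrame: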
Employee Name  Employee E-mail    Manager Name  Manager E-mail
Pedro          [email protected]  Paul          [email protected]
Paul           N/A                Carlos        [email protected]
Richard        [email protected]  Josh          [email protected]
Carlos         [email protected]  Peter         #
Maria          #                  Bob           N/A
Josh           [email protected]  Carlos        [email protected]
This should return the following DataFrame:
Employee Name  Employee E-mail    Manager Name  Manager E-mail
Richard        [email protected]  Josh          [email protected]
Josh           [email protected]  Carlos        [email protected]
What is the best way to do this?
IIUC, you can use masks and boolean indexing:
# is the employee email valid? you can use a different pattern, e.g. '@company\.com'
# na=False treats missing values as invalid
m1 = df['Employee E-mail'].str.contains('@', na=False)
# is the manager email valid?
m2 = df['Manager E-mail'].str.contains('@', na=False)
# is the manager also an employee?
m3 = df['Manager E-mail'].isin(df['Employee E-mail'])
# keep only the rows where all conditions are True
df2 = df.loc[m1 & m2 & m3]
Output:
  Employee Name  Employee E-mail    Manager Name  Manager E-mail
2 Richard        [email protected]  Josh          [email protected]
5 Josh           [email protected]  Carlos        [email protected]
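To try this end to end, here is a minimal self-contained sketch. The e-mail addresses are hypothetical placeholders on example.com (the real addresses are not shown in the question), and the last two lines, which list managers missing from the employee column, are an addition for illustration rather than part of the original answer.

```python
import pandas as pd

# Hypothetical sample data: the addresses below are placeholders.
df = pd.DataFrame({
    "Employee Name": ["Pedro", "Paul", "Richard", "Carlos", "Maria", "Josh"],
    "Employee E-mail": ["pedro@example.com", "N/A", "richard@example.com",
                        "carlos@example.com", "#", "josh@example.com"],
    "Manager Name": ["Paul", "Carlos", "Josh", "Peter", "Bob", "Carlos"],
    "Manager E-mail": ["paul@example.com", "carlos@example.com", "josh@example.com",
                       "#", "N/A", "carlos@example.com"],
})

# An e-mail counts as valid here if it contains '@'; missing values count as invalid.
m1 = df["Employee E-mail"].str.contains("@", na=False)
m2 = df["Manager E-mail"].str.contains("@", na=False)
# The manager's e-mail must also appear in the employee e-mail column.
m3 = df["Manager E-mail"].isin(df["Employee E-mail"])

df2 = df.loc[m1 & m2 & m3]
print(df2)

# Extra check (assumption, not in the original answer): unique valid manager
# e-mails that never appear as an employee e-mail.
managers = pd.Series(df.loc[m2, "Manager E-mail"].unique())
missing = managers[~managers.isin(df["Employee E-mail"])].tolist()
print(missing)
```

With this sample data, Pedro's row is dropped because his manager Paul has no valid employee e-mail, so Paul's address ends up in the missing list.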