Filter dataframe using values from a column
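I have a dataframe containing employee names, employee e-mails, manager names and manager e-mails. I need to filter this dataframe using all unique values of the manager e-mail column and confirm that they also appear in the employee e-mail column, to make sure the managers are also in the database as employees.
For example, given this dataframe: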
Employee Name  Employee E-mail     Manager Name  Manager E-mail
Pedro          pedro@gmail.com     Paul          paul@gmail.com
Paul           N/A                 Carlos        carlos@gmail.com
Richard        richard@gmail.com   Josh          josh@gmail.com
Carlos         carlos@gmail.com    Peter         #
Maria          #                   Bob           N/A
Josh           josh@gmail.com      Carlos        carlos@gmail.com
This should return the following dataframe:
Employee Name  Employee E-mail     Manager Name  Manager E-mail
Richard        richard@gmail.com   Josh          josh@gmail.com
Josh           josh@gmail.com      Carlos        carlos@gmail.com
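What is the best way to do this?
IIUC (if I understand correctly), you can build boolean masks and combine them with boolean indexing: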
# is the employee e-mail valid? (you can use a stricter pattern, e.g. '@company\.com')
# na=False treats missing values as invalid
m1 = df['Employee E-mail'].str.contains('@', na=False)
# is the manager e-mail valid?
m2 = df['Manager E-mail'].str.contains('@', na=False)
# is the manager also an employee?
m3 = df['Manager E-mail'].isin(df['Employee E-mail'])
# keep only the rows where all three conditions are True
df2 = df.loc[m1 & m2 & m3]
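To check the masks end to end, here is a self-contained run on the sample data from the question. It assumes "N/A" was loaded as a missing value (as `pd.read_csv` does by default) while "#" stayed a literal string; adjust the constructor if your data is loaded differently.

```python
import pandas as pd

# Sample data from the question. Assumption: "N/A" is a missing value (None),
# while "#" is a literal placeholder string.
df = pd.DataFrame({
    "Employee Name": ["Pedro", "Paul", "Richard", "Carlos", "Maria", "Josh"],
    "Employee E-mail": ["pedro@gmail.com", None, "richard@gmail.com",
                        "carlos@gmail.com", "#", "josh@gmail.com"],
    "Manager Name": ["Paul", "Carlos", "Josh", "Peter", "Bob", "Carlos"],
    "Manager E-mail": ["paul@gmail.com", "carlos@gmail.com", "josh@gmail.com",
                       "#", None, "carlos@gmail.com"],
})

# employee e-mail looks valid (missing values count as invalid)
m1 = df["Employee E-mail"].str.contains("@", na=False)
# manager e-mail looks valid
m2 = df["Manager E-mail"].str.contains("@", na=False)
# manager e-mail also appears somewhere in the employee e-mail column
m3 = df["Manager E-mail"].isin(df["Employee E-mail"])

df2 = df.loc[m1 & m2 & m3]
print(df2)
```

Only rows 2 (Richard) and 5 (Josh) survive: Pedro's manager Paul has no valid employee e-mail, Paul and Maria have invalid e-mails themselves, and Carlos's manager e-mail is "#".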
Output:
  Employee Name  Employee E-mail     Manager Name  Manager E-mail
2 Richard        richard@gmail.com   Josh          josh@gmail.com
5 Josh           josh@gmail.com      Carlos        carlos@gmail.com
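Checking only for '@' accepts strings like "a@b". If a stricter validity test is wanted, one option is `str.fullmatch` with a regex. The pattern below is an illustrative assumption (something@domain.tld, no spaces, a single '@'), not a complete e-mail validator; a company-specific check such as `r"[^@\s]+@company\.com"` would use a hypothetical domain.

```python
import pandas as pd

# Illustrative pattern (assumption): local part, one '@', a domain containing a dot.
EMAIL_PATTERN = r"[^@\s]+@[^@\s]+\.[^@\s]+"

emails = pd.Series(["pedro@gmail.com", None, "#", "N/A", "josh@gmail.com", "a@b"])

# fullmatch anchors the pattern to the whole string; na=False maps missing values to False
is_valid = emails.str.fullmatch(EMAIL_PATTERN, na=False)
print(is_valid.tolist())
```

The resulting boolean Series can replace `m1` or `m2` in the answer above without any other change.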