Filter dataframe using values from a column

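I have a dataframe containing employee name, employee e-mail, manager name, and manager e-mail. I need to filter this dataframe so that it keeps only the rows whose manager e-mail also appears in the employee e-mail column, which confirms that the manager is in the database as an employee too.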

For example, given this dataframe:

Employee Name            Employee E-mail            Manager Name            Manager E-mail
Pedro                    pedro@gmail.com            Paul                    paul@gmail.com
Paul                     N/A                        Carlos                  carlos@gmail.com
Richard                  richard@gmail.com          Josh                    josh@gmail.com
Carlos                   carlos@gmail.com           Peter                   #
Maria                    #                          Bob                     N/A
Josh                     josh@gmail.com             Carlos                  carlos@gmail.com

This should return the following dataframe:

Employee Name            Employee E-mail            Manager Name            Manager E-mail
Richard                  richard@gmail.com          Josh                    josh@gmail.com
Josh                     josh@gmail.com             Carlos                  carlos@gmail.com

What is the best way to do this?

IIUC, you can build boolean masks and combine them with boolean indexing:

# is the employee email valid? you can use a different pattern e.g. r'@company\.com'
# na=False treats missing values (NaN) as "not valid"
m1 = df['Employee E-mail'].str.contains('@', na=False)
# is the manager email valid?
m2 = df['Manager E-mail'].str.contains('@', na=False)
# is the manager also an employee?
m3 = df['Manager E-mail'].isin(df['Employee E-mail'])

# keep only the rows where all three conditions are True
df2 = df.loc[m1 & m2 & m3]

Output:

  Employee Name    Employee E-mail Manager Name    Manager E-mail
2       Richard  richard@gmail.com         Josh    josh@gmail.com
5          Josh     josh@gmail.com       Carlos  carlos@gmail.com
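
To reproduce the result end to end, here is a minimal self-contained sketch. It assumes that `N/A` and `#` are stored as literal strings in the example data, as they appear in the question. The helper name `filter_valid_pairs` and its parameters are hypothetical, introduced only for illustration.

```python
import pandas as pd

# Example data from the question; 'N/A' and '#' are assumed to be plain strings.
df = pd.DataFrame({
    "Employee Name": ["Pedro", "Paul", "Richard", "Carlos", "Maria", "Josh"],
    "Employee E-mail": ["pedro@gmail.com", "N/A", "richard@gmail.com",
                        "carlos@gmail.com", "#", "josh@gmail.com"],
    "Manager Name": ["Paul", "Carlos", "Josh", "Peter", "Bob", "Carlos"],
    "Manager E-mail": ["paul@gmail.com", "carlos@gmail.com", "josh@gmail.com",
                       "#", "N/A", "carlos@gmail.com"],
})


def filter_valid_pairs(frame, emp_col="Employee E-mail",
                       mgr_col="Manager E-mail", pattern="@"):
    """Hypothetical helper: keep rows whose employee and manager e-mails
    match `pattern` and whose manager e-mail is also an employee e-mail."""
    emp_ok = frame[emp_col].str.contains(pattern, na=False)
    mgr_ok = frame[mgr_col].str.contains(pattern, na=False)
    mgr_is_emp = frame[mgr_col].isin(frame[emp_col])
    return frame.loc[emp_ok & mgr_ok & mgr_is_emp]


df2 = filter_valid_pairs(df)
print(df2)
```

Pedro's row is dropped because `paul@gmail.com` never appears in the employee e-mail column (Paul's own e-mail is `N/A`), which is why only the rows for Richard and Josh remain.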