Pandas left outer join with exclusions

I want to do a left outer join and return only the rows that are in the left table but not in the right table. I tried

df=pd.merge(left,right,on['id','date1','date2'],how="outer",indicator=True)

df = df[df['_merge'] == 'left_only']

but it did not work.

df1:
  id      date1       date2    sold
   1     8/11/2021   8/11/2021   22
   2     8/11/2021   8/11/2021   12
   3     8/12/2021   8/11/2021   18
   4     8/13/2021   8/11/2021   14
   5     8/11/2021   8/11/2021   10


df2:
   id      date1       date2    sold
   3     8/12/2021   8/11/2021   18
   4     8/13/2021   8/11/2021   14
   5     8/11/2021   8/11/2021   10
   6     8/11/2021   8/11/2021   30



desired output:
  id      date1       date2     sold
   1     8/11/2021   8/11/2021   22
   2     8/11/2021   8/11/2021   12

This works for me:

import pandas as pd

from io import StringIO

textfile1 = StringIO("""  id      date1       date2    sold
   1     8/11/2021   8/11/2021   22
   2     8/11/2021   8/11/2021   12
   3     8/12/2021   8/11/2021   18
   4     8/13/2021   8/11/2021   14
   5     8/11/2021   8/11/2021   10""")

textfile2 = StringIO("""   id      date1       date2    sold
   3     8/12/2021   8/11/2021   18
   4     8/13/2021   8/11/2021   14
   5     8/11/2021   8/11/2021   10
   6     8/11/2021   8/11/2021   30
""")

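# Columns are separated by two or more whitespace characters.
df1 = pd.read_csv(textfile1, sep=r'\s\s+', engine='python')
df2 = pd.read_csv(textfile2, sep=r'\s\s+', engine='python')

# With no `on`, merge joins on every column the two frames share.
# The indicator column 'ind' records which side each row came from.
df_out = df1.merge(df2, how='outer', indicator='ind').query('ind == "left_only"')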
print(df_out)

Output:

   id      date1      date2  sold        ind
0   1  8/11/2021  8/11/2021    22  left_only
1   2  8/11/2021  8/11/2021    12  left_only

Note that if you do not specify the join columns, pd.DataFrame.merge joins on the columns that have the same name in both DataFrames.
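To illustrate the note about default join columns, here is a minimal sketch that rebuilds the two sample frames by hand and compares a merge without `on` against one that passes the shared column names explicitly. The variable names `common`, `implicit`, and `explicit` are made up for this example.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "date1": ["8/11/2021", "8/11/2021", "8/12/2021", "8/13/2021", "8/11/2021"],
    "date2": ["8/11/2021"] * 5,
    "sold": [22, 12, 18, 14, 10],
})
df2 = pd.DataFrame({
    "id": [3, 4, 5, 6],
    "date1": ["8/12/2021", "8/13/2021", "8/11/2021", "8/11/2021"],
    "date2": ["8/11/2021"] * 4,
    "sold": [18, 14, 10, 30],
})

# Columns present in both frames (illustrative helper variable).
common = df1.columns.intersection(df2.columns).tolist()

# Merge without `on`, then with the shared columns spelled out.
implicit = df1.merge(df2, how="outer", indicator="ind")
explicit = df1.merge(df2, how="outer", indicator="ind", on=common)

print(common)
print(implicit.equals(explicit))
```

Because all four columns are shared here, a row only counts as a match when id, date1, date2, and sold all agree. If only some of them should identify a row, passing `on` explicitly is the safer choice.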

Update:

import pandas as pd

from io import StringIO

textfile1 = StringIO(
    """  id      date1       date2    sold
   1     8/11/2021   8/11/2021   22
   2     8/11/2021   8/11/2021   12
   3     8/12/2021   8/11/2021   18
   4     8/13/2021   8/11/2021   14
   5     8/11/2021   8/11/2021   10"""
)

textfile2 = StringIO(
    """   id      date1       date2    sold
   3     8/12/2021   8/11/2021   18
   4     8/13/2021   8/11/2021   14
   5     8/11/2021   8/11/2021   10
   6     8/11/2021   8/11/2021   30
"""
)

# Columns are separated by two or more whitespace characters.
df1 = pd.read_csv(textfile1, sep=r"\s\s+", engine="python")
df2 = pd.read_csv(textfile2, sep=r"\s\s+", engine="python")

df_out = (
    df1.merge(df2, how="outer", indicator="ind", on="id", suffixes=("", "_y"))
    .query('ind == "left_only"')
    .reindex(df1.columns, axis=1)
)
df_out

Output:

   id      date1      date2  sold
0   1  8/11/2021  8/11/2021  22.0
1   2  8/11/2021  8/11/2021  12.0
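In the output above, sold is shown as a float (22.0). That is most likely because the outer join also produces the right-only row (id 6), which has no value for the left table's sold column, so pandas upcasts the column to hold NaN. As a possible alternative, not part of the original answer, the sketch below uses how="left" against only the key columns of the right table, so no right-only rows are created and the integer dtype is kept. The function name `anti_join` is hypothetical.

```python
import pandas as pd


def anti_join(left, right, on):
    """Return the rows of `left` whose key columns have no match in `right`."""
    # Keep only the de-duplicated key columns of `right`, so the merge adds
    # no suffixed columns and never repeats a row of `left`.
    keys = right[on].drop_duplicates()
    merged = left.merge(keys, on=on, how="left", indicator=True)
    # Rows tagged "left_only" found no matching key in `right`.
    return merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")


df1 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "date1": ["8/11/2021", "8/11/2021", "8/12/2021", "8/13/2021", "8/11/2021"],
    "date2": ["8/11/2021"] * 5,
    "sold": [22, 12, 18, 14, 10],
})
df2 = pd.DataFrame({
    "id": [3, 4, 5, 6],
    "date1": ["8/12/2021", "8/13/2021", "8/11/2021", "8/11/2021"],
    "date2": ["8/11/2021"] * 4,
    "sold": [18, 14, 10, 30],
})

result = anti_join(df1, df2, on=["id"])
print(result)
```

The same helper accepts several key columns, for example on=["id", "date1", "date2"], which corresponds to the join the question originally attempted.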