Pandas Left outer join with exclusions
I want to do a left outer join and return the rows that are in the left table but not in the right table.
I tried df=pd.merge(left,right,on['id','date1','date2'],how="outer",indicator=True)
df = df[df['_merge'] == 'left_only']
but it didn't work.
df1:
id date1 date2 sold
1 8/11/2021 8/11/2021 22
2 8/11/2021 8/11/2021 12
3 8/12/2021 8/11/2021 18
4 8/13/2021 8/11/2021 14
5 8/11/2021 8/11/2021 10
df2:
id date1 date2 sold
3 8/12/2021 8/11/2021 18
4 8/13/2021 8/11/2021 14
5 8/11/2021 8/11/2021 10
6 8/11/2021 8/11/2021 30
desired output:
id date1 date2 sold
1 8/11/2021 8/11/2021 22
2 8/11/2021 8/11/2021 12
This works for me:
import pandas as pd
from io import StringIO
textfile1 = StringIO(""" id date1 date2 sold
1 8/11/2021 8/11/2021 22
2 8/11/2021 8/11/2021 12
3 8/12/2021 8/11/2021 18
4 8/13/2021 8/11/2021 14
5 8/11/2021 8/11/2021 10""")
textfile2 = StringIO(""" id date1 date2 sold
3 8/12/2021 8/11/2021 18
4 8/13/2021 8/11/2021 14
5 8/11/2021 8/11/2021 10
6 8/11/2021 8/11/2021 30
""")
# r'\s+' splits on any run of whitespace; the raw string avoids an invalid-escape warning
df1 = pd.read_csv(textfile1, sep=r'\s+')
df2 = pd.read_csv(textfile2, sep=r'\s+')
# No `on` given, so merge joins on every shared column; keep the rows found only in df1
df_out = df1.merge(df2, how='outer', indicator='ind').query('ind == "left_only"')
print(df_out)
Output:
id date1 date2 sold ind
0 1 8/11/2021 8/11/2021 22 left_only
1 2 8/11/2021 8/11/2021 12 left_only
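A different technique that skips the merge entirely (swapped in here, not the answer's own method) is an anti-join on the key tuples: build a MultiIndex from the key columns of each frame and keep the df1 rows whose key is not in df2. This is a minimal sketch that assumes id, date1 and date2 are the keys, as in the question's attempt, and it builds the frames directly with pd.DataFrame instead of parsing text.

```python
import pandas as pd

# The question's data, built directly instead of parsed from text.
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "date1": ["8/11/2021", "8/11/2021", "8/12/2021", "8/13/2021", "8/11/2021"],
    "date2": ["8/11/2021"] * 5,
    "sold": [22, 12, 18, 14, 10],
})
df2 = pd.DataFrame({
    "id": [3, 4, 5, 6],
    "date1": ["8/12/2021", "8/13/2021", "8/11/2021", "8/11/2021"],
    "date2": ["8/11/2021"] * 4,
    "sold": [18, 14, 10, 30],
})

keys = ["id", "date1", "date2"]  # assumed join keys, taken from the question's attempt

# True where a df1 row's (id, date1, date2) tuple also exists in df2.
in_right = df1.set_index(keys).index.isin(df2.set_index(keys).index)

# Keep only the left-only rows; df1's columns and dtypes are untouched.
df_out = df1[~in_right]
print(df_out)
```

Because nothing is joined, no indicator or suffixed columns appear and sold stays an integer column.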
Note that if you don't pass the join columns (on=...), pd.DataFrame.merge joins on all columns the two frames have in common.
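To make that note concrete, the sketch below uses two small hypothetical frames, a and b, and checks that omitting on gives the same result as passing the shared column names explicitly. It also shows the consequence for this question: a row only counts as matched when every shared column agrees, so id 3 (different sold values) shows up once as left_only and once as right_only.

```python
import pandas as pd

# Two small hypothetical frames; "id" and "sold" are the only shared columns.
a = pd.DataFrame({"id": [1, 2, 3], "sold": [22, 12, 18], "note": ["x", "y", "z"]})
b = pd.DataFrame({"id": [2, 3, 4], "sold": [12, 99, 30]})

# The columns merge falls back to when `on` is omitted.
shared = a.columns.intersection(b.columns).tolist()

implicit = a.merge(b, how="outer", indicator=True)             # no `on`
explicit = a.merge(b, how="outer", indicator=True, on=shared)  # same keys, spelled out

print(shared)
print(implicit.equals(explicit))

# id 3 differs in "sold", so it is not a match: it appears once per side.
print(implicit.loc[implicit["id"] == 3, ["id", "sold", "_merge"]])
```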
Update:
import pandas as pd
from io import StringIO
textfile1 = StringIO(
    """ id date1 date2 sold
1 8/11/2021 8/11/2021 22
2 8/11/2021 8/11/2021 12
3 8/12/2021 8/11/2021 18
4 8/13/2021 8/11/2021 14
5 8/11/2021 8/11/2021 10"""
)
textfile2 = StringIO(
    """ id date1 date2 sold
3 8/12/2021 8/11/2021 18
4 8/13/2021 8/11/2021 14
5 8/11/2021 8/11/2021 10
6 8/11/2021 8/11/2021 30
"""
)
df1 = pd.read_csv(textfile1, sep=r"\s+")
df2 = pd.read_csv(textfile2, sep=r"\s+")
df_out = (
    # Join on id only; suffixes=("", "_y") keeps df1's column names unchanged
    df1.merge(df2, how="outer", indicator="ind", on="id", suffixes=("", "_y"))
    .query('ind == "left_only"')
    .reindex(df1.columns, axis=1)  # keep only df1's original columns
)
print(df_out)
Output:
id date1 date2 sold
0 1 8/11/2021 8/11/2021 22.0
1 2 8/11/2021 8/11/2021 12.0
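The 22.0 and 12.0 in that output come from the outer join: the right-only row (id 6) has no values for df1's columns, so sold is filled with NaN and upcast to float before the query removes that row. Since only left rows are wanted, how="left" should avoid creating that row in the first place, so sold keeps its integer dtype. A minimal sketch of that variant, keeping the update's on="id" and suffixes, with the data built via pd.DataFrame instead of parsed from text:

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "date1": ["8/11/2021", "8/11/2021", "8/12/2021", "8/13/2021", "8/11/2021"],
    "date2": ["8/11/2021"] * 5,
    "sold": [22, 12, 18, 14, 10],
})
df2 = pd.DataFrame({
    "id": [3, 4, 5, 6],
    "date1": ["8/12/2021", "8/13/2021", "8/11/2021", "8/11/2021"],
    "date2": ["8/11/2021"] * 4,
    "sold": [18, 14, 10, 30],
})

df_out = (
    # how="left" keeps every df1 row and never adds df2-only rows,
    # so df1's columns get no NaN and keep their dtypes.
    df1.merge(df2, how="left", indicator="ind", on="id", suffixes=("", "_y"))
    .query('ind == "left_only"')
    .reindex(df1.columns, axis=1)  # drop the "_y" columns and the indicator
)
print(df_out)
```

Like the update, this matches on id alone; pass on=["id", "date1", "date2"] instead if a row should only count as present in df2 when the dates match too.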