Merging two DataFrames on a date column and another column

I'm trying to merge two DataFrames on Date and a column called IDV. Here is my first DataFrame:

df1

Date IDV Values
01/01/2020 Var1 100
01/01/2020 Var2 500
01/01/2020 Var3 600
01/01/2020 Var4 10
01/01/2020 Var5 10

df2

Date IDV Values
01/01/2019 Var2 110
01/01/2019 Var3 510
01/01/2019 Var1 300
01/01/2019 Var5 20
01/01/2019 Var4 20

My desired output is:

Date IDV Values Last_Year_Values
01/01/2020 Var1 100 300
01/01/2020 Var2 500 110
01/01/2020 Var3 600 510
01/01/2020 Var4 10 20
01/01/2020 Var5 10 20

I tried pd.merge(df1, df2, left_on='date', right_on='IDV', how='left'), but it doesn't give the output above.
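The attempt above pairs a column named 'date' in df1 (the actual column is the capitalized Date, so pandas raises a KeyError) with the IDV column of df2, which is not the intended key. left_on and right_on accept lists of equal length, so a two-column key is written as two lists. Even then, 2020 dates never equal 2019 dates, so df2's dates must be shifted first. A minimal sketch, assuming string dates as shown in the question and a hypothetical helper column Match_Date that holds df2's date moved forward one year:

```python
import pandas as pd

# Sample data from the question, with Date stored as strings
df1 = pd.DataFrame({'Date': ['01/01/2020'] * 5,
                    'IDV': ['Var1', 'Var2', 'Var3', 'Var4', 'Var5'],
                    'Values': [100, 500, 600, 10, 10]})
df2 = pd.DataFrame({'Date': ['01/01/2019'] * 5,
                    'IDV': ['Var2', 'Var3', 'Var1', 'Var5', 'Var4'],
                    'Values': [110, 510, 300, 20, 20]})

# Hypothetical helper column: df2's date relabelled with the following year
right = df2.assign(Match_Date=df2['Date'].str.replace('2019', '2020', regex=False))

# left_on/right_on are parallel lists: Date pairs with Match_Date, IDV with IDV
out = pd.merge(df1, right[['Match_Date', 'IDV', 'Values']],
               left_on=['Date', 'IDV'], right_on=['Match_Date', 'IDV'],
               how='left', suffixes=('', '_last_year'))
out = out.drop(columns='Match_Date')  # helper key no longer needed
print(out)
```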

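Assuming Date holds strings, a simple approach is to change the year in df2 so its dates line up with df1, then merge on both columns: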

pd.merge(df1,
         df2.assign(Date=df2['Date'].str.replace('2019', '2020')),
         on=['Date', 'IDV'],
         how='left', suffixes=('', '_last_year'))
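A self-contained version of this approach, assuming Date holds strings in the mm/dd/yyyy form shown in the question (the setup at the end of this answer builds datetime columns instead, so string data is recreated here). regex=False makes the literal replacement explicit, and the final rename to Last_Year_Values only matches the column name requested in the question:

```python
import pandas as pd

# String-typed dates, as in the question's tables
df1 = pd.DataFrame({'Date': ['01/01/2020'] * 5,
                    'IDV': ['Var1', 'Var2', 'Var3', 'Var4', 'Var5'],
                    'Values': [100, 500, 600, 10, 10]})
df2 = pd.DataFrame({'Date': ['01/01/2019'] * 5,
                    'IDV': ['Var2', 'Var3', 'Var1', 'Var5', 'Var4'],
                    'Values': [110, 510, 300, 20, 20]})

# Relabel last year's rows with this year's date so both keys can match
shifted = df2.assign(Date=df2['Date'].str.replace('2019', '2020', regex=False))

# Left merge keeps df1's row order; the overlapping Values column gets a suffix
out = pd.merge(df1, shifted, on=['Date', 'IDV'], how='left',
               suffixes=('', '_last_year'))
out = out.rename(columns={'Values_last_year': 'Last_Year_Values'})
print(out)
```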

Or a more general approach (works for any year):

pd.merge(df1,
         df2.assign(Date=df2['Date'].str.replace(r'\d+$', lambda m: str(int(m.group(0))+1), regex=True)),
         on=['Date', 'IDV'],
         how='left', suffixes=('', '_last_year'))

Output:

         Date   IDV  Values  Values_last_year
0  01/01/2020  Var1     100               300
1  01/01/2020  Var2     500               110
2  01/01/2020  Var3     600               510
3  01/01/2020  Var4      10                20
4  01/01/2020  Var5      10                20
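The general version relies on str.replace accepting a callable as the replacement when regex=True: the callable receives each regex match object and returns the replacement text. A small sketch wrapping that idea in a hypothetical helper, shift_year, applied to a few illustrative date strings to show that any trailing year is bumped:

```python
import pandas as pd

def shift_year(dates: pd.Series, years: int = 1) -> pd.Series:
    """Hypothetical helper: add `years` to the trailing year of mm/dd/yyyy strings."""
    # r'\d+$' matches the run of digits at the end of each string (the year);
    # the lambda gets the match object and returns the new year as text.
    return dates.str.replace(r'\d+$', lambda m: str(int(m.group(0)) + years),
                             regex=True)

# Illustrative inputs only
dates = pd.Series(['01/01/2019', '12/31/1999', '06/15/2023'])
shifted = shift_year(dates)
print(shifted.tolist())
```

Because this only edits text, it does not validate the result: a leap-day string such as 02/29/2020 would become 02/29/2021, which is not a real date. Parsing to datetime first avoids that.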

If Date is already datetime64, use pd.DateOffset instead:

cols = ['Date', 'IDV', 'Values']
out = df1.merge(df2[cols].assign(Date=df2['Date']+pd.DateOffset(years=1)), 
                on=['Date', 'IDV'], how='left', suffixes=('', '_last_year'))
print(out)

# Output
        Date   IDV  Values  Values_last_year
0 2020-01-01  Var1     100               300
1 2020-01-01  Var2     500               110
2 2020-01-01  Var3     600               510
3 2020-01-01  Var4      10                20
4 2020-01-01  Var5      10                20
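If the columns start out as strings like those in the question, one option is to parse them with pd.to_datetime and then apply the DateOffset approach. A sketch, assuming month/day/year order (with only January 1 in the sample, month-first and day-first would parse identically here):

```python
import pandas as pd

df1 = pd.DataFrame({'Date': ['01/01/2020'] * 5,
                    'IDV': ['Var1', 'Var2', 'Var3', 'Var4', 'Var5'],
                    'Values': [100, 500, 600, 10, 10]})
df2 = pd.DataFrame({'Date': ['01/01/2019'] * 5,
                    'IDV': ['Var2', 'Var3', 'Var1', 'Var5', 'Var4'],
                    'Values': [110, 510, 300, 20, 20]})

# Parse the strings to datetime64; the %m/%d/%Y format is an assumption
df1['Date'] = pd.to_datetime(df1['Date'], format='%m/%d/%Y')
df2['Date'] = pd.to_datetime(df2['Date'], format='%m/%d/%Y')

# Move last year's dates forward one calendar year, then merge on both keys
out = df1.merge(df2.assign(Date=df2['Date'] + pd.DateOffset(years=1)),
                on=['Date', 'IDV'], how='left', suffixes=('', '_last_year'))
print(out)
```

Unlike editing strings, this goes through calendar arithmetic, so the shifted keys are always valid dates.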

Setup:

import pandas as pd

d1 = {'Date': [pd.Timestamp('2020-01-01')] * 5,
      'IDV': ['Var1', 'Var2', 'Var3', 'Var4', 'Var5'],
      'Values': [100, 500, 600, 10, 10]}
df1 = pd.DataFrame(d1)

d2 = {'Date': [pd.Timestamp('2019-01-01')] * 5,
      'IDV': ['Var2', 'Var3', 'Var1', 'Var5', 'Var4'],
      'Values': [110, 510, 300, 20, 20]}
df2 = pd.DataFrame(d2)
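With this setup, merge can also check the join as it runs: validate='one_to_one' raises a MergeError if a (Date, IDV) pair is duplicated on either side, and indicator=True adds a _merge column marking rows of df1 that found no value one year earlier. A sketch that repeats the setup in compact form so it runs on its own:

```python
import pandas as pd

df1 = pd.DataFrame({'Date': [pd.Timestamp('2020-01-01')] * 5,
                    'IDV': ['Var1', 'Var2', 'Var3', 'Var4', 'Var5'],
                    'Values': [100, 500, 600, 10, 10]})
df2 = pd.DataFrame({'Date': [pd.Timestamp('2019-01-01')] * 5,
                    'IDV': ['Var2', 'Var3', 'Var1', 'Var5', 'Var4'],
                    'Values': [110, 510, 300, 20, 20]})

out = df1.merge(df2.assign(Date=df2['Date'] + pd.DateOffset(years=1)),
                on=['Date', 'IDV'], how='left', suffixes=('', '_last_year'),
                validate='one_to_one',  # raise MergeError on duplicate keys
                indicator=True)         # '_merge' is 'both' or 'left_only' here

# Rows of df1 with no matching IDV one year earlier (none in this sample)
unmatched = out[out['_merge'] == 'left_only']
print(unmatched)
```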