Pandas combine dataframes by stacking columns with values on matching condition
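I want to merge dataframes 1 and 2 as follows:
- The Date column is the first key
- The second key is the column header of dataframe 1 and the Project variable of dataframe 2
- In the new dataframe, V1 holds the value from dataframe 1 where these keys match
- If the keys do not match, the S1, S2 and S3 values are empty (e.g. row 0)
- If the keys match, the S1, S2 and S3 values are joined from dataframe 2 (e.g. rows 1, 2 and 3)
I tried stacking and combining to get this result, but I couldn't make it work. Any ideas?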
**Dataframe 1**
Date C0 C1 C2 C3
0 2021-03-24 2547.502499 220.815585 91.2 10.764182
1 2021-02-01 2147.502499 219.815585 62.2 8.764182
**Dataframe 2**
Project Date S1 S2 S3
0 C1 2021-03-24 151.733282 67.2 1.882302
1 C1 2021-02-01 150.1 60.2 0.812302
2 C2 2021-03-24 15.15005 50.9 25.200000
**Expected Result**
Date Project V1 S1 S2 S3
0 2021-03-24 C0 2547.502499 NaN NaN NaN
1 2021-03-24 C1 220.815585 151.733282 67.2 1.882302
2 2021-03-24 C2 91.2 15.15005 50.9 25.200000
3 2021-02-01 C1 219.815585 150.1 60.2 0.812302
...
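To reproduce the question, the two input frames can be rebuilt from the tables above. This sketch assumes the dates arrive as strings. It parses them with `pd.to_datetime` in both frames, because `merge` needs the `Date` key columns to share a dtype: in recent pandas versions, merging a `datetime64` column against a string column raises a `ValueError`.

```python
import pandas as pd

# Dataframe 1: one row per date, one column per project (C0..C3)
df1 = pd.DataFrame({
    "Date": ["2021-03-24", "2021-02-01"],
    "C0": [2547.502499, 2147.502499],
    "C1": [220.815585, 219.815585],
    "C2": [91.2, 62.2],
    "C3": [10.764182, 8.764182],
})

# Dataframe 2: one row per (Project, Date) pair with its S1..S3 values
df2 = pd.DataFrame({
    "Project": ["C1", "C1", "C2"],
    "Date": ["2021-03-24", "2021-02-01", "2021-03-24"],
    "S1": [151.733282, 150.1, 15.15005],
    "S2": [67.2, 60.2, 50.9],
    "S3": [1.882302, 0.812302, 25.2],
})

# Assumption: dates start out as strings; parse both the same way
# so the Date columns have identical dtypes when used as merge keys
df1["Date"] = pd.to_datetime(df1["Date"])
df2["Date"] = pd.to_datetime(df2["Date"])
```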
Use `stack` and `merge`:
(df1.set_index('Date')
.stack()
.reset_index()
.rename(columns={'level_1': 'Project', 0: 'V1'})
.merge(df2, on=['Date', 'Project'], how='left')
)
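A self-contained, runnable version of the chain above, with each step commented, might look like this. The sample data is copied from the question, and the dates are assumed to be plain strings in both frames (any dtype works as long as it is the same in both).

```python
import pandas as pd

# Sample data copied from the question; dates kept as strings in both frames
df1 = pd.DataFrame({
    "Date": ["2021-03-24", "2021-02-01"],
    "C0": [2547.502499, 2147.502499],
    "C1": [220.815585, 219.815585],
    "C2": [91.2, 62.2],
    "C3": [10.764182, 8.764182],
})
df2 = pd.DataFrame({
    "Project": ["C1", "C1", "C2"],
    "Date": ["2021-03-24", "2021-02-01", "2021-03-24"],
    "S1": [151.733282, 150.1, 15.15005],
    "S2": [67.2, 60.2, 50.9],
    "S3": [1.882302, 0.812302, 25.2],
})

result = (
    df1.set_index("Date")   # Date becomes the row index
    .stack()                # Series indexed by (Date, column header)
    .reset_index()          # back to columns: Date, level_1, 0
    .rename(columns={"level_1": "Project", 0: "V1"})
    # Left join keeps every (Date, Project) pair from df1;
    # pairs with no match in df2 get NaN for S1..S3
    .merge(df2, on=["Date", "Project"], how="left")
)
print(result)
```

Because the merge is a left join, every `(Date, Project)` pair from Dataframe 1 is kept, and pairs with no partner in Dataframe 2 (such as `C0` and `C3`) get `NaN` for `S1` to `S3`.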
Output:
Date Project V1 S1 S2 S3
-- ---------- --------- ---------- -------- ----- ----------
0 2021-03-24 C0 2547.5 nan nan nan
1 2021-03-24 C1 220.816 151.733 67.2 1.8823
2 2021-03-24 C2 91.2 15.1501 50.9 25.2
3 2021-03-24 C3 10.7642 nan nan nan
4 2021-02-01 C0 2147.5 nan nan nan
5 2021-02-01 C1 219.816 150.1 60.2 0.812302
6 2021-02-01 C2 62.2 nan nan nan
7 2021-02-01 C3 8.76418 nan nan nan
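An alternative sketch, not part of the answer above, swaps `stack` for `DataFrame.melt`, which turns the `C0` to `C3` columns into `(Project, V1)` rows and names the new columns in one call. It also passes `indicator=True` to `merge`, which adds a `_merge` column marking each row as `both` (matched in Dataframe 2) or `left_only`. The sample data and string dates are assumptions copied from the question.

```python
import pandas as pd

# Sample data copied from the question; dates kept as strings in both frames
df1 = pd.DataFrame({
    "Date": ["2021-03-24", "2021-02-01"],
    "C0": [2547.502499, 2147.502499],
    "C1": [220.815585, 219.815585],
    "C2": [91.2, 62.2],
    "C3": [10.764182, 8.764182],
})
df2 = pd.DataFrame({
    "Project": ["C1", "C1", "C2"],
    "Date": ["2021-03-24", "2021-02-01", "2021-03-24"],
    "S1": [151.733282, 150.1, 15.15005],
    "S2": [67.2, 60.2, 50.9],
    "S3": [1.882302, 0.812302, 25.2],
})

# melt keeps Date as an id column and unpivots C0..C3 into Project/V1
long_df1 = df1.melt(id_vars="Date", var_name="Project", value_name="V1")

result = long_df1.merge(
    df2,
    on=["Date", "Project"],
    how="left",
    indicator=True,  # adds _merge: "both" or "left_only"
)

# melt emits rows column by column (all C0 rows first), so sort to match
# the stack-based output: newest date first, then projects in order
result = (
    result.sort_values(["Date", "Project"], ascending=[False, True])
    .reset_index(drop=True)
)
print(result)
```

The `_merge` column makes it easy to check which projects found a partner in Dataframe 2; drop it with `result.drop(columns="_merge")` once you no longer need it.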