Outer Join Tables - Keep Descriptions
new = pd.DataFrame({'table': ['a','b', 'c', 'd'], 'desc': ['','','',''], 'total':[22,22,22,22]})
old = pd.DataFrame({'table': ['a','b', 'e'], 'desc': ['foo','foo','foo'], 'total':[11,11,11]})
all = pd.merge(new, old, how='outer', on=['table', 'total'])
Output:
| | table | desc_x | total | desc_y |
|---|---|---|---|---|
| 0 | a | | 22 | NaN |
| 1 | b | | 22 | NaN |
| 2 | c | | 22 | NaN |
| 3 | d | | 22 | NaN |
| 4 | a | NaN | 11 | foo |
Desired output:
| | table | desc | total |
|---|---|---|---|
| 0 | a | foo | 22 |
| 1 | b | foo | 22 |
| 2 | c | | 22 |
| 3 | d | | 22 |
| 4 | a | foo | 11 |
I tried an outer join, but it drops the descriptions for a and b.
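- Given the output you want, an outer join on both table and total makes no sense: no row in new shares both values with a row in old, so nothing is matched. Changed to an outer join on table only.
- The merged table can then be modified to apply the preferences implied by your desired output (the description from old where new has none, the total from new where it exists), and the suffixed columns cleaned up.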
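To see why the two-key join matches nothing, `pd.merge` can be asked to label where each row came from. This is a minimal sketch on the question's data; the names `check` and `matched` are introduced here for illustration only.

```python
import pandas as pd

new = pd.DataFrame({"table": ["a", "b", "c", "d"], "desc": ["", "", "", ""], "total": [22, 22, 22, 22]})
old = pd.DataFrame({"table": ["a", "b", "e"], "desc": ["foo", "foo", "foo"], "total": [11, 11, 11]})

# indicator=True adds a _merge column: "left_only", "right_only" or "both"
check = pd.merge(new, old, how="outer", on=["table", "total"], indicator=True)

# Count the rows whose (table, total) pair exists in both frames
matched = int((check["_merge"] == "both").sum())
print(check[["table", "total", "_merge"]])
print("rows matched on both keys:", matched)
```

No row is tagged `both`, because every total in new is 22 and every total in old is 11, so the descriptions from old never land on the rows from new.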
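import numpy as np
import pandas as pd

new = pd.DataFrame({'table': ['a','b', 'c', 'd'], 'desc': ['','','',''], 'total':[22,22,22,22]})
old = pd.DataFrame({'table': ['a','b', 'e'], 'desc': ['foo','foo','foo'], 'total':[11,11,11]})
# outer join on table only; overlapping columns get the _x (new) and _y (old) suffixes
all = pd.merge(new, old, how='outer', on=['table'])
# select preferred columns: non-empty desc from new, else desc from old, else ''
all["desc"] = all["desc_x"].replace('', np.nan).fillna(all["desc_y"]).fillna("")
# total from new, else from old; cast back to int because the outer join's NaNs make the column float
all["total"] = all["total_x"].fillna(all["total_y"]).astype(int)
# clean up the suffixed helper columns
all = all.drop(columns=[c for c in all.columns if c[-2:] in ["_x", "_y"]])
all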
| | table | desc | total |
|---|---|---|---|
| 0 | a | foo | 22 |
| 1 | b | foo | 22 |
| 2 | c | | 22 |
| 3 | d | | 22 |
| 4 | e | foo | 11 |
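The same "prefer new, fall back to old" rule can also be written without the suffixed helper columns. The sketch below swaps in `DataFrame.combine_first` instead of `pd.merge`; it is an alternative under the assumption that table is unique in each frame, and the names `new_idx`, `old_idx` and `combined` are made up for this example.

```python
import pandas as pd

new = pd.DataFrame({"table": ["a", "b", "c", "d"], "desc": ["", "", "", ""], "total": [22, 22, 22, 22]})
old = pd.DataFrame({"table": ["a", "b", "e"], "desc": ["foo", "foo", "foo"], "total": [11, 11, 11]})

# Index both frames by the join key; treat empty descriptions in new as missing
new_idx = new.set_index("table")
new_idx["desc"] = new_idx["desc"].mask(new_idx["desc"] == "")
old_idx = old.set_index("table")

# combine_first keeps new's values and fills its gaps (and missing rows) from old
combined = new_idx.combine_first(old_idx)

# Restore the empty-string placeholder and the integer totals
combined["desc"] = combined["desc"].fillna("")
combined["total"] = combined["total"].astype(int)
combined = combined.reset_index()
print(combined)
```

This yields the same rows as the merge-based version: a and b pick up foo while keeping total 22, c and d keep an empty description, and e comes through from old with total 11.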