Left join on multiple columns

Question

I'm used to using dplyr with R, where I would do something like this:

library(dplyr)
mtcars2=mtcars
mtcars3 = mtcars %>% left_join(mtcars2[,c("mpg","vs","hp")], by =c("mpg",'hp') )

# This does a left join on multiple columns and brings over only *one* additional column, so mtcars3 has just one extra field: a duplicated 'vs'.

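I can't figure out how to do the same thing with pd.merge. I want to join on two columns and then bring over only the third column, not every column of the joined table other than the join-by columns, if that makes sense.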

import pandas as pd
mtcars = pd.read_csv('mtcars.csv')
mtcars2=mtcars

mtcars3  = pd.merge(mtcars, mtcars2['vs','hp','mpg'],how='left', on = ['mpg','hp'])

Answer 1

IIUC, you can select a subset of columns by adding a second pair of [], and you can drop mtcars2 and simply use mtcars again:

import pandas as pd
mtcars = pd.read_csv('mtcars.csv')
mtcars3  = pd.merge(mtcars, mtcars[['vs','hp','mpg']], how='left', on = ['mpg','hp'])
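The second pair of brackets matters because a list inside `[]` selects several columns, while bare comma-separated names are treated as a single tuple key. A minimal sketch of the difference, using a tiny made-up frame (the values are illustrative, not the real mtcars data):

```python
import pandas as pd

# Tiny made-up frame standing in for mtcars (illustrative values only)
df = pd.DataFrame({'mpg': [21.0, 22.8], 'hp': [110, 93], 'vs': [0, 1]})

# A list inside the brackets selects several columns and returns a DataFrame
subset = df[['vs', 'hp', 'mpg']]
print(list(subset.columns))

# Without the inner list, pandas looks up the single key ('vs', 'hp', 'mpg'),
# which is not a column here, so the lookup raises KeyError
try:
    df['vs', 'hp', 'mpg']
    tuple_lookup_failed = False
except KeyError:
    tuple_lookup_failed = True
print(tuple_lookup_failed)
```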

Sample:

import pandas as pd

mtcars = pd.DataFrame({'vs':[1,2,3],
                       'hp':[1,1,1],
                       'mpg':[7,7,9],
                       'aaa':[1,3,5]})

print (mtcars)
   aaa  hp  mpg  vs
0    1   1    7   1
1    3   1    7   2
2    5   1    9   3

mtcars3  = pd.merge(mtcars, mtcars[['vs','hp','mpg']], how='left', on = ['mpg','hp'])
print (mtcars3)
   aaa  hp  mpg  vs_x  vs_y
0    1   1    7     1     1
1    1   1    7     1     2
2    3   1    7     2     1
3    3   1    7     2     2
4    5   1    9     3     3
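Note that the sample output has five rows rather than three, because the key (mpg=7, hp=1) appears twice on both sides and every matching pair is returned. If the goal is exactly one extra column and no extra rows, one possible approach (an assumption about the intent, not part of the original answer) is to make the right-hand side unique on the join keys first. The names `lookup` and the `_dup` suffix below are illustrative only:

```python
import pandas as pd

mtcars = pd.DataFrame({'vs': [1, 2, 3],
                       'hp': [1, 1, 1],
                       'mpg': [7, 7, 9],
                       'aaa': [1, 3, 5]})

# Keep only the first row for each (mpg, hp) key on the right-hand side
lookup = mtcars[['vs', 'hp', 'mpg']].drop_duplicates(subset=['mpg', 'hp'])

# validate='many_to_one' makes pandas raise if the right-hand keys are not unique;
# suffixes=('', '_dup') leaves the left 'vs' unchanged and names the copy 'vs_dup'
mtcars3 = pd.merge(mtcars, lookup, how='left', on=['mpg', 'hp'],
                   suffixes=('', '_dup'), validate='many_to_one')
print(mtcars3)
```

Keeping the first row per key is an arbitrary choice here; which duplicate to keep depends on what the data means.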

Answer 2

This is a late answer, but if you are familiar with R/dplyr, here is a way to do it in Python that looks similar:

>>> from datar.datasets import mtcars
>>> from datar.all import f, left_join, select
>>> mtcars >> left_join(mtcars >> select(f[f.mpg:f.hp:1]), by=[f.mpg, f.hp])
         mpg   cyl_x    disp_x      hp      drat        wt      qsec      vs      am    gear    carb   cyl_y    disp_y
   <float64> <int64> <float64> <int64> <float64> <float64> <float64> <int64> <int64> <int64> <int64> <int64> <float64>
0       21.0       6     160.0     110      3.90     2.620     16.46       0       1       4       4       6     160.0
1       21.0       6     160.0     110      3.90     2.620     16.46       0       1       4       4       6     160.0
2       21.0       6     160.0     110      3.90     2.875     17.02       0       1       4       4       6     160.0
3       21.0       6     160.0     110      3.90     2.875     17.02       0       1       4       4       6     160.0
... ...

I am the author of the datar package.
