Left join on multiple columns
I'm used to using dplyr with R, where I would do something like this:
library(dplyr)
mtcars2=mtcars
mtcars3 = mtcars %>% left_join(mtcars2[,c("mpg","vs","hp")], by =c("mpg",'hp') )
# This left-joins on multiple columns and brings over only *1* additional column, so mtcars3 has just one extra field: a duplicated 'vs'.
I can't figure out how to do the same thing with pd.merge. I want to join on two columns and then bring over only a third column, not every column of the joined table other than the join keys, if that makes sense.
import pandas as pd
mtcars = pd.read_csv('mtcars.csv')
mtcars2=mtcars
mtcars3 = pd.merge(mtcars, mtcars2['vs','hp','mpg'],how='left', on = ['mpg','hp'])
IIUC, you can select the subset by adding another pair of [] and you can omit mtcars2, since mtcars can be used again:
import pandas as pd
mtcars = pd.read_csv('mtcars.csv')
mtcars3 = pd.merge(mtcars, mtcars[['vs','hp','mpg']], how='left', on = ['mpg','hp'])
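To illustrate why the extra brackets matter, here is a minimal sketch on a small made-up frame (the values in `df` are placeholders, not the real mtcars data). With single brackets pandas treats the three names as one tuple key, which should raise a KeyError on ordinary columns; a list of names selects a sub-DataFrame.

```python
import pandas as pd

# Tiny stand-in frame; the values are illustrative only.
df = pd.DataFrame({'mpg': [21.0, 22.8], 'hp': [110, 93], 'vs': [0, 1]})

try:
    # Single brackets: looked up as the single key ('vs', 'hp', 'mpg').
    df['vs', 'hp', 'mpg']
    raised = False
except KeyError:
    raised = True

# Double brackets: a list of labels returns a DataFrame with those columns.
subset = df[['vs', 'hp', 'mpg']]

print(raised, list(subset.columns))
```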
Sample:
import pandas as pd
mtcars = pd.DataFrame({'vs':[1,2,3],
'hp':[1,1,1],
'mpg':[7,7,9],
'aaa':[1,3,5]})
print (mtcars)
aaa hp mpg vs
0 1 1 7 1
1 3 1 7 2
2 5 1 9 3
mtcars3 = pd.merge(mtcars, mtcars[['vs','hp','mpg']], how='left', on = ['mpg','hp'])
print (mtcars3)
aaa hp mpg vs_x vs_y
0 1 1 7 1 1
1 1 1 7 1 2
2 3 1 7 2 1
3 3 1 7 2 2
4 5 1 9 3 3
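The sample result has five rows because the key pair (mpg=7, hp=1) appears twice on both sides, so each left row matches two right rows. If the goal is to keep the original row count and add exactly one column, one possible approach is to deduplicate the right-hand side on the join keys first. This is only a sketch: keeping the first match per key and the name `vs_right` are assumptions, not something the question specifies.

```python
import pandas as pd

mtcars = pd.DataFrame({'vs': [1, 2, 3],
                       'hp': [1, 1, 1],
                       'mpg': [7, 7, 9],
                       'aaa': [1, 3, 5]})

keys = ['mpg', 'hp']

# Right-hand side: the join keys plus the one column to bring over,
# keeping the first row per key combination (an assumed tie-break rule).
right = mtcars[keys + ['vs']].drop_duplicates(subset=keys)

# suffixes=('', '_right') leaves the left 'vs' unchanged and names the copy 'vs_right'.
# validate='many_to_one' makes pandas raise if the right keys are not unique.
merged = pd.merge(mtcars, right, how='left', on=keys,
                  suffixes=('', '_right'), validate='many_to_one')

print(merged)
```

Whether dropping duplicates is appropriate depends on the data; if the repeated keys carry different values that all matter, the row multiplication shown in the sample is the correct left-join result.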
This is a late answer, but if you are familiar with R/dplyr, here is a similar way to do it in Python:
>>> from datar.datasets import mtcars
>>> from datar.all import f, left_join, select
>>> mtcars >> left_join(mtcars >> select(f[f.mpg:f.hp:1]), by=[f.mpg, f.hp])
mpg cyl_x disp_x hp drat wt qsec vs am gear carb cyl_y disp_y
<float64> <int64> <float64> <int64> <float64> <float64> <float64> <int64> <int64> <int64> <int64> <int64> <float64>
0 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 6 160.0
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 6 160.0
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 6 160.0
3 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 6 160.0
... ...
I am the author of the datar package.