How to merge two CSV files only on same column names
Suppose I have a data frame like this:
First <- data.frame(name=rep(c("Clay","Garrett","Addison"),each=3),
test=rep(1:3, 3),
score=c(78, 87, 88, 93, 91, 99, 90, 97, 91))
name test score
1 Clay 1 78
2 Clay 2 87
3 Clay 3 88
4 Garrett 1 93
5 Garrett 2 91
6 Garrett 3 99
7 Addison 1 90
8 Addison 2 97
9 Addison 3 91
And a second one:
Second <- data.frame(name=rep(c("Jim","Jordan"),each=3),
test =rep(1:3, 2),
color = c("red", "brown", "red", "red", "blue", "green"))
name test color
1 Jim 1 red
2 Jim 2 brown
3 Jim 3 red
4 Jordan 1 red
5 Jordan 2 blue
6 Jordan 3 green
Now I want to append the second data frame to the first, so that I get:
name test score
1 Clay 1 78
2 Clay 2 87
3 Clay 3 88
4 Garrett 1 93
5 Garrett 2 91
6 Garrett 3 99
7 Addison 1 90
8 Addison 2 97
9 Addison 3 91
10 Jim 1 NA
11 Jim 2 NA
12 Jim 3 NA
13 Jordan 1 NA
14 Jordan 2 NA
15 Jordan 3 NA
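So basically like a LEFT JOIN, but column-wise: I keep only the columns of the first data frame, and if a column is not found in the second data frame, its rows get NA for that column.
We can use bind_rows here: bind the first dataset ('First') with 'Second', after subsetting 'Second' to only the column names the two datasets share (their intersect):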
library(dplyr)
bind_rows(First, Second[intersect(names(First), names(Second))])
-output
name test score
1 Clay 1 78
2 Clay 2 87
3 Clay 3 88
4 Garrett 1 93
5 Garrett 2 91
6 Garrett 3 99
7 Addison 1 90
8 Addison 2 97
9 Addison 3 91
10 Jim 1 NA
11 Jim 2 NA
12 Jim 3 NA
13 Jordan 1 NA
14 Jordan 2 NA
15 Jordan 3 NA
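For comparison, the same result can be sketched without dplyr. This is only a base R illustration of the idea above (keep the shared columns, fill the rest with NA, then stack the rows); the names common, Second_aligned and result are made up for this example and are not part of the original answer.

```r
First <- data.frame(name = rep(c("Clay", "Garrett", "Addison"), each = 3),
                    test = rep(1:3, 3),
                    score = c(78, 87, 88, 93, 91, 99, 90, 97, 91))
Second <- data.frame(name = rep(c("Jim", "Jordan"), each = 3),
                     test = rep(1:3, 2),
                     color = c("red", "brown", "red", "red", "blue", "green"))

# Column names present in both data frames
common <- intersect(names(First), names(Second))

# Keep only the shared columns of Second (this drops color)
Second_aligned <- Second[common]

# Add every column that exists only in First, filled with NA
for (col in setdiff(names(First), common)) {
  Second_aligned[[col]] <- NA
}

# Put the columns in First's order and stack the rows
result <- rbind(First, Second_aligned[names(First)])
print(result)
```

Unlike bind_rows, rbind does not fill missing columns by itself, which is why the loop adds them explicitly.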
If the column types differ, we may need to convert the columns to the same type first, or use rbindlist from data.table:
library(data.table)
rbindlist(list(First, Second[intersect(names(First),
names(Second))]), fill = TRUE)
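To illustrate the "same type" option, here is a minimal sketch. It assumes a hypothetical variant of the second data frame, called Second_chr here, in which test was read in as text; that variant and the name Second_common are inventions for this example, not data from the question.

```r
First <- data.frame(name = c("Clay", "Garrett"),
                    test = c(1L, 2L),
                    score = c(78, 93))

# Hypothetical: test arrived as character, e.g. from a CSV read as text
Second_chr <- data.frame(name = c("Jim", "Jordan"),
                         test = c("1", "2"),
                         color = c("red", "blue"))

# Keep only the columns the two data frames share
common <- intersect(names(First), names(Second_chr))
Second_common <- Second_chr[common]

# test is integer in First but character here, so convert it to match
Second_common$test <- as.integer(Second_common$test)

# Compare the column classes of both data frames side by side
print(sapply(First[common], class))
print(sapply(Second_common, class))
```

Once the classes match, Second_common can be passed to bind_rows(First, Second_common) as in the answer above.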