Merge many R data frames by row.names with differing lengths
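I have about 100 data frames keyed by row.names. I need to merge them all into a single table, but some rows are missing from some of them, so their lengths differ. I set up test data frames like this: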
df1 = data.frame(row.names=c("chr1","chr2","chr3","chr4","chr5"),v1=c(10,43,1,44,598))
df2 = data.frame(row.names=c("chr1","chr2","chr4","chr5","chr6","chr7"),v2=c(6,64,21,98,10,20))
df3 = data.frame(row.names=c("chr2","chr3","chr4","chr5","chr6","chr7"),v3=c(20,30,40,50,60,70))
> df1
v1
chr1 10
chr2 43
chr3 1
chr4 44
chr5 598
> df2
v2
chr1 6
chr2 64
chr4 21
chr5 98
chr6 10
chr7 20
> df3
v3
chr2 20
chr3 30
chr4 40
chr5 50
chr6 60
chr7 70
The desired output would be:
v1 v2 v3
chr1 10 6 NA
chr2 43 64 20
chr3 1 NA 30
chr4 44 21 40
chr5 598 98 50
chr6 NA 10 60
chr7 NA 20 70
So I need some way to merge df1, df2, df3, ..., dfn.
We can place all the data frames in a list and use merge with Reduce, specifying by as a new column created from the row names:
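# Collect df1, df2, ... from the workspace and add the row names as a key column 'rn'
# (the \(x) lambda shorthand needs R >= 4.1; function(x) works on older versions)
lst1 <- lapply(mget(ls(pattern = '^df\\d+$')), \(x)
               transform(x, rn = row.names(x)))
# Full outer join of all data frames on 'rn'; all = TRUE keeps unmatched rows as NA
out <- Reduce(function(...) merge(..., by = 'rn', all = TRUE), lst1)
# Restore the row names from the key column, then drop that column
row.names(out) <- out[[1]]
out <- out[-1]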
-output
out
v1 v2 v3
chr1 10 6 NA
chr2 43 64 20
chr3 1 NA 30
chr4 44 21 40
chr5 598 98 50
chr6 NA 10 60
chr7 NA 20 70
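If the data frames are already held in a list (for instance after reading them in with lapply), the workspace lookup with mget(ls(...)) can be skipped. A minimal, self-contained sketch of the same merge/Reduce idea; the helper name merge_by_rownames is hypothetical and not part of any package:

```r
# Test data from the question
df1 <- data.frame(row.names = c("chr1", "chr2", "chr3", "chr4", "chr5"),
                  v1 = c(10, 43, 1, 44, 598))
df2 <- data.frame(row.names = c("chr1", "chr2", "chr4", "chr5", "chr6", "chr7"),
                  v2 = c(6, 64, 21, 98, 10, 20))
df3 <- data.frame(row.names = c("chr2", "chr3", "chr4", "chr5", "chr6", "chr7"),
                  v3 = c(20, 30, 40, 50, 60, 70))

# Hypothetical helper: full outer join of a list of data frames on their row names
merge_by_rownames <- function(dfs) {
  # Copy the row names into an explicit key column 'rn'
  keyed <- lapply(dfs, function(x) {
    data.frame(rn = row.names(x), x, stringsAsFactors = FALSE)
  })
  # Merge pairwise from left to right; all = TRUE keeps unmatched keys as NA
  out <- Reduce(function(a, b) merge(a, b, by = "rn", all = TRUE), keyed)
  # Move the key back into the row names and drop the key column
  row.names(out) <- out$rn
  out[setdiff(names(out), "rn")]
}

out <- merge_by_rownames(list(df1, df2, df3))
out
```

This assumes the row names are unique within each data frame and that the value columns have distinct names (v1, v2, ...); otherwise merge would add .x/.y suffixes to the clashing columns.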
Or use tidyverse with full_join after creating a row-names column with rownames_to_column (from tibble):
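library(dplyr)
library(tibble)
library(purrr)
# Turn the row names into a key column 'rn', full-join everything on it,
# then move 'rn' back into the row names
mget(ls(pattern = '^df\\d+$')) %>%
   map(~ .x %>%
         rownames_to_column('rn')) %>%
   reduce(full_join, by = 'rn') %>%
   column_to_rownames("rn")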
v1 v2 v3
chr1 10 6 NA
chr2 43 64 20
chr3 1 NA 30
chr4 44 21 40
chr5 598 98 50
chr6 NA 10 60
chr7 NA 20 70
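One thing to watch with around 100 data frames: ls() returns names in sorted order, so df10 will typically come before df2, and the merged columns follow that order. A small sketch of reordering the names numerically before the merge; the vector nms is illustrative and stands in for the result of ls(pattern = '^df\\d+$'):

```r
# Illustrative stand-in for the names returned by ls()
nms <- c("df1", "df10", "df2", "df3")

# Strip the 'df' prefix and order by the numeric suffix
num <- as.integer(sub("^df", "", nms))
nms_ordered <- nms[order(num)]
nms_ordered

# The ordered names could then be handed to mget(), e.g. mget(nms_ordered),
# in place of mget(ls(pattern = '^df\\d+$'))
```

This only affects the column order of the result, not which values end up in which row.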