Merge many R data frames by row.names with differing lengths

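I have about 100 data frames keyed by row.names. I need to merge them all into a single table, but some rows are missing, so the data frames have different lengths. I set up test data frames like this: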

df1 = data.frame(row.names=c("chr1","chr2","chr3","chr4","chr5"),v1=c(10,43,1,44,598))
df2 = data.frame(row.names=c("chr1","chr2","chr4","chr5","chr6","chr7"),v2=c(6,64,21,98,10,20))
df3 = data.frame(row.names=c("chr2","chr3","chr4","chr5","chr6","chr7"),v3=c(20,30,40,50,60,70))

> df1
      v1
chr1  10
chr2  43
chr3   1
chr4  44
chr5 598
> df2
     v2
chr1  6
chr2 64
chr4 21
chr5 98
chr6 10
chr7 20
> df3
     v3
chr2 20
chr3 30
chr4 40
chr5 50
chr6 60
chr7 70

The desired output would be:

        v1  v2  v3
chr1    10  6   NA
chr2    43  64  20
chr3    1   NA  30
chr4    44  21  40
chr5    598 98  50
chr6    NA  10  60
chr7    NA  20  70


So I need some way to merge df1, df2, df3, ..., dfn.

We can put all the datasets into a list and use merge inside Reduce, specifying by as a new column created from the row names:

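# Collect df1, df2, ... and copy each one's row names into a key column 'rn'
lst1 <- lapply(mget(ls(pattern = '^df\\d+$')), \(x)
          transform(x, rn = row.names(x)))
# Full outer join on 'rn'; all = TRUE keeps rows missing from either side
out <- Reduce(function(...) merge(..., by = 'rn', all = TRUE),
        lst1)
# Move the key back into the row names and drop the key column
row.names(out) <- out[[1]]
out <- out[-1]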

Output:

> out
      v1 v2 v3
chr1  10  6 NA
chr2  43 64 20
chr3   1 NA 30
chr4  44 21 40
chr5 598 98 50
chr6  NA 10 60
chr7  NA 20 70
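
If the roughly 100 data frames already live in a list rather than as df1, df2, ... in the global environment, the same idea can be wrapped in a small function. This is only a sketch: the name merge_by_rownames is hypothetical (not from any package), and it assumes the value columns have distinct names across the data frames.

```r
# Hypothetical helper (not from any package): outer-join a list of
# data frames on their row names, filling missing cells with NA.
merge_by_rownames <- function(dfs) {
  # Copy each data frame's row names into an explicit key column "rn".
  keyed <- lapply(dfs, function(x) {
    data.frame(rn = row.names(x), x, stringsAsFactors = FALSE)
  })
  # Merge pairwise; all = TRUE keeps keys that are missing on either side.
  out <- Reduce(function(a, b) merge(a, b, by = "rn", all = TRUE), keyed)
  # Restore the key as row names and drop the helper column.
  row.names(out) <- out$rn
  out[setdiff(names(out), "rn")]
}

df1 <- data.frame(row.names = c("chr1", "chr2", "chr3", "chr4", "chr5"),
                  v1 = c(10, 43, 1, 44, 598))
df2 <- data.frame(row.names = c("chr1", "chr2", "chr4", "chr5", "chr6", "chr7"),
                  v2 = c(6, 64, 21, 98, 10, 20))
df3 <- data.frame(row.names = c("chr2", "chr3", "chr4", "chr5", "chr6", "chr7"),
                  v3 = c(20, 30, 40, 50, 60, 70))

res <- merge_by_rownames(list(df1, df2, df3))
print(res)
```

Note that merge sorts the result by the key as text by default, so with names such as chr10 the rows would come out in lexicographic rather than numeric order.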

Alternatively, with the tidyverse, create a row-names column with rownames_to_column (from tibble) and then combine the data frames with full_join:

library(dplyr)
library(tibble)
library(purrr)
mget(ls(pattern = '^df\\d+$')) %>%
    map(~ .x %>% rownames_to_column('rn')) %>%
    reduce(full_join, by = 'rn') %>%
    column_to_rownames("rn")
      v1 v2 v3
chr1  10  6 NA
chr2  43 64 20
chr3   1 NA 30
chr4  44 21 40
chr5 598 98 50
chr6  NA 10 60
chr7  NA 20 70