Read multiple files and combine unique rows in R

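I want to read a set of text files from a directory, each with a different number of rows, and combine them into a single sorted matrix with the file names as column headers.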

For example:

file1.txt

  ID    COUNT
  id1     3
  id5     4

sample2.txt

  ID    COUNT
  id1    5
  id3    6

Expected output:

  ID  file1  sample2  ....  
  id1  3      5
  id5  4      NA  
  id3  NA     6

I have made some progress on reading the files and building a list, but I am stuck on finding the unique rows:

    files <- list.files(path=".", pattern="\\.txt$")
    samples <- list()
    for (f in files) {
        file <- read.table(f, header=TRUE, sep="\t")
        ...

How can I use sapply over the list of files to find the unique rows across all of them?

library(reshape2)

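# `files` is the vector of file names returned by list.files() in the question

# Read all the files into a list of data frames
df.list = lapply(files, function(file) {
  # The files have a header row (ID, COUNT), so read it as column names
  dat = read.table(file, header=TRUE, sep="\t")
  # Tag each row with its source file, dropping the .txt extension
  dat$file = tools::file_path_sans_ext(basename(file))
  return(dat)
})

# Combine into a single data frame
df = do.call(rbind, df.list)

# Reshape from long to wide: one row per ID, one column per file
df = dcast(df, ID ~ file, value.var="COUNT")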
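
If you would rather not depend on reshape2, the same idea can be sketched in base R as a chain of full outer joins on ID. This is only an illustration under a few assumptions: the two sample files from the question are recreated in a temporary directory (the `merge_demo` folder name is made up), they are tab-separated, and each has exactly the columns ID and COUNT.

```r
# Recreate the two sample files from the question in a temporary directory
# (the directory name is hypothetical, used only for this demo)
dir <- file.path(tempdir(), "merge_demo")
dir.create(dir, showWarnings = FALSE)
write.table(data.frame(ID = c("id1", "id5"), COUNT = c(3, 4)),
            file.path(dir, "file1.txt"),
            sep = "\t", quote = FALSE, row.names = FALSE)
write.table(data.frame(ID = c("id1", "id3"), COUNT = c(5, 6)),
            file.path(dir, "sample2.txt"),
            sep = "\t", quote = FALSE, row.names = FALSE)

files <- list.files(path = dir, pattern = "\\.txt$", full.names = TRUE)

# Read each file and rename COUNT to the file name without its extension
df.list <- lapply(files, function(f) {
  dat <- read.table(f, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
  names(dat)[names(dat) == "COUNT"] <- tools::file_path_sans_ext(basename(f))
  dat
})

# Full outer join on ID: an ID missing from a file gets NA in that file's column
merged <- Reduce(function(x, y) merge(x, y, by = "ID", all = TRUE), df.list)
print(merged)
```

Because `merge()` sorts on the key by default, the rows come out ordered by ID. With many large files, repeated merging is likely to be slower than binding everything once and reshaping.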

Alternatively, if performance matters:

library(data.table)
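process = function(files){
    # Name each path by its file name minus the 4-character ".txt" extension
    files = setNames(files, substr(files, 1L, nchar(files) - 4L))
    # Read every file and stack them; the list names fill the "file" column
    dt = rbindlist(lapply(files, fread), idcol = "file")
    # Long to wide: one row per ID, one COUNT column per file
    dcast(dt, ID ~ file, value.var = "COUNT")
}
files = list.files(path=".", pattern="\\.txt$")
process(files)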
#    ID file1 sample2
#1: id1     3       5
#2: id3    NA       6
#3: id5     4      NA
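
Since the question asks specifically about sapply, here is a rough base R sketch of that route: collect the union of IDs first, then use `match()` inside `sapply` to line each file's counts up against it. The `dfs` list below is a hypothetical in-memory stand-in for the data frames you would get from reading each file, named by file.

```r
# Hypothetical stand-ins for the data frames read from file1.txt and sample2.txt
dfs <- list(
  file1   = data.frame(ID = c("id1", "id5"), COUNT = c(3, 4)),
  sample2 = data.frame(ID = c("id1", "id3"), COUNT = c(5, 6))
)

# Union of IDs across all files, in order of first appearance
ids <- unique(unlist(lapply(dfs, function(d) as.character(d$ID))))

# One column per file; match() yields NA for IDs that a file does not contain
counts <- sapply(dfs, function(d) d$COUNT[match(ids, as.character(d$ID))])
rownames(counts) <- ids
print(counts)
```

This returns a numeric matrix with the IDs as row names rather than a data frame with an ID column, and it assumes each ID appears at most once per file, because `match()` only picks up the first occurrence.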