How to apply a function on multiple files from the same sample and combine them?

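I have many sample files in a directory named Data. There are about 80 samples. Each sample has 3 output files in different formats, all prefixed with the sample name. The layout looks like this; I am showing only a few of the files here.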

Data
 |___ PFRT001_disions.tsv
 |___ PFRT001_predictions.tsv
 |___ PFRT001_tool.beans.results.gz
 |___ PFRT007_disions.tsv
 |___ PFRT007_predictions.tsv
 |___ PFRT007_tool.beans.results.gz
 |___ PFRT009_disions.tsv
 |___ PFRT009_predictions.tsv
 |___ PFRT009_tool.beans.results.gz
 |___ PFRT023_disions.tsv
 |___ PFRT023_predictions.tsv
 |___ PFRT023_tool.beans.results.gz
 |___ PFRT098_disions.tsv
 |___ PFRT098_predictions.tsv
 |___ PFRT098_tool.beans.results.gz

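Right now I apply a function to each sample (its 3 files in different formats) and save the function's output under a name.

Here is how I apply the function to sample PFRT001: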

sample1 <- single_sample(disionsfile = file.path("/path/to/directory","Data","PFRT001_disions.tsv"),
  predictionsfile = file.path("/path/to/directory","Data","PFRT001_predictions.tsv"),
  expFile = file.path("/path/to/directory","Data","PFRT001_tool.beans.results.gz"),
  tumorID = "PFRT001",
  Filter = FALSE)

Then I run the same function on sample PFRT007:

sample2 <- single_sample(disionsfile = file.path("/path/to/directory","Data","PFRT007_disions.tsv"),
  predictionsfile = file.path("/path/to/directory","Data","PFRT007_predictions.tsv"),
  expFile = file.path("/path/to/directory","Data","PFRT007_tool.beans.results.gz"),
  tumorID = "PFRT007",
  Filter = FALSE)

I apply the function to each sample (3 files) separately, save each result under its own name, and then combine the results like this:

All <- do.call("rbind", list(sample1, sample2))

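I have 80 samples, each with 3 files as shown above. How can I apply the function to all of the samples, save each sample's output under a different name, and then rbind all the outputs? I would like to do this in R. Any help is appreciated.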

Assuming your function single_sample returns a data.frame, map_dfr from purrr is a good fit:

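library(stringr)
library(purrr)

# List every file in Data and keep the part of each name before the first "_"
file_names <- list.files(path = "/path/to/directory/Data")
unique_names <- unique(str_extract(file_names, ".+?(?=_)"))

# set_names() names the vector by its own values, so .id stores the sample
# name rather than the position; single_sample() then runs once per sample
# and the results are row-bound into one data frame
all_data <- unique_names %>% 
  set_names() %>%
  map_dfr(~single_sample(
    disionsfile = file.path("/path/to/directory","Data",paste0(.x, "_disions.tsv")),
    predictionsfile = file.path("/path/to/directory","Data",paste0(.x, "_predictions.tsv")),
    expFile = file.path("/path/to/directory","Data",paste0(.x, "_tool.beans.results.gz")),
    tumorID = .x,
    Filter = FALSE
  ), .id = "sample")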

.id = "sample" adds a column named sample that records which element of .x each row came from (its name if .x is named, otherwise its position).
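If you would rather stay in base R and keep each sample's output available under its own name before binding, the same idea can be sketched with lapply and a named list. This is only a sketch under assumptions: the single_sample below is a hypothetical stand-in (it just counts which of the three files exist) so that the example runs on its own, and run_all_samples is a helper name made up for illustration. Replace the stand-in with your real function.

```r
# Hypothetical stand-in for the real single_sample(); it only reports how many
# of the three input files exist. Swap in your own function here.
single_sample <- function(disionsfile, predictionsfile, expFile,
                          tumorID, Filter = FALSE) {
  data.frame(
    tumorID = tumorID,
    files_found = sum(file.exists(c(disionsfile, predictionsfile, expFile))),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: run single_sample() once per sample found in data_dir
# and return a list of results named by sample ID.
run_all_samples <- function(data_dir) {
  # Take the sample IDs from the *_disions.tsv files only, so each sample
  # appears exactly once.
  disions_files <- list.files(data_dir, pattern = "_disions\\.tsv$")
  sample_ids <- sub("_disions\\.tsv$", "", disions_files)

  results <- lapply(sample_ids, function(id) {
    single_sample(
      disionsfile = file.path(data_dir, paste0(id, "_disions.tsv")),
      predictionsfile = file.path(data_dir, paste0(id, "_predictions.tsv")),
      expFile = file.path(data_dir, paste0(id, "_tool.beans.results.gz")),
      tumorID = id,
      Filter = FALSE
    )
  })
  names(results) <- sample_ids
  results
}

results <- run_all_samples(file.path("/path/to/directory", "Data"))

# One sample's output is available by name, e.g. results[["PFRT001"]];
# all outputs are combined with rbind as in the question.
All <- do.call(rbind, results)
```

Keeping the per-sample outputs in one named list avoids creating 80 separate variables (sample1, sample2, ...), while still letting you inspect any single sample by its ID.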