How to apply a function to multiple files from the same sample and combine the results?
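I have many sample files in the directory Data. There are about 80 samples. Each sample has 3 output files in different formats, prefixed with the sample name. The layout is shown below; I show only a few of the files here.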
Data
|___ PFRT001_disions.tsv
|___ PFRT001_predictions.tsv
|___ PFRT001_tool.beans.results.gz
|___ PFRT007_disions.tsv
|___ PFRT007_predictions.tsv
|___ PFRT007_tool.beans.results.gz
|___ PFRT009_disions.tsv
|___ PFRT009_predictions.tsv
|___ PFRT009_tool.beans.results.gz
|___ PFRT023_disions.tsv
|___ PFRT023_predictions.tsv
|___ PFRT023_tool.beans.results.gz
|___ PFRT098_disions.tsv
|___ PFRT098_predictions.tsv
|___ PFRT098_tool.beans.results.gz
Right now I apply a function to each sample (3 files in different formats) and save the function's output under a name.
Here is how I apply the function to sample PFRT001:
sample1 <- single_sample(disionsfile = file.path("/path/to/directory","Data","PFRT001_disions.tsv"),
predictionsfile = file.path("/path/to/directory","Data","PFRT001_predictions.tsv"),
expFile = file.path("/path/to/directory","Data","PFRT001_tool.beans.results.gz"),
tumorID = "PFRT001",
Filter = FALSE)
Then I run the same function again on sample PFRT007:
sample2 <- single_sample(disionsfile = file.path("/path/to/directory","Data","PFRT007_disions.tsv"),
predictionsfile = file.path("/path/to/directory","Data","PFRT007_predictions.tsv"),
expFile = file.path("/path/to/directory","Data","PFRT007_tool.beans.results.gz"),
tumorID = "PFRT007",
Filter = FALSE)
I apply the function to each sample (3 files) separately, save each result under its own name, and then combine them like this:
All <- do.call("rbind", list(sample1, sample2))
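I have 80 samples, each with 3 files as shown above. How can I apply the function to all of the sample files, save each sample's output under a different name, and rbind all of the outputs? I would like to do this in R. Any help is appreciated.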
Assuming your function single_sample returns a data.frame, map_dfr is the best fit for this:
library(stringr)
library(purrr)
# map_dfr() row-binds with dplyr::bind_rows(), so dplyr must be installed.
data_dir <- file.path("/path/to/directory", "Data")
file_names <- list.files(path = data_dir)
# Sample ID = everything before the first underscore, e.g. "PFRT001".
# set_names() names the vector by its own values so .id records the sample ID.
unique_names <- set_names(unique(str_extract(file_names, ".+?(?=_)")))
all_data <- unique_names %>%
  map_dfr(~single_sample(
    disionsfile = file.path(data_dir, paste0(.x, "_disions.tsv")),
    predictionsfile = file.path(data_dir, paste0(.x, "_predictions.tsv")),
    expFile = file.path(data_dir, paste0(.x, "_tool.beans.results.gz")),
    tumorID = .x,
    Filter = FALSE
  ), .id = "sample")
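.id = "sample" adds a column named sample that records which element of .x each row came from (the element's name if the input vector is named, otherwise its position).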
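For comparison, the same loop can be sketched in base R with lapply and do.call(rbind, ...), which also keeps each sample's output available under its own name, as the question asks. This is only an illustrative sketch: the single_sample defined here is a hypothetical stand-in (the real function is not shown in the question), and the files are empty placeholders created in a temporary directory.

```r
# Hypothetical stand-in for the asker's single_sample(); the real function is
# not shown in the question, so this one only counts which input files exist.
single_sample <- function(disionsfile, predictionsfile, expFile, tumorID, Filter = FALSE) {
  data.frame(
    tumorID = tumorID,
    files_found = sum(file.exists(c(disionsfile, predictionsfile, expFile))),
    stringsAsFactors = FALSE
  )
}

# Build a throwaway Data directory that follows the question's naming scheme.
data_dir <- file.path(tempdir(), "Data")
dir.create(data_dir, showWarnings = FALSE)
suffixes <- c("_disions.tsv", "_predictions.tsv", "_tool.beans.results.gz")
for (id in c("PFRT001", "PFRT007")) {
  file.create(file.path(data_dir, paste0(id, suffixes)))
}

# Derive sample IDs from one file type only, so unrelated files are ignored.
disions <- list.files(data_dir, pattern = "_disions\\.tsv$")
sample_ids <- sub("_disions\\.tsv$", "", disions)

# Run the function once per sample and keep each result under its sample name.
results <- lapply(sample_ids, function(id) {
  single_sample(
    disionsfile     = file.path(data_dir, paste0(id, "_disions.tsv")),
    predictionsfile = file.path(data_dir, paste0(id, "_predictions.tsv")),
    expFile         = file.path(data_dir, paste0(id, "_tool.beans.results.gz")),
    tumorID         = id,
    Filter          = FALSE
  )
})
names(results) <- sample_ids

# Stack all per-sample data frames, as in the question's do.call("rbind", ...).
All <- do.call(rbind, results)
```

Individual outputs remain reachable as results[["PFRT001"]] and so on, so there is no need for 80 separately named variables. If purrr 1.0.0 or later is installed, its documentation marks map_dfr as superseded in favor of map followed by list_rbind, which may be worth considering for new code.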