How to get file names based on names in another file in R?
I have a directory named data that contains the following files:
data
|___ UPA.csv
|___ M_B.csv
|___ M_C.csv
|___ M_D.csv
|___ M_E.csv
UPA.csv looks like this:
Genes
AC018653.3
AC022509.1
AC022509.2
AC055720.2
AC082651.1
AC084346.2
AC084824.4
AC092171.4
AC092803.2
M_B.csv looks like this:
AC084346.2
AD097808.3
AC084824.4
ADFR3564.8
A1982983.4
M_C.csv looks like this:
AC098789.3
AC022509.2
AC783546.3
AC055720.2
M_D.csv looks like this:
AC018653.3
AS989473.9
AC022509.1
AE378467.1
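I want to check which of the Genes in UPA.csv are also found in the other files, and get the names of those files.
I would like the output to look like this: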
M_B.csv: AC084346.2, AC084824.4
M_C.csv: AC022509.2, AC055720.2
M_D.csv: AC018653.3, AC022509.1
For this I tried the following:
setwd("/data/")
library(tidyverse)
library(magrittr)
genes <- Sys.glob(file.path("M_*.csv"))
genes.read <- lapply(genes,function(x) read.delim(x, header = FALSE))
genes.read <- lapply(genes.read, function(x) set_colnames(x, "Genes"))
ref2 <- list.files(pattern = "UP")
ref2
ref.read <- read.delim(ref2[[1]])
intersect <- lapply(seq_along(genes.read), function(x)
  intersect(genes.read[[x]], ref.read))
for (i in 1:length(genes.read)) {
  cat(genes[[i]], ":", intersect[[i]]$Genes, "\n")
}
The code above printed only the file names, without the genes:
M_B.csv:
M_C.csv:
M_D.csv:
Try the following:
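# Read the reference gene list (it has a "Genes" header).
UPA <- read.csv('UPA.csv')
# Match the M_*.csv files; in an R string the dot is escaped with a double backslash.
filenames <- list.files(pattern = 'M_.*\\.csv$')
# For each file, keep the genes that also occur in UPA and tag them with the file name.
result <- do.call(rbind, lapply(filenames, function(x) {
  data <- read.delim(x, header = FALSE)
  names(data) <- 'Genes'
  cbind(file = x, subset(data, Genes %in% UPA$Genes))
}))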
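If a data frame is not needed, the same check can probably be done on plain character vectors. The following is only a sketch: it recreates the question's sample files in a temporary directory (M_E.csv is left out because its contents are not shown), and the names `dir`, `upa_genes` and `shared` are illustrative, not from the original post.

```r
# Hypothetical setup: recreate the question's sample files in a temporary directory.
dir <- tempfile("data")
dir.create(dir)
writeLines(c("Genes", "AC018653.3", "AC022509.1", "AC022509.2", "AC055720.2",
             "AC082651.1", "AC084346.2", "AC084824.4", "AC092171.4", "AC092803.2"),
           file.path(dir, "UPA.csv"))
writeLines(c("AC084346.2", "AD097808.3", "AC084824.4", "ADFR3564.8", "A1982983.4"),
           file.path(dir, "M_B.csv"))
writeLines(c("AC098789.3", "AC022509.2", "AC783546.3", "AC055720.2"),
           file.path(dir, "M_C.csv"))
writeLines(c("AC018653.3", "AS989473.9", "AC022509.1", "AE378467.1"),
           file.path(dir, "M_D.csv"))

# Reference genes as a plain character vector.
upa_genes <- read.csv(file.path(dir, "UPA.csv"), stringsAsFactors = FALSE)$Genes

# Read each M_*.csv as a character vector (trimming stray whitespace)
# and keep only the genes that also appear in UPA.csv.
files <- list.files(dir, pattern = "^M_.*\\.csv$", full.names = TRUE)
shared <- lapply(files, function(f) {
  genes <- trimws(readLines(f))
  intersect(genes, upa_genes)
})

# Name each list element after its file so the file name travels with the genes.
names(shared) <- basename(files)
```

Here `shared` is a named list with one element per file, so `shared[["M_B.csv"]]` holds the genes that M_B.csv has in common with UPA.csv.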
Using tidyverse, you can do the same thing:
library(tidyverse)
map_df(filenames, function(x) {
  read.delim(x, header = FALSE) %>%
    setNames('Genes') %>%
    filter(Genes %in% UPA$Genes) %>%
    mutate(file = x)
}) -> result
This should give you output like the following:
result
# Genes file
#1 AC084346.2 M_B.csv
#2 AC084824.4 M_B.csv
#3 AC022509.2 M_C.csv
#4 AC055720.2 M_C.csv
#...
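To get from this long table to the "file: gene, gene" lines requested in the question, the genes can be collapsed per file. This is a minimal base-R sketch; the `result` data frame below is a hypothetical stand-in built from the rows in the question's expected output, and `per_file` and `out` are illustrative names.

```r
# Hypothetical stand-in for `result`, using the rows from the question's expected output.
result <- data.frame(
  Genes = c("AC084346.2", "AC084824.4", "AC022509.2",
            "AC055720.2", "AC018653.3", "AC022509.1"),
  file  = c("M_B.csv", "M_B.csv", "M_C.csv",
            "M_C.csv", "M_D.csv", "M_D.csv"),
  stringsAsFactors = FALSE
)

# Collapse the genes of each file into one comma-separated string.
per_file <- tapply(result$Genes, result$file, paste, collapse = ", ")

# Build one "file: gene1, gene2" line per file and print them.
out <- paste0(names(per_file), ": ", per_file)
writeLines(out)
```

Files with no shared genes have no rows in `result`, so they do not appear in the printed lines.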