Writing out group_by groups with more than one row to individual text files in R
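Apologies, I am still getting familiar with the world of dplyr and data.table and trying to understand their full capabilities!
I have a dataset that I am interested in grouping by a particular variable (Locus):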
DF <- structure(list(Gene = c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE"),
Locus = c("1","2","2","3","3"),
Chromosome = c("1","1","1","1","1"),
Start = c("100","500","600","1000","1500"),
Stop = c("200","550","700","1400","1750")),
.Names = c("Gene","Locus","Chromosome","Start","Stop"),
row.names = c(NA, 5L),
class = "data.frame")
> DF
Gene Locus Chromosome Start Stop
GeneA 1 1 100 200
GeneB 2 1 500 550
GeneC 2 1 600 700
GeneD 3 1 1000 1400
GeneE 3 1 1500 1750
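I was wondering whether it is possible to write out "per-locus" files containing the Gene, Chromosome, Start and Stop column values, but only for loci that appear more than once in the Locus column. So Locus==1 would not produce a text file, but the rows for Locus==2 and Locus==3 would each be written to a separate file?
For example: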
<loc2.txt>
Gene Chromosome Start Stop
GeneB 1 500 550
GeneC 1 600 700
<loc3.txt>
Gene Chromosome Start Stop
GeneD 1 1000 1400
GeneE 1 1500 1750
Thanks in advance for your help!
dplyr
library(dplyr)
newDF <- DF %>%
group_by(Locus) %>%
filter(n() > 1) %>%
nest_by()
newDF
# # A tibble: 2 x 2
# # Rowwise: Locus
# Locus data
# <chr> <list<tbl_df[,4]>>
# 1 2 [2 x 4]
# 2 3 [2 x 4]
mapply(function(x, nm) write.csv(x, nm),
newDF$data, paste0("loc", newDF$Locus, ".csv"))
# [[1]]
# NULL
# [[2]]
# NULL
The files are created in the current working directory. You can safely ignore the NULL output of mapply.
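The question asks for tab-delimited .txt files rather than CSVs, so here is a minimal sketch of one way to get that with dplyr, assuming `group_walk()` is available (dplyr 0.8 or later). The output directory `out_dir` is a hypothetical choice made for illustration and is not part of the original answer.

```r
library(dplyr)

# Same data as in the question.
DF <- data.frame(
  Gene = c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE"),
  Locus = c("1", "2", "2", "3", "3"),
  Chromosome = c("1", "1", "1", "1", "1"),
  Start = c("100", "500", "600", "1000", "1500"),
  Stop = c("200", "550", "700", "1400", "1750"),
  stringsAsFactors = FALSE
)

# Hypothetical output directory (assumption); change to wherever the files should go.
out_dir <- file.path(tempdir(), "loci_dplyr")
dir.create(out_dir, showWarnings = FALSE)

DF %>%
  group_by(Locus) %>%
  filter(n() > 1) %>%   # drop loci that have only one row
  # .x holds the non-grouping columns of one group, .y is a one-row tibble with its Locus
  group_walk(~ write.table(.x,
                           file = file.path(out_dir, paste0("loc", .y$Locus, ".txt")),
                           sep = "\t", quote = FALSE, row.names = FALSE))
```

`group_walk()` is called only for its side effect, so there is no list of NULLs to ignore, and `row.names = FALSE` keeps the files limited to the four requested columns.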
data.table
library(data.table)
DT <- as.data.table(DF)
newDT <- DT[, .SD[.N > 1, .(data = list(.SD))], by = Locus]
newDT
# Locus data
# <char> <list>
# 1: 2 <data.table[2x4]>
# 2: 3 <data.table[2x4]>
# Use newDT (the data.table result), not newDF from the dplyr example
mapply(function(x, nm) write.csv(x, nm),
newDT$data, paste0("loc", newDT$Locus, ".csv"))
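As a possible alternative that skips the intermediate list column, the sketch below writes the files directly from a grouped `j` expression using `fwrite()` and the special symbol `.BY`. It is an illustration under stated assumptions rather than the original answer's method. The directory `out_dir` is hypothetical, and the output is tab-delimited .txt to match the file names in the question.

```r
library(data.table)

# Same data as in the question.
DT <- data.table(
  Gene = c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE"),
  Locus = c("1", "2", "2", "3", "3"),
  Chromosome = c("1", "1", "1", "1", "1"),
  Start = c("100", "500", "600", "1000", "1500"),
  Stop = c("200", "550", "700", "1400", "1750")
)

# Hypothetical output directory (assumption); change to wherever the files should go.
out_dir <- file.path(tempdir(), "loci_dt")
dir.create(out_dir, showWarnings = FALSE)

# For each Locus with more than one row, write the remaining columns (.SD) to
# loc<Locus>.txt. .BY holds the current group's Locus value.
# invisible() suppresses the empty result that the grouped call returns.
invisible(
  DT[, if (.N > 1) fwrite(.SD,
                          file = file.path(out_dir, paste0("loc", .BY$Locus, ".txt")),
                          sep = "\t"),
     by = Locus]
)
```

Because `by = Locus` removes the grouping column from `.SD`, each file contains only Gene, Chromosome, Start and Stop.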