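How to manipulate and merge data.frames in a nested list more efficiently?
I have two lists of data.frames as the output of a custom function. I now split each data.frame in each list, which gives me nested lists. However, I want to manipulate these nested lists to regroup and merge their elements. Working with nested lists is tricky, and I cannot manipulate them as intended. Does anyone know a useful technique for doing this more easily and efficiently? How can I get the output I want? Thanks in advance.
Mini example: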
myList_keep <- list(
hola.keep= data.frame( from=seq(1, by=4, len=15), to=seq(3, by=4, len=15), value=sample(30, 15)),
boo.keep = data.frame( from=seq(3, by=7, len=20), to=seq(6, by=7, len=20), value=sample(30, 20)),
meh.keep = data.frame( from=seq(4, by=8, len=25), to=seq(7, by=8, len=25), value=sample(30, 25))
)
myList_drop <- list(
hola.drop= data.frame( from=seq(11, by=7, len=10), to=seq(23, by=7, len=10), value=sample(15, 10)),
boo.drop = data.frame( from=seq(18, by=5, len=12), to=seq(26, by=5, len=12), value=sample(18, 12)),
meh.drop = data.frame( from=seq(24, by=8, len=15), to=seq(37, by=8, len=15), value=sample(30, 15))
)
I tried to split each data.frame as follows:
splt_keep <- lapply(myList_keep, function(ele_) {
  split(ele_, ifelse(ele_$value >= 10, "above", "below"))
})
splt_drop <- lapply(myList_drop, function(ele_) {
  split(ele_, ifelse(ele_$value >= 10, "above", "below"))
})
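I intend to manipulate the nested lists in this way:
For example, if I could manipulate splt_keep and splt_drop effectively, I would get a nested list with this skeleton: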
$hola.above
$hola.keep$above
$hola.drop$above
$hola.below
$hola.keep$below
$hola.drop$below
Then, once I have this structure, I intend to merge the pieces accordingly, so the final output would be:
$hola
$hola.above
$hola.below
$boo
$boo.above
$boo.below
$meh
$meh.above
$meh.below
How can I easily get the desired output? How can I work with nested lists more comfortably? Can anyone point me to how to do this?
Repeatedly splitting and re-binding well-structured data is very inefficient. Here is an option using data.table:
## Combine each list into a single data.table.
## Note that idcol=TRUE creates a new column, .id,
## that records which list element each row came from.
library(data.table)
keep_dt <- rbindlist(myList_keep, idcol=TRUE)
drop_dt <- rbindlist(myList_drop, idcol=TRUE)
DT <- rbind(keep_dt, drop_dt)
## Then create the grouping column (>= 10 matches the split in the question)
DT[, gr := ifelse(value >= 10, "above", "below")]
## To get "hola", filter the whole table first,
## then split the filtered rows by their own grouping column
hola <- DT[grepl("hola", .id)]
split(hola, hola$gr)
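To make the role of the id column concrete, here is a minimal sketch on a toy input. The list `toy` and its values are made up for illustration and are not part of the question's data; the only assumption is that data.table is installed.

```r
library(data.table)

# Hypothetical toy input: two small data.frames in a named list
toy <- list(
  a.keep = data.frame(from = 1:2, to = 3:4, value = c(5, 12)),
  b.keep = data.frame(from = 5:6, to = 7:8, value = c(20, 3))
)

# idcol = TRUE adds a column named ".id" that holds the list names
toy_dt <- rbindlist(toy, idcol = TRUE)

# The test is element-wise, so no grouping is needed here
toy_dt[, gr := ifelse(value >= 10, "above", "below")]

print(toy_dt)
```

Each row keeps the name of the list element it came from, so the origin of a row is never lost after binding.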
Update
To get the expected output:
DT[,.id:= gsub("[.](keep|drop)","",.id)]
by(DT,DT$.id,FUN = function(x)split(x,x$gr))
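If a plain nested list is preferred over the object returned by by(), one possible alternative is the `by` and `flatten` arguments of data.table's split method. This is only a sketch under assumptions: the small `keep` and `drop` lists and the column names `src` and `name` are invented for illustration, and it is not the method used above.

```r
library(data.table)

# Hypothetical, deterministic stand-ins for myList_keep / myList_drop
keep <- list(
  hola.keep = data.frame(from = 1:4, to = 5:8, value = c(3, 12, 25, 9)),
  boo.keep  = data.frame(from = 1:2, to = 3:4, value = c(10, 1))
)
drop <- list(
  hola.drop = data.frame(from = 1:3, to = 4:6, value = c(14, 2, 30)),
  boo.drop  = data.frame(from = 1:2, to = 3:4, value = c(8, 22))
)

# Bind everything once; "src" records the originating list element
DT <- rbindlist(c(keep, drop), idcol = "src")

# Strip the ".keep" / ".drop" suffix to get the merged group name
DT[, name := sub("[.](keep|drop)$", "", src)]

# Same threshold as in the question
DT[, gr := ifelse(value >= 10, "above", "below")]

# flatten = FALSE nests the result: nested$hola$above, nested$hola$below, ...
nested <- split(DT, by = c("name", "gr"), flatten = FALSE)

str(nested, max.level = 2)
```

With this layout, `nested$hola$above` holds the rows from both hola.keep and hola.drop whose value is at least 10, and the `src` column still shows where each row came from.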