Extract data based on another list

Question

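I am trying to extract rows of a dataset based on a list of time points nested within individuals. Some time points are repeated (so the variable values are exactly the same), but I still want to keep the duplicated rows. How can I do this in base R?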

Here is the original dataset:

xx <- data.frame(id=rep(1:3, each=3), time=1:3, y=rep(1:3, each=3))

Here is the list of time points to extract, one numeric vector per id (the list names are the ids):

lst <- list(`1` = c(1, 1, 2), `2` = c(1, 3, 3), `3` = c(2, 2, 3))

Desired result:

id time y
 1    1 1
 1    1 1  #this is the duplicated row
 1    2 1
 2    1 2
 2    3 2
 2    3 2 #this is the duplicated row
 3    2 3
 3    2 3 #this is the duplicated row
 3    3 3

The code do.call(rbind, Map(function(p, q) subset(xx, id == q & time %in% p), lst, names(lst))) does not work for me, because subset drops the duplicated rows.

Answer 1

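The problem is that %in% only tests membership: it returns one TRUE/FALSE per row, so a value that appears twice in p still selects its row only once. To get the repeats, we also need to iterate over p itself (lapply). I'd wrap your inner subset in another do.call(rbind, lapply(p, ...)) to get what you expect: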

do.call(rbind, Map(function(p, q) {
  do.call(rbind, lapply(p, function(p0) subset(xx, id == q & time %in% p0))) 
  }, lst, names(lst)))
#      id time y
# 1.1   1    1 1
# 1.2   1    1 1
# 1.21  1    2 1
# 2.4   2    1 2
# 2.6   2    3 2
# 2.61  2    3 2
# 3.8   3    2 3
# 3.81  3    2 3
# 3.9   3    3 3

(The row names are a distraction here...)
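To make the %in% behaviour concrete, here is a minimal sketch restricted to id 1. The names p, mask and one_id are only illustrative and not part of the original answer; resetting the row names with rownames(...) <- NULL is one optional way to tidy the output.

```r
xx <- data.frame(id = rep(1:3, each = 3), time = 1:3, y = rep(1:3, each = 3))
p <- c(1, 1, 2)  # requested time points for id 1, with a repeat

# %in% yields one TRUE/FALSE per row of xx, so the repeated 1 in p
# cannot select the same row twice.
mask <- xx$time %in% p
sum(xx$id == 1 & mask)  # 2 rows selected, although p has 3 entries

# Looping over p one value at a time keeps the repeats.
one_id <- do.call(rbind, lapply(p, function(p0) subset(xx, id == 1 & time %in% p0)))
rownames(one_id) <- NULL  # optional: drop the distracting row names
one_id
```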

Another approach is to convert your lst into a frame of id and time, and then left-join on it:

frm <- do.call(rbind, Map(function(x, nm) data.frame(id = nm, time = x), lst, names(lst)))
frm
#     id time
# 1.1  1    1
# 1.2  1    1
# 1.3  1    2
# 2.1  2    1
# 2.2  2    3
# 2.3  2    3
# 3.1  3    2
# 3.2  3    2
# 3.3  3    3

merge(frm, xx, by = c("id", "time"), all.x = TRUE)
#   id time y
# 1  1    1 1
# 2  1    1 1
# 3  1    2 1
# 4  2    1 2
# 5  2    3 2
# 6  2    3 2
# 7  3    2 3
# 8  3    2 3
# 9  3    3 3
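Note that merge() sorts the result on the key columns by default. If the order of the entries in lst should be preserved, one possible base R alternative (a sketch, not part of the original answer) is to look up each requested pair with match(), which returns one index per requested element, repeats included. The names want, idx and res are hypothetical, and the composite key built with paste() assumes that id and time print the same way in both objects.

```r
xx  <- data.frame(id = rep(1:3, each = 3), time = 1:3, y = rep(1:3, each = 3))
lst <- list(`1` = c(1, 1, 2), `2` = c(1, 3, 3), `3` = c(2, 2, 3))

# Long-format lookup table: one row per requested (id, time) pair.
want <- data.frame(id   = rep(as.integer(names(lst)), lengths(lst)),
                   time = unlist(lst, use.names = FALSE))

# Composite key per row; match() gives the row of xx for every requested pair.
idx <- match(paste(want$id, want$time), paste(xx$id, xx$time))

res <- xx[idx, ]       # repeated indices repeat the row
rownames(res) <- NULL  # tidy row names
res
```

A pair that does not exist in xx yields NA in idx and therefore a row of NA values, which is roughly comparable to what all.x = TRUE does in the merge above.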

Two good resources for learning about merges/joins:

How to join (merge) data frames (inner, outer, left, right)
What's the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN?
