Understanding list behaviour

Question

I think I have a good understanding of data.frames and how they work, but some aspects of lists confuse me.

Here is some reproducible data:

 list_a <- structure(list(`one` = structure(list(
     words = c("a", "b","c", "d", "e", "f")), .Names = "words", class = "data.frame", row.names = c(NA,-6L)), 
     `two` = structure(list(words = c("a","s","t","z")), .Names = "words", class = "data.frame", row.names = c(NA, -4L))),
     .Names = c("one", "two"))

This gives us:

list_a
$one
  words
1     a
2     b
3     c
4     d
5     e
6     f

$two
  words
1     a
2     s
3     t
4     z

Now I want to loop over the list and return some of the results from the data.frames.

list <- list()

for(i in list_a){list <- append(list, list_a$i$words)}

This does not put anything into the list. Neither does:

for(i in list_a){list <- append(list, list_a[[i]]$words)}
Error in list_a[[i]] : invalid subscript type 'list'

I thought maybe my first loop did not work because, when using list_a$i$words, I had not defined i as the proper name. So I tried:

for(i in names(list_a)){list <- append(list, list_a$i$words)}

This still gives me a list of length 0.

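So I do not understand why my attempts fail to give the result I expect, and I do not know why using a subscript gives me an error. Eventually I found a syntax that works: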

list2 <- list()
for(i in list_a){list2 <- append(list2, i$words)}

But I do not understand why this does not work with the names approach?

Answer 1

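The arguments to a for expression in R are:

LHS, an iterator variable that takes each value of the RHS in turn
in, a language keyword
RHS, a vector whose length defines the number of iterations that will occur.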
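
As a minimal sketch of the three parts described above, the loops below record what i holds on each pass. The list is rebuilt here with data.frame() so the block runs on its own; it is assumed to match the shape of list_a from the question, and the names classes and seen are illustrative only.

```r
# Rebuild a list shaped like list_a from the question (assumed equivalent)
list_a <- list(
  one = data.frame(words = c("a", "b", "c", "d", "e", "f"), stringsAsFactors = FALSE),
  two = data.frame(words = c("a", "s", "t", "z"), stringsAsFactors = FALSE)
)

# RHS is the list itself: i takes the value of each element (a data frame)
classes <- character(0)
for (i in list_a) {
  classes <- c(classes, class(i))
}
classes
# [1] "data.frame" "data.frame"

# RHS is a character vector: i takes the value of each name
seen <- character(0)
for (i in names(list_a)) {
  seen <- c(seen, i)
}
seen
# [1] "one" "two"
```

In both loops the number of iterations is 2, the length of the RHS.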

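In your first loop, the RHS is a vector of type "list" with length 2, so on the LHS i is a one-column data frame. You then ask $ to extract an element named "i" from list_a. There is no such element, so the result is NULL. In your second loop (the one over names(list_a)), the RHS is a vector of type "character" with length 2. The same thing happens.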

$ does not evaluate its index. Use [[ instead and you will get the answer you expect in the second loop.

# initialize
list <- list()
# loop
for (i in names(list_a)) {
    list <- append(list, list_a[[i]]$words)
}
list
# [[1]]
# [1] "a"
#
# [[2]]
# [1] "b"
# ...
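
The subscript error in the question appears to come from passing a data frame (the loop variable) to [[, which expects a position or a name. A sketch of the positional variant follows, assuming the same data as the question. Note that it stores one character vector per data frame rather than one list element per word; res and k are illustrative names.

```r
# Rebuild a list shaped like list_a from the question (assumed equivalent)
list_a <- list(
  one = data.frame(words = c("a", "b", "c", "d", "e", "f"), stringsAsFactors = FALSE),
  two = data.frame(words = c("a", "s", "t", "z"), stringsAsFactors = FALSE)
)

# Preallocate the result, then fill it by position
res <- vector("list", length(list_a))
for (k in seq_along(list_a)) {
  # k is an integer, which [[ accepts as a subscript
  res[[k]] <- list_a[[k]]$words
}
lengths(res)
# [1] 6 4
```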

As Roland mentioned, appending in R is very expensive because a new copy of the object is created on every iteration. Here is an alternative to try:

# create a data frame using all of list_a, 
# coerce to character vector
# then coerce to list
as.list(unname(unlist(do.call(what = "rbind", args = list_a))))
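
Another loop-free option, offered as a sketch rather than as the approach used above, is to pull the words column out of each data frame with lapply() and flatten the result. The data is rebuilt here so the block is self-contained, and the names words and result are illustrative.

```r
# Rebuild a list shaped like list_a from the question (assumed equivalent)
list_a <- list(
  one = data.frame(words = c("a", "b", "c", "d", "e", "f"), stringsAsFactors = FALSE),
  two = data.frame(words = c("a", "s", "t", "z"), stringsAsFactors = FALSE)
)

# Extract the "words" column from every data frame, then flatten
words <- unlist(lapply(list_a, function(df) df[["words"]]), use.names = FALSE)
words
# [1] "a" "b" "c" "d" "e" "f" "a" "s" "t" "z"

# One list element per word, as in the append() loop
result <- as.list(words)
length(result)
# [1] 10
```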

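Note that a "data.frame" object is just a "list" object with the "data.frame" class attribute applied. So you will see the same behaviour as with lists when using data.frames with $ and unevaluated names. Try this: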

# print mtcars data.frame
mtcars
# set class attribute to NULL
class(mtcars) <- NULL
# mtcars is just a list now :-)
mtcars
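
The same point can be checked without printing a whole data set. The sketch below uses a small hypothetical data frame df, and col_name is an illustrative variable.

```r
# A small hypothetical data frame
df <- data.frame(words = c("a", "s", "t", "z"), stringsAsFactors = FALSE)

is.list(df)   # a data frame is a list underneath
# [1] TRUE
class(df)
# [1] "data.frame"

# Dropping the class attribute leaves a plain list with the same columns
plain <- unclass(df)
class(plain)
# [1] "list"
names(plain)
# [1] "words"

# $ looks for a column literally called "col_name", so it finds nothing
col_name <- "words"
is.null(df$col_name)
# [1] TRUE
# [[ evaluates col_name first, so it finds the "words" column
df[[col_name]]
# [1] "a" "s" "t" "z"
```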

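Edit: $ and [[ are operators, which just means they are functions that can be used in a special way. You can also use them like ordinary functions, passing their arguments in parentheses.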

# $ is a function
`$`(list_a, "one")
#   words
# 1     a
# 2     b
# ...

These functions behave differently. [[ expects an object, which it evaluates. $ expects the name of an element, which it looks up literally.

i <- "one"
# $ is a function, but there is no element "i"
`$`(list_a, i)
# NULL
# [[ is a function, and an element "one" is present
`[[`(list_a, i)
#   words
# 1     a
# 2     b
# ...
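
One practical consequence shows up when the element name arrives as a function argument. The two helpers below are hypothetical and written only to illustrate the difference; the data is rebuilt so the block runs on its own.

```r
# Rebuild a list shaped like list_a from the question (assumed equivalent)
list_a <- list(
  one = data.frame(words = c("a", "b", "c", "d", "e", "f"), stringsAsFactors = FALSE),
  two = data.frame(words = c("a", "s", "t", "z"), stringsAsFactors = FALSE)
)

# Hypothetical helper using $: looks for an element literally called "name"
get_words_dollar <- function(lst, name) lst$name$words

# Hypothetical helper using [[: evaluates name, then looks up its value
get_words_bracket <- function(lst, name) lst[[name]]$words

get_words_dollar(list_a, "two")
# NULL
get_words_bracket(list_a, "two")
# [1] "a" "s" "t" "z"
```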
