Why does indexing with a single character index work on a data frame but not a matrix?

Question

In a data frame, [ indexing can be done with a single character string, e.g. mtcars["mpg"].

On the other hand, trying the same thing on a matrix gives NA, e.g.

m = cbind(A = 1:5, B = 1:5)
m["A"]
# NA

...which suggests that this is somehow an invalid way to subset a matrix.

Is this normal R behaviour? If so, where is it documented?

Answer 1

There are two cases here,

m = cbind(A = 1:5, B = 11:15)
typeof(m)
# [1] "integer"

and

typeof(mtcars)
# [1] "list"

so the two objects are read differently. The first case (the matrix) needs a comma,

cbind(A = 1:5, B = 11:15)[,"A"]
# [1] 1 2 3 4 5
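The contrast above can be sketched as follows. This is a minimal illustration, assuming the built-in mtcars dataset is available; the variable name m is reused from the answer.

```r
m <- cbind(A = 1:5, B = 11:15)

# A matrix is an atomic vector with a dim attribute, not a list
is.list(m)        # FALSE
is.list(mtcars)   # TRUE

# With the comma, "A" is matched against the column names
col_a <- m[, "A"]

# Without the comma, "A" is matched against names(m), which is NULL here,
# so the lookup finds nothing and returns NA
no_comma <- m["A"]
is.na(no_comma)   # TRUE
```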

Answer 2

By default, cbind() creates a matrix. mtcars is a data frame.

class(cbind(A = 1:5, B = 1:5))
# [1] "matrix" "array"

class(mtcars)
# [1] "data.frame"

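Because a data frame is built as a list of columns, dataframe["column_name"], using one argument in [, treats the data frame as a list by default and lets you select columns. This is mostly the same as dataframe[, "column_name"]. The difference is that the one-argument form returns a one-column data frame, whereas the two-argument form returns a plain vector unless drop = FALSE is given.

A matrix has no such list underpinning, so if you use [ with one argument it does not assume you want a column. Use matrix[, "column_name"] to select a column from a matrix.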
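As a rough illustration of the two indexing forms on a data frame, again assuming the built-in mtcars dataset (the names one_arg, two_arg and kept are made up for this sketch):

```r
# One argument: list-style indexing, the result is still a data frame
one_arg <- mtcars["mpg"]
is.data.frame(one_arg)   # TRUE

# Two arguments: matrix-style indexing, a single column is dropped to a vector
two_arg <- mtcars[, "mpg"]
is.data.frame(two_arg)   # FALSE
is.numeric(two_arg)      # TRUE

# drop = FALSE keeps the data frame structure in the two-argument form
kept <- mtcars[, "mpg", drop = FALSE]
is.data.frame(kept)      # TRUE
```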

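cbind is a bad way to create a data frame from scratch. You can call cbind.data.frame(A = 1:5, B = 1:5), but data.frame(A = 1:5, B = 1:5) is simpler and clearer. However, if you are adding several columns to an existing data frame, then cbind(my_data_frame, A = 1:5, B = 1:5) is fine, and it will produce a data frame as long as one of the arguments is already a data frame.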
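A short sketch of that point, with hypothetical variable names (from_cbind, from_df, extended) chosen only for illustration:

```r
# cbind() on plain vectors gives a matrix
from_cbind <- cbind(A = 1:5, B = 1:5)
is.matrix(from_cbind)      # TRUE

# data.frame() gives a data frame directly
from_df <- data.frame(A = 1:5, B = 1:5)
is.data.frame(from_df)     # TRUE

# cbind() with an existing data frame among its arguments gives a data frame
extended <- cbind(from_df, C = 6:10)
is.data.frame(extended)    # TRUE
names(extended)            # "A" "B" "C"

# Single-character indexing now works because the object is a data frame
extended["C"]
```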

Answer 3

This behaviour is documented in ?"[", in the "Matrices and arrays" section:

Matrices and arrays are vectors with a dimension attribute and so all the vector forms of indexing can be used with a single index.

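This means that if you use only a single index, the object being subset is treated as an object without dimensions. So if the index is a character vector, the method looks for the names attribute, which does not exist in this case (try names(m) on your matrix to check). What you did in the question is exactly equivalent to (c(1:5, 1:5))["A"]. If you use a double index instead, the method searches the dimnames attribute to do the subsetting. Confusing as it may be, a matrix can have both names and dimnames. Consider: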

m<-matrix(c(1:5,1:5), ncol = 2, dimnames = list(LETTERS[1:5], LETTERS[1:2]))
names(m)<-LETTERS[1:10]
#check whether the attributes are set
str(m)
# int [1:5, 1:2] 1 2 3 4 5 1 2 3 4 5
# - attr(*, "dimnames")=List of 2
#  ..$ : chr [1:5] "A" "B" "C" "D" ...
#  ..$ : chr [1:2] "A" "B"
# - attr(*, "names")= chr [1:10] "A" "B" "C" "D" ...

We have set rownames, colnames and names. Let's subset it:

#a column
m[,"A"]
#A B C D E
#1 2 3 4 5

#a row
m["A",]
# A B 
#1 1

#an element
m["A"]
#A 
#1
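For comparison, here is a small sketch of the same checks on the matrix from the question, which has dimnames but no names. The variable v is introduced only for this illustration.

```r
m <- cbind(A = 1:5, B = 1:5)

# The column names live in dimnames; there is no names attribute
names(m)            # NULL
colnames(m)         # "A" "B"

# With a single index the matrix behaves like the underlying vector
v <- as.vector(m)   # drops the dim and dimnames attributes
is.na(m["A"])       # TRUE, no names to match against
is.na(v["A"])       # TRUE, same lookup on the plain vector

# A single numeric index counts down the columns:
# element 7 is row 2 of column B
m[7]                # 2
m[2, "B"]           # 2
```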


r

matrix

subset