Why does R behave inconsistently when a non-existent row name is retrieved from a data frame?

Question

I am wondering why two data frames, a and b, give different results when a non-existent row name is retrieved. For example,

a <- as.data.frame(matrix(1:3, ncol = 1, nrow = 3, dimnames = list(c("A1", "A10", "B"), "V1")))
a
    V1
A1   1
A10  2
B    3

b <- as.data.frame(matrix(4:5, ncol = 1, nrow = 2, dimnames = list(c("A10", "B"), "V1")))
b
    V1
A10  4
B    5

Let's try to retrieve "A10", "A1" and "A" from data frame a:

> a["A10", 1]
[1] 2
> a["A1", 1]
[1] 1                    # expected
> a["A", 1]
[1] NA                   # expected
> a["B", 1]
[1] 3                    # expected
> a["C", 1]
[1] NA                   # expected

Let's do the same with data frame b:

> b["A10", 1]
[1] 4
> b["A1", 1]
[1] 4                    # unexpected, should be NA
> b["A", 1]              
[1] 4                    # unexpected, should be NA
> b["B", 1]
[1] 5                    # expected
> b["C", 1]
[1] NA                   # expected

Given that a["A", 1] returns NA, why don't b["A", 1] and b["A1", 1] return NA as well?

P.S. R version 3.5.2

Answer 1

Synthesizing some of the comments here...

?`[` says:

Unlike S (Becker et al p. 358), R never uses partial matching when extracting by [, and partial matching is not by default used by [[ (see argument exact).

But ?`[.data.frame` says:

Both [ and [[ extraction methods partially match row names. By default neither partially match column names, but [[ will if exact = FALSE (and with a warning if exact = NA). If you want to exact matching on row names use match, as in the examples.

The example given there is:

sw <- swiss[1:5, 1:4]
sw["C", ]
##            Fertility Agriculture Examination Education
## Courtelary      80.2          17          15        12

sw[match("C", row.names(sw)), ]
##    Fertility Agriculture Examination Education
## NA        NA          NA          NA        NA

Meanwhile:

as.matrix(sw)["C", ]
## Error in as.matrix(sw)["C", ] : subscript out of bounds

So row names of matrices are matched exactly, while row names of data frames are matched partially, and both behaviors are documented.
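As a rough illustration, the match approach suggested in the documentation can be applied to the data frame b from the question. The helper name exact_rows is hypothetical, introduced only for this sketch; it is not part of base R.

```r
# Rebuild data frame b from the question
b <- as.data.frame(matrix(4:5, ncol = 1, nrow = 2,
                          dimnames = list(c("A10", "B"), "V1")))

# Hypothetical helper: exact row-name lookup via match(),
# which returns NA for names that are not present
exact_rows <- function(df, names) match(names, rownames(df))

wanted <- c("A10", "A1", "A")

# Default data frame indexing: "A1" and "A" partially match "A10"
partial <- b[wanted, 1]

# Exact matching: only "A10" is found, the others become NA
exact <- b[exact_rows(b, wanted), 1]

print(partial)
print(exact)
```

With exact matching, only "A10" yields a value; "A1" and "A" come back as NA, which is what the question expected.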

[.data.frame is implemented in R rather than C, so you can inspect its source code by printing the function. The partial matching happens here:

    if (is.character(i)) {
        rows <- attr(xx, "row.names")
        i <- pmatch(i, rows, duplicates.ok = TRUE)
    }
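The difference between a and b in the question appears to follow from how pmatch handles its three cases: an exact match wins, a unique partial match is accepted, and an ambiguous partial match returns NA. A minimal sketch, calling pmatch directly on the row names from the question (rows_a and rows_b are names introduced here for illustration):

```r
# Row names of the two data frames from the question
rows_a <- c("A1", "A10", "B")
rows_b <- c("A10", "B")

queries <- c("A1", "A", "C")

# In a: "A1" matches exactly (index 1); "A" is a prefix of both
# "A1" and "A10", so it is ambiguous and yields NA; "C" has no match
res_a <- pmatch(queries, rows_a, duplicates.ok = TRUE)

# In b: "A1" and "A" are each a prefix of only "A10", so both
# resolve to index 1; "C" has no match
res_b <- pmatch(queries, rows_b, duplicates.ok = TRUE)

print(res_a)
print(res_b)
```

So a["A", 1] is NA only because "A" is ambiguous in a, not because partial matching is switched off there.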

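There happens to be a recent thread on Bugzilla about partial matching of data frame row names. (No discussion there yet...)

The mismatch between the behavior of [.data.frame and [ for character indices is definitely surprising.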
