data.table + ecdf - undefined column

I am working with data.table. Selecting a column from a data.table object is easy:

> head(data.table(mtcars)[,2])
   cyl
1:   6
2:   6
3:   4
4:   6
5:   8
6:   6

But trying to select a column with the same syntax inside an ecdf call produces an error:

> ecdf(data.table(mtcars)[,2])(data.table(mtcars)[,2])

Error in `[.data.frame`(x, i) : undefined columns selected

Can anyone explain why?

From a practical point of view, one way around the problem is:

> ecdf(data.table(mtcars)[[2]])(data.table(mtcars)[[2]])
 [1] 0.56250 0.56250 0.34375 0.56250 1.00000 0.56250 1.00000 0.34375 0.34375 0.56250 0.56250 1.00000 1.00000 1.00000 1.00000 1.00000
[17] 1.00000 0.34375 0.34375 0.34375 0.34375 1.00000 1.00000 1.00000 1.00000 0.34375 0.34375 0.34375 1.00000 0.56250 1.00000 0.34375

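But I would like to understand the behaviour above.

The reason lies in the extraction. In the first case the result is still a data.table, whereas in the second case it is a vector, which is what ecdf expects: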

data.table(mtcars)[[2]]
#[1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

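The data.table and data.frame syntax differ slightly. In data.frame, [ uses drop = TRUE by default, so using , and selecting only one column drops the dimension and returns a vector. In data.table, [ does not drop, so data.table(mtcars)[,2] is still a one-column data.table.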
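
To make the difference concrete, here is a minimal sketch comparing the class of each extraction. The names DF and DT are only illustrative, and it assumes a reasonably recent data.table version in which DT[, 2] selects the second column, as the output in the question shows.

```r
library(data.table)

DF <- mtcars                  # plain data.frame
DT <- data.table(mtcars)      # same data as a data.table

class(DF[, 2])                # "numeric": data.frame drops to a vector
class(DF[, 2, drop = FALSE])  # "data.frame": dimension kept only on request
class(DT[, 2])                # "data.table" "data.frame": still a table
class(DT[[2]])                # "numeric": [[ extracts the column itself
```

So the data.frame call mtcars[, 2] would already hand ecdf a vector, while the data.table call needs [[ (or a similar extractor) to get one.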

This is also mentioned in the data.table FAQ:

For consistency so that when you use data.table in functions that accept varying inputs, you can rely on DT[...] returning a data.table. You don’t have to remember to include drop=FALSE like you do in data.frame. data.table was first released in 2006 and this difference to data.frame has been a feature since the very beginning.
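
Given that, any extraction that returns a plain vector should work with ecdf. The sketch below shows three common ones; the variable names are illustrative, and the expected values in the comment are the first six from the output in the question.

```r
library(data.table)

DT <- data.table(mtcars)

# Each of these extracts the cyl column (column 2) as a numeric vector
by_position <- DT[[2]]
by_name     <- DT$cyl
by_j        <- DT[, cyl]       # unquoted column name in j returns a vector

Fn <- ecdf(by_position)        # ecdf() now receives a vector
head(Fn(by_position))
# [1] 0.56250 0.56250 0.34375 0.56250 1.00000 0.56250

# The three extractions give the same vector
identical(by_position, by_name)
identical(by_position, by_j)
```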