Correlation between numeric and logical variable gives (intended) error?

Question

Sample data:

require(data.table)
dt <- data.table(rnorm(10), rnorm(10) < 0.5)

Computing the correlation between the numeric and the logical variable raises an error.

cor(dt)
#Error in cor(dt) : 'x' must be numeric

But the error disappears after converting to a data frame.

cor(data.frame(dt))
#           V1         V2
#V1  1.0000000 -0.1631356
#V2 -0.1631356  1.0000000

Is this intended behavior for data.table?

Answer 1

cor tests whether x or y (its arguments) is a data frame (using is.data.frame, which returns TRUE for a data.table) and then coerces the argument to a matrix:

if (is.data.frame(x)) x <- as.matrix(x)

The problem appears to be that as.matrix.data.table and as.matrix.data.frame handle the example data differently:

as.matrix(dt)

returns a character matrix, which appears to be a bug in data.table.
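For comparison, here is a minimal base R sketch of what the data.frame route does with the same kind of data. The column values are made up for illustration; only is.data.frame, as.matrix and cor come from the discussion above.

```r
# Base R only: a data frame with one numeric and one logical column.
# The values are illustrative, not taken from the question.
df <- data.frame(V1 = c(0.34, -0.70, -0.38, 1.81),
                 V2 = c(TRUE, TRUE, FALSE, FALSE))

# This is the check cor() applies before coercing its argument.
is.data.frame(df)

# as.matrix.data.frame() keeps the result numeric: TRUE/FALSE become 1/0.
m <- as.matrix(df)
is.numeric(m)
m[, "V2"]

# cor() therefore receives a numeric matrix and succeeds.
cor(df)
```

If as.matrix instead returns a character matrix, as reported for the data.table method, cor has nothing numeric to work with and stops with "'x' must be numeric".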

as.matrix.data.table and as.matrix.data.frame appear to share similar coercion code, but it is dispatched differently:

## data.table:::as.matrix.data.table
else if (non.numeric) {
    for (j in seq_len(p)) {
        if (is.character(X[[j]]))
            next
        xj <- X[[j]]
        miss <- is.na(xj)
        xj <- if (length(levels(xj)))
            as.vector(xj)
        else format(xj)
        is.na(xj) <- miss
        X[[j]] <- xj
    }
}
## base::as.matrix.data.frame
else if (non.numeric) {
    for (j in pseq) {
        if (is.character(X[[j]])) 
            next
        xj <- X[[j]]
        miss <- is.na(xj)
        xj <- if (length(levels(xj))) 
            as.vector(xj)
        else format(xj)
        is.na(xj) <- miss
        X[[j]] <- xj
    }
}

Currently the data.table version coerces the logical column to character.
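While the coercion behaves this way, one possible workaround is to convert the logical values to integers yourself before calling cor. The sketch below uses base R only, with hypothetical vectors x and flag standing in for the two columns; the same conversion should carry over to a data.table column, though that is not shown here.

```r
# Hypothetical stand-ins for the numeric and logical columns.
x    <- c(0.34, -0.70, -0.38, -0.75, -0.90, 1.81)
flag <- c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)

# Explicit conversion: TRUE -> 1L, FALSE -> 0L.
r_explicit <- cor(x, as.integer(flag))

# Reference value via the data.frame route used in the question.
r_df <- cor(data.frame(x, flag))[1, 2]

# Both routes give the same correlation.
stopifnot(isTRUE(all.equal(r_explicit, r_df)))
```

Converting explicitly also makes the 0/1 encoding visible in the code rather than leaving it to as.matrix.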

Answer 2

This bug, #1083, is now fixed in devel v1.9.5 with commit #1797.

require(data.table)
set.seed(45L)
dt <- data.table(rnorm(10), rnorm(10) < 0.5)
#             V1    V2
#  1:  0.3407997  TRUE
#  2: -0.7033403  TRUE
#  3: -0.3795377 FALSE
#  4: -0.7460474 FALSE
#  5: -0.8981073  TRUE
#  6: -0.3347941  TRUE
#  7: -0.5013782  TRUE
#  8: -0.1745357  TRUE
#  9:  1.8090374 FALSE
# 10: -0.2301050  TRUE
as.matrix(dt)
#               V1 V2
#  [1,]  0.3407997  1
#  [2,] -0.7033403  1
#  [3,] -0.3795377  0
#  [4,] -0.7460474  0
#  [5,] -0.8981073  1
#  [6,] -0.3347941  1
#  [7,] -0.5013782  1
#  [8,] -0.1745357  1
#  [9,]  1.8090374  0
# [10,] -0.2301050  1
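To confirm the behavior on your own installation, a small check along these lines may help. It assumes a data.table version that includes the fix, uses made-up column values, and is skipped entirely when the package is not installed.

```r
# Only run the check when data.table is available.
if (requireNamespace("data.table", quietly = TRUE)) {
  # Illustrative values, not the ones from the answer above.
  dt <- data.table::data.table(V1 = c(0.34, -0.70, -0.38, 1.81),
                               V2 = c(TRUE, TRUE, FALSE, FALSE))

  # With the fix, the matrix is numeric rather than character.
  m <- as.matrix(dt)
  stopifnot(is.numeric(m))

  # cor() coerces via as.matrix(), so it should now succeed.
  r <- cor(dt)
  print(r)
}
```

If the first stopifnot fails, the installed version most likely predates the fix.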


r

data.table