Does `cor()` only work for numeric variables?

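I'm interested in the correlation between x and y. x is an ordinal (Likert-type) variable; y is a continuous variable.

But when I use `cor(x, y, method = "spearman")`, I get the error message

`'x' must be numeric`

Spearman's rho does not necessarily require x to be numeric, so how can I run this function on my data?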

set.seed(0)

x <- sample(c("None", "Little", "Often", "Always"), 20, replace = TRUE)
y <- round(runif(length(x), 100, 300))
data <- data.frame(subject = seq_along(x), x, y)

cor(x, y, method = "spearman") # Error: 'x' must be numeric

#data:
   subject      x   y
1        1 Little 255
2        2   None 287
3        3 Always 142
4        4  Often 230
5        5   None 125
6        6 Little 153
7        7   None 177
8        8  Often 103
9        9  Often 176
10      10 Little 274
11      11 Little 168
12      12  Often 196
13      13  Often 220
14      14   None 199
15      15   None 137
16      16   None 265
17      17 Little 234
18      18 Little 259
19      19 Little 122
20      20 Little 245

You can recode the values:

library(dplyr)  # provides %>%, mutate() and recode()

data <- data %>% mutate(x2 = recode(x, "None" = 0, "Little" = 1, "Often" = 2, "Always" = 3))
cor(data$x2, data$y, method = "spearman")
[1] -0.1930743
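
If you would rather not load dplyr, the same recoding can be sketched in base R with `match()`. This is only an illustration: the level order in `lvls` is assumed from the labels in the question, and the five-row `x` and `y` below are a made-up toy sample, not the question's data.

```r
# Assumed ordering of the Likert labels, lowest to highest
lvls <- c("None", "Little", "Often", "Always")

# Toy data for illustration only
x <- c("Little", "None", "Always", "Often", "None")
y <- c(255, 287, 142, 230, 125)

# match() returns each label's position in lvls (1..4);
# subtracting 1 gives the same 0..3 coding as the recode() call
x_num <- match(x, lvls) - 1L

cor(x_num, y, method = "spearman")
```

Because Spearman's rho depends only on ranks, any strictly increasing coding (0 to 3, 1 to 4, and so on) gives the same result.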

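Spearman's rho does require the data to be *ordered*, which character vectors are not, and neither are regular factors. (This is a bit subtle: factors do have an order that is used when listing levels, plotting, and so on, but that ordering is not assumed to have any statistical meaning.) It would make sense for `cor()` to accept ordered factors (`factor(..., ordered = TRUE)` or `ordered(...)`), but it does not. As `?cor` says: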

The inputs must be numeric (as determined by ‘is.numeric’: logical values are also allowed for historical compatibility): the ‘"kendall"’ and ‘"spearman"’ methods make sense for ordered inputs but ‘xtfrm’ can be used to find a suitable prior transformation to numbers.

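However, assuming you have a factor variable *and* the order of its levels is the one you want, using `as.integer()` inside `cor()` should work fine. (In fact, the `xtfrm.factor()` method is just a wrapper for `as.integer()`.)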

xf <- ordered(x, levels = c("None", "Little", "Often", "Always"))
cor(as.integer(xf), y, method = "spearman")
## or
cor(xtfrm(xf), y, method = "spearman")
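
The caveat about level order matters in practice. As a small sketch, using a made-up four-element vector rather than the question's data: a factor created without explicit `levels` sorts its levels alphabetically, so its integer codes do not follow the None < Little < Often < Always ordering.

```r
# One value per category, listed in the intended order
x <- c("None", "Little", "Often", "Always")

# Default factor(): levels are sorted alphabetically
levels(factor(x))       # "Always" "Little" "None" "Often"
as.integer(factor(x))   # 3 2 4 1  (not the intended order)

# Explicit levels give the integer codes the intended meaning
xf <- ordered(x, levels = c("None", "Little", "Often", "Always"))
as.integer(xf)          # 1 2 3 4

# xtfrm() on a factor yields the same codes as as.integer()
all(xtfrm(xf) == as.integer(xf))
```

Checking `levels()` before converting is a cheap way to confirm that the codes passed to `cor()` mean what you expect.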