Does `cor()` only work for numeric variables?
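I am interested in the correlation between `x` and `y`. `x` is an ordinal (Likert-type) variable and `y` is a continuous variable. But when I call `cor(x, y, method = "spearman")`, I get the error message `'x' must be numeric`. Spearman's rho does not necessarily require `x` to be numeric, so how can I run this function?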
set.seed(0)
x <- sample(c("None", "Little", "Often", "Always"), 20, replace = TRUE)
y <- round(runif(length(x), 100, 300))
data <- data.frame(subject=seq_len(length(x)), x, y)
cor(x, y, method = "spearman") # Error: 'x' must be numeric
#data:
subject x y
1 1 Little 255
2 2 None 287
3 3 Always 142
4 4 Often 230
5 5 None 125
6 6 Little 153
7 7 None 177
8 8 Often 103
9 9 Often 176
10 10 Little 274
11 11 Little 168
12 12 Often 196
13 13 Often 220
14 14 None 199
15 15 None 137
16 16 None 265
17 17 Little 234
18 18 Little 259
19 19 Little 122
20 20 Little 245
You can recode the values:
library(dplyr)  # provides %>%, mutate() and recode()
data <- data %>% mutate(x2 = recode(x, "None" = 0, "Little" = 1, "Often" = 2, "Always" = 3))
cor(data$x2, data$y, method = "spearman")
[1] -0.1930743
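The same recoding can be sketched in base R without dplyr. This is a minimal illustration, not part of the original answer: the names `codes`, `rho_lookup` and `rho_other` are made up here, and the numeric codes are an assumption. Because Spearman's rho depends only on ranks, any strictly increasing coding of the categories gives the same result, which the last line checks.

```r
set.seed(0)
x <- sample(c("None", "Little", "Often", "Always"), 20, replace = TRUE)
y <- round(runif(length(x), 100, 300))

# Named lookup vector: category label -> numeric code (illustrative codes).
codes <- c(None = 0, Little = 1, Often = 2, Always = 3)
x2 <- unname(codes[x])  # index by name to translate each label

rho_lookup <- cor(x2, y, method = "spearman")

# A different strictly increasing coding produces the same ranks, hence the same rho.
rho_other <- cor((x2 + 1)^2, y, method = "spearman")
isTRUE(all.equal(rho_lookup, rho_other))
```

Only the order of the codes matters here, not their spacing, so 0 to 3 and 1 to 4 are interchangeable for this purpose.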
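Spearman's rho does require the data to be ordered, which character vectors are not, and neither are regular factors. (This is a little subtle: factors do have an order that is used when listing the levels, plotting and so on, but that order is not assumed to carry any statistical meaning.) It would make sense for `cor()` to accept ordered factors (`factor(..., ordered = TRUE)` or `ordered(...)`), but it does not. As `?cor` says: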
The inputs must be numeric (as determined by ‘is.numeric’: logical
values are also allowed for historical compatibility): the
‘"kendall"’ and ‘"spearman"’ methods make sense for ordered inputs
but ‘xtfrm’ can be used to find a suitable prior transformation to
numbers.
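The behaviour described in the help text can be checked on a tiny example. This is a sketch with made-up data (three observations, chosen only for illustration); `res` and `rho` are hypothetical names. The error is caught with `tryCatch()` so the snippet runs to the end.

```r
xf <- ordered(c("None", "Little", "Often"),
              levels = c("None", "Little", "Often"))
y <- c(10, 30, 20)

is.numeric(xf)  # FALSE: factors, ordered or not, are not numeric

# cor() rejects the ordered factor; capture the error message instead of stopping.
res <- tryCatch(cor(xf, y, method = "spearman"),
                error = function(e) conditionMessage(e))

# After xtfrm() the input is numeric and cor() accepts it.
rho <- cor(xtfrm(xf), y, method = "spearman")
rho  # 0.5 for this toy data
```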
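However, assuming you have a factor variable and the order of its levels is the one you want, using `as.integer()` inside `cor()` should work fine. (In fact, the `xtfrm.factor()` method is just a wrapper for `as.integer()`.)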
xf <- ordered(x, levels = c("None", "Little", "Often", "Always"))
cor(as.integer(xf), y, method = "spearman")
## or
cor(xtfrm(xf), y, method = "spearman")
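The condition that the levels are in the order you want matters in practice. The sketch below (illustrative variable names `f_default` and `f_ordered`, not from the original answer) shows that `factor()` without explicit `levels` sorts the labels alphabetically, so the integer codes do not follow the intended None < Little < Often < Always order.

```r
x <- c("None", "Little", "Often", "Always")

# Default factor(): levels are sorted alphabetically, not by meaning.
f_default <- factor(x)
levels(f_default)      # "Always" "Little" "None" "Often"
as.integer(f_default)  # 3 2 4 1

# Explicit levels give the intended ordinal codes.
f_ordered <- ordered(x, levels = c("None", "Little", "Often", "Always"))
as.integer(f_ordered)  # 1 2 3 4

# xtfrm() on a factor returns the same integer codes.
identical(as.integer(f_ordered), xtfrm(f_ordered))
```

Passing `as.integer(f_default)` to `cor()` would run without error but rank the categories in the wrong order, so it is worth inspecting `levels()` before converting.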