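Cramer's V with missing values gives different results
My question concerns computing Cramer's V to detect association between categorical variables. My real dataset contains missing values, so I created a toy dataset for illustration with two variables, a and b, one of which contains NA.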
a <- factor(c("M","","F","F","","M","F","F"))
a2 <- factor(a, levels = c('M','F'),labels =c('Male','Female'))
b <- factor(c("y","y","","y","n","n","n","y"))
b2 <- factor(b, levels=c("y","n"),labels=c("yes","no"))
df<-cbind(a2,b2)
The assocstats function gives me a result for Cramer's V:
require(vcd)
tab <- table(a,b)
assocstats(tab)
X^2 df P(> X^2)
Likelihood Ratio 1.7261 4 0.78597
Pearson 1.3333 4 0.85570
Phi-Coefficient : 0.408
Contingency Coeff.: 0.378
Cramer's V : 0.289
Now I want to drop the NA from the levels:
a[a==""]<-NA
a3 <- droplevels(a)
levels(a3)
tab <-table(a,b)
assocstats(tab)
But every time I drop the NA, the result looks like this:
X^2 df P(> X^2)
Likelihood Ratio 0.13844 2 0.93312
Pearson NaN 2 NaN
Phi-Coefficient : NaN
Contingency Coeff.: NaN
Cramer's V : NaN
In addition, because I have a large dataset, I would like to compute a matrix of Cramer's V values. I found this code on Stack Overflow, and it seems to work...
get.V<-function(y){
col.y<-ncol(y)
V<-matrix(ncol=col.y,nrow=col.y)
for(i in 1:col.y){
for(j in 1:col.y){
V[i,j]<-assocstats(table(y[,i],y[,j]))$cramer
}
}
return(V)
}
get.V(tab)
...except that its result differs from that of the assocstats function:
[,1] [,2] [,3]
[1,] 1.0 0.5 1
[2,] 0.5 1.0 1
[3,] 1.0 1.0 1
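This cannot be right, because I get this result every time, even when I change the number of observations... What is wrong with this code?
Conclusion: I don't know which result is correct. I have a large dataset with many NAs. The first assocstats result and the get.V code give different results, even though there should be no real difference, since the code only builds a matrix. The second assocstats call returns only NaN. I cannot find any error... Can someone help me?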
If you are using factors, you don't have to replace "" with NA: any unique value that you don't define in levels is converted to NA by factor:
a <- factor(c("M","","F","F","","M","F","F"))
a2 <- factor(a, levels = c('M','F'),labels =c('Male','Female'))
a
# [1] M F F M F F
# Levels: F M
a2
# [1] Male <NA> Female Female <NA> Male Female Female
# Levels: Male Female
b <- factor(c("y","y","","y","n","n","n","y"))
b2 <- factor(b, levels=c("y","n"),labels=c("yes","no"))
(df <- cbind(a2,b2))
# a2 b2
# [1,] 1 1
# [2,] NA 1
# [3,] 2 NA
# [4,] 2 1
# [5,] NA 2
# [6,] 1 2
# [7,] 2 2
# [8,] 2 1
Above, you are creating a matrix, which loses all the labels you created with factor. I think you want a data frame:
(df <- data.frame(a2,b2))
# a2 b2
# 1 Male yes
# 2 <NA> yes
# 3 Female <NA>
# 4 Female yes
# 5 <NA> no
# 6 Male no
# 7 Female no
# 8 Female yes
require('vcd')
(tab <- table(a2,b2, useNA = 'ifany'))
# b2
# a2 yes no <NA>
# Male 1 1 0
# Female 2 1 1
# <NA> 1 1 0
(tab <- table(a2,b2))
# b2
# a2 yes no
# Male 1 1
# Female 2 1
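If you want to see the NA values in the table, you have to tell table explicitly (useNA = 'ifany' above). Otherwise it drops them by default, so you are already "excluding" them when you call assocstats: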
assocstats(tab)
# X^2 df P(> X^2)
# Likelihood Ratio 0.13844 1 0.70983
# Pearson 0.13889 1 0.70939
#
# Phi-Coefficient : 0.167
# Contingency Coeff.: 0.164
# Cramer's V : 0.167
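To see where the 0.167 comes from, and why the question's second call returned NaN, the statistic can be recomputed by hand in base R. This is only a sketch: `cramers_v_by_hand` is a hypothetical helper written for this illustration (it is not part of vcd), and it uses the plain Pearson chi-squared statistic without continuity correction.

```r
# Hypothetical helper, not part of vcd: Cramer's V from a contingency table.
cramers_v_by_hand <- function(tab) {
  n <- sum(tab)
  # Counts expected under independence of rows and columns
  expected <- outer(rowSums(tab), colSums(tab)) / n
  # Pearson chi-squared, no continuity correction
  chi2 <- sum((tab - expected)^2 / expected)
  sqrt(chi2 / (n * (min(dim(tab)) - 1)))
}

a  <- factor(c("M","","F","F","","M","F","F"))
b  <- factor(c("y","y","","y","n","n","n","y"))
a2 <- factor(a, levels = c("M","F"), labels = c("Male","Female"))
b2 <- factor(b, levels = c("y","n"), labels = c("yes","no"))

# Clean 2 x 2 table: NA pairs are dropped by table()
v_clean <- cramers_v_by_hand(table(a2, b2))
print(v_clean)      # about 0.167

# Assigning NA does not remove the "" level, so table() keeps an all-zero row.
a_na <- a
a_na[a_na == ""] <- NA
v_zero_row <- cramers_v_by_hand(table(a_na, b))
print(v_zero_row)   # NaN, because the zero row gives 0/0 terms
```

In the question, `droplevels` was assigned to `a3`, but the table was still built from `a`, so the empty level stayed in the table.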
For get.V, just pass a data frame or matrix, not a table:
get.V <- function(y) {
col.y <- ncol(y)
V <- matrix(ncol=col.y,nrow=col.y)
for(i in 1:col.y){
for(j in 1:col.y){
V[i,j] <- assocstats(table(y[,i],y[,j]))$cramer
}
}
return(V)
}
get.V(df)
# [,1] [,2]
# [1,] 1.0000000 0.1666667
# [2,] 0.1666667 1.0000000
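A variant that keeps the variable names on the result may be easier to read for a large dataset. The sketch below is an assumption-laden illustration, not the original code: `cramers_v` and `cramers_v_matrix` are hypothetical names, and the statistic is computed in base R rather than with assocstats. Each pair of columns uses only the rows where both values are present, so different cells can be based on different numbers of observations.

```r
# Hypothetical helper: Cramer's V for two categorical vectors.
cramers_v <- function(x, y) {
  tab <- table(x, y)   # pairs with NA in either variable are dropped by default
  n <- sum(tab)
  expected <- outer(rowSums(tab), colSums(tab)) / n
  chi2 <- sum((tab - expected)^2 / expected)
  sqrt(chi2 / (n * (min(dim(tab)) - 1)))
}

# Hypothetical helper: pairwise Cramer's V for all columns of a data frame.
cramers_v_matrix <- function(data) {
  vars <- names(data)
  V <- matrix(NA_real_, length(vars), length(vars),
              dimnames = list(vars, vars))
  for (i in seq_along(vars)) {
    for (j in seq_along(vars)) {
      V[i, j] <- cramers_v(data[[i]], data[[j]])
    }
  }
  V
}

# Same toy data as above; "" is not a declared level, so it becomes NA.
df <- data.frame(
  a2 = factor(c("M","","F","F","","M","F","F"),
              levels = c("M","F"), labels = c("Male","Female")),
  b2 = factor(c("y","y","","y","n","n","n","y"),
              levels = c("y","n"), labels = c("yes","no"))
)

V <- cramers_v_matrix(df)
print(V)
```

Passing a table instead of the data makes the function treat each column of counts as if it were a categorical variable, which is why get.V(tab) in the question returned a 3 x 3 matrix unrelated to the assocstats output.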