Getting different correlation values for the same combination of vectors
Why do I get different correlations for the same pair of columns below?
> cor(finalDB[2:6],use="complete.obs")
             rocky1Rating    rocky2Rating rocky3Rating rocky4Rating rocky5Rating
rocky1Rating    1.0000000 ***0.6476523***    0.5435555    0.4964198    0.3483168
rocky2Rating    0.6476523       1.0000000    0.7507204    0.6653651    0.5288312
rocky3Rating    0.5435555       0.7507204    1.0000000    0.7284123    0.5897088
rocky4Rating    0.4964198       0.6653651    0.7284123    1.0000000    0.6006595
rocky5Rating    0.3483168       0.5288312    0.5897088    0.6006595    1.0000000
> cor(finalDB[2],finalDB[3],use = "complete.obs")
             rocky2Rating
rocky1Rating ***0.6011554***
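The problem is most likely NA values in your data set. When you set use="complete.obs" and apply it to more than two columns, it uses only the rows in which none of those columns is missing. In the two-column call, only rows with a missing value in one of those two columns are dropped, so the two results can be computed from different sets of rows. If you want to skip missing values separately for each pair of columns, set use="pairwise.complete.obs". For example: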
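set.seed(15)
# 6 x 6 matrix of uniform random numbers
mm <- matrix(runif(6 * 6), nrow = 6)
# Put one NA in each of rows 4 to 6: cells (4,1), (5,2) and (6,3)
mm[cbind(4:6, 1:3)] <- NA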
cor(mm, use="complete.obs")
#              [,1]       [,2]        [,3]         [,4]       [,5]        [,6]
# [1,]  1.000000000  0.7577650  0.41079822  0.004065102 -0.9221867  0.86947546
# [2,]  0.757764997  1.0000000 -0.28363801 -0.649441771 -0.4464391  0.98119111
# [3,]  0.410798223 -0.2836380  1.00000000  0.913388689 -0.7314382 -0.09319206
# [4,]  0.004065102 -0.6494418  0.91338869  1.000000000 -0.3904905 -0.49043755
# [5,] -0.922186730 -0.4464391 -0.73143818 -0.390490510  1.0000000 -0.61077597
# [6,]  0.869475459  0.9811911 -0.09319206 -0.490437552 -0.6107760  1.00000000
cor(mm, use="pairwise.complete.obs")
#            [,1]        [,2]        [,3]       [,4]       [,5]       [,6]
# [1,]  1.0000000  0.70156571  0.50955114 -0.2663486 -0.7637746  0.7643575
# [2,]  0.7015657  1.00000000 -0.01542302 -0.2882218 -0.5666432  0.1206862
# [3,]  0.5095511 -0.01542302  1.00000000  0.8922900 -0.8904275 -0.5660903
# [4,] -0.2663486 -0.28822185  0.89229002  1.0000000 -0.4693979 -0.7574680
# [5,] -0.7637746 -0.56664323 -0.89042748 -0.4693979  1.0000000  0.2974870
# [6,]  0.7643575  0.12068622 -0.56609027 -0.7574680  0.2974870  1.0000000
cor(mm[,1], mm[,2], use="complete.obs")
# [1] 0.7015657
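Notice how the last two results match: the pairwise matrix entry for columns 1 and 2 is the same as the direct two-column call. Read the ?cor help page for more information.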
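To see where the difference comes from, it can help to count the rows each call actually uses. The following is a minimal sketch that rebuilds the same mm matrix and uses complete.cases() to select rows by hand; the names full_rows, pair_rows, r_full and r_pair are only illustrative.

```r
set.seed(15)
mm <- matrix(runif(6 * 6), nrow = 6)
mm[cbind(4:6, 1:3)] <- NA

# Rows with no NA in any column: the rows kept by use = "complete.obs"
# when the whole matrix is passed to cor()
full_rows <- complete.cases(mm)
sum(full_rows)   # 3 rows (rows 1 to 3)

# Rows with no NA in columns 1 and 2 only: the rows kept for that pair
pair_rows <- complete.cases(mm[, 1:2])
sum(pair_rows)   # 4 rows (rows 1, 2, 3 and 6)

# Correlations computed by hand on each row subset
r_full <- cor(mm[full_rows, 1], mm[full_rows, 2])
r_pair <- cor(mm[pair_rows, 1], mm[pair_rows, 2])

# Each one should agree with the matching entry of the cor() matrix
all.equal(r_full, cor(mm, use = "complete.obs")[1, 2])
all.equal(r_pair, cor(mm, use = "pairwise.complete.obs")[1, 2])
```

On the data from the question, something like colSums(is.na(finalDB[2:6])) would presumably show which rating columns contain the missing values that shrink the complete-case set.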