Principal component analysis throws a constant/zero column error
I am trying to run PCA on the "training1" dataset created below:
library(AppliedPredictiveModeling); data(AlzheimerDisease); library(caret)
adData <- data.frame(diagnosis, predictors)
inTrain <- createDataPartition(y = adData$diagnosis, p = .75)[[1]]
training <- adData[inTrain, ]
keep <- subset(data.frame(x = substr(as.character(colnames(training)), 1, 2), y = c(1:ncol(training))), x == "IL")
training1 <- cbind(training[, c(keep[1, 2]:keep[nrow(keep), 2])], training[c("diagnosis")])
Then, when I run the following call:
preProc <- preProcess(log10(training1[, -13]+1), method = "pca", pcaComp = 2)
I get the following error:
Warning in preProcess.default(log10(training1[, -13] + 1), method = "pca", :
Std. deviations could not be computed for: IL_1alpha, IL_3
Error in prcomp.default(x[, method$pca, drop = FALSE], scale = TRUE, retx = FALSE) :
cannot rescale a constant/zero column to unit variance
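However, I then ran the following two calls to show that a standard deviation can be computed for the two variables that the warning says it cannot be computed for: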
sd(training1$IL_1alpha)
[1] 0.4056147
sd(training1$IL_3)
[1] 0.5235212
I then ran the following to show that none of my variables has zero variance:
nsv <- nearZeroVar(training1, saveMetrics = TRUE)
print(nsv)
              freqRatio percentUnique zeroVar   nzv
IL_11          1.250000    29.4820717   FALSE FALSE
IL_13          1.052632     6.7729084   FALSE FALSE
IL_16          1.117647    21.9123506   FALSE FALSE
IL_17E         1.238095    16.7330677   FALSE FALSE
IL_1alpha      1.208333    23.1075697   FALSE FALSE
IL_3           1.066667    24.7011952   FALSE FALSE
IL_4           1.315789    19.1235060   FALSE FALSE
IL_5           1.000000    19.5219124   FALSE FALSE
IL_6           1.000000    20.3187251   FALSE FALSE
IL_6_Receptor  1.041667    21.5139442   FALSE FALSE
IL_7           1.611111    18.7250996   FALSE FALSE
IL_8           1.000000    22.3107570   FALSE FALSE
diagnosis      2.637681     0.7968127   FALSE FALSE
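Other people's questions about PCA in R seem to come down to zero-variance columns, but since I can show that I don't have that problem here, does anyone have an idea what could be causing this?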
Sorry, I don't have enough reputation to comment, so I'm posting this as an answer. After running your code, this line in particular:
log10(training1[, -13]+1)
returns NaN for some of the values in certain columns (IL_1alpha and IL_3, in fact):
Warning messages:
1: In lapply(X = x, FUN = .Generic, ...) : NaNs produced
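To see which columns are affected, one option is to count the NaN entries per column after the transformation. The sketch below uses a small made-up data frame (`toy`, with hypothetical columns `IL_a` and `IL_b`) in place of `training1[, -13]`, so the values are illustrative only:

```r
# Toy data standing in for training1[, -13]; the values are made up.
toy <- data.frame(
  IL_a = c(0.5, 1.2, 2.0, 3.1),    # all values > -1, so log10(x + 1) is defined
  IL_b = c(-1.5, 0.3, -2.0, 1.0)   # some values < -1, so x + 1 is negative
)

# Same transformation as in the question; the "NaNs produced" warning is silenced here.
logged <- suppressWarnings(log10(toy + 1))

# Count missing (NaN) entries per column to find the affected variables.
nan_counts <- colSums(is.na(logged))
print(nan_counts)

# sd() works on the raw column but returns a missing value on the transformed one,
# which is why checking sd(training1$...) on the untransformed data looks fine.
print(sd(toy$IL_b))
print(sd(logged$IL_b))
```

On the real data, the same check would be `colSums(is.na(log10(training1[, -13] + 1)))`.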
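So this appears to be the source of the error: any value below -1 is still negative after adding 1, and log10 of a negative number is NaN. Perhaps you shouldn't take the logarithm of negative numbers, and should consider another transformation (or whether one is needed at all)?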
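As a rough sketch of two possible workarounds, the example below runs `prcomp` directly (the function the error message shows caret calling) on made-up data. The data frame `toy` and its columns are hypothetical, and the shift in option 2 is only one possible choice, which changes how the transformed values should be interpreted:

```r
set.seed(1)
# Made-up data with negative values, standing in for the IL_* predictors.
toy <- data.frame(
  IL_a = rnorm(50, mean = 2),
  IL_b = rnorm(50, mean = -1),
  IL_c = rnorm(50, mean = 0)
)

# Option 1: skip the log and let PCA center and scale the raw columns.
pca_raw <- prcomp(toy, center = TRUE, scale. = TRUE)
scores_raw <- pca_raw$x[, 1:2]   # first two principal components

# Option 2 (an assumption, not the only choice): shift each column so its
# minimum becomes 0, which keeps x + 1 positive before taking log10.
shifted <- as.data.frame(lapply(toy, function(col) col - min(col)))
logged <- log10(shifted + 1)
pca_log <- prcomp(logged, center = TRUE, scale. = TRUE)
scores_log <- pca_log$x[, 1:2]

# No NaN values remain, so every column has a computable standard deviation.
print(any(is.na(logged)))
```

With caret, option 1 should correspond to dropping the log from the original call, i.e. `preProcess(training1[, -13], method = "pca", pcaComp = 2)`.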