BestNormalize gives misleading results?

I would like to know why I get different results every time I run this code:

library(bestNormalize)

# arcsinh transformation
(arcsinh_obj <- arcsinh_x(df$Var1))
# Box Cox's Transformation
(boxcox_obj <- boxcox(df$Var1))
# Yeo-Johnson's Transformation
(yeojohnson_obj <- yeojohnson(df$Var1))
# orderNorm Transformation
(orderNorm_obj <- orderNorm(df$Var1))
# Pick the best one automatically
(BNobject <- bestNormalize(df$Var1))
# Last resort - binarize
(binarize_obj <- binarize(df$Var1))

summary(df$Var1)
# 12 and 56 are hard-coded endpoints (presumably the range reported by summary() above)
xx <- seq(12, 56, length.out = 295)

plot(xx, predict(arcsinh_obj, newdata = xx), type = "l", col = 1, ylim = c(-4, 4),
     xlab = 'df$Var1', ylab = "g(df$Var1)")
lines(xx, predict(boxcox_obj, newdata = xx), col = 2)
lines(xx, predict(yeojohnson_obj, newdata = xx), col = 3)
lines(xx, predict(orderNorm_obj, newdata = xx), col = 4)

legend("bottomright", legend = c("arcsinh", "Box Cox", "Yeo-Johnson", "OrderNorm"), 
       col = 1:4, lty = 1, bty = 'n')

par(mfrow = c(2,2))
MASS::truehist(arcsinh_obj$x.t, main = "Arcsinh transformation", nbins = 100)
MASS::truehist(boxcox_obj$x.t, main = "Box Cox transformation", nbins = 100)
MASS::truehist(yeojohnson_obj$x.t, main = "Yeo-Johnson transformation", nbins = 100)
MASS::truehist(orderNorm_obj$x.t, main = "orderNorm transformation", nbins = 100)

par(mfrow = c(1,2))
MASS::truehist(BNobject$x.t, 
               main = paste("Best Transformation:", 
                            class(BNobject$chosen_transform)[1]), nbins = 100)
plot(xx, predict(BNobject, newdata = xx), type = "l", col = 1, 
     main = "Best Normalizing transformation", ylab = "g(x)", xlab = "x")

dev.off()
boxplot(log10(BNobject$oos_preds), yaxt = 'n')
axis(2, at=log10(c(.1,.5, 1, 2, 5, 10)), labels=c(.1,.5, 1, 2, 5, 10))

I even tried running this before each re-run of the analysis, in case it had any effect:

 rm(list = ls())

Can you help me?

Thanks, Lu

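You are probably getting different results because bestNormalize() uses repeated cross-validation (and does not set a seed automatically), so the results differ slightly on each run. Clearing the workspace with rm(list = ls()) does not help, because it removes objects but does not reset the random number generator.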

Try setting a seed (for example, set.seed(3)) immediately before the call to bestNormalize().
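As a minimal sketch of the seed fix: the vector x below is simulated and only stands in for df$Var1 (its distribution and the seed values are assumptions, not taken from the question). Resetting the same seed before each call should give the same cross-validation folds, and therefore the same estimated normality statistics and the same chosen transformation.

```r
library(bestNormalize)

# Hypothetical stand-in for df$Var1: 295 right-skewed values
set.seed(1)
x <- rgamma(295, shape = 2, rate = 0.1)

# Reset the same seed before each call so the CV folds are drawn identically
set.seed(3)
bn1 <- bestNormalize(x)
set.seed(3)
bn2 <- bestNormalize(x)

# Compare the selected transformation and the estimated normality statistics
same_choice <- identical(class(bn1$chosen_transform)[1],
                         class(bn2$chosen_transform)[1])
same_stats  <- isTRUE(all.equal(bn1$norm_stats, bn2$norm_stats))
print(c(same_choice = same_choice, same_stats = same_stats))
```

Without the second set.seed(3), the two calls would draw different folds, which is the behaviour described in the question.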

Alternatively, you can tell the function not to perform repeated CV by setting out_of_sample = FALSE, or have it use leave-one-out CV by setting loo = TRUE.
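The two alternatives can be sketched as follows, again on simulated data standing in for df$Var1 (the data and the sample size are assumptions chosen only to keep the example quick). With out_of_sample = FALSE the normality statistics are computed in-sample, so no random folds are involved; with loo = TRUE each observation is held out exactly once, so the result should not depend on a random split either.

```r
library(bestNormalize)

# Hypothetical stand-in for df$Var1 (small n so leave-one-out stays fast)
set.seed(1)
x <- rgamma(60, shape = 2, rate = 0.1)

# Option 1: in-sample statistics, no cross-validation at all
bn_in1 <- bestNormalize(x, out_of_sample = FALSE)
bn_in2 <- bestNormalize(x, out_of_sample = FALSE)

# No seed was set between the two calls, yet the results should match
in_sample_stable <- isTRUE(all.equal(bn_in1$norm_stats, bn_in2$norm_stats))
print(in_sample_stable)

# Option 2: leave-one-out cross-validation
bn_loo <- bestNormalize(x, loo = TRUE)
print(class(bn_loo$chosen_transform)[1])
```

In-sample statistics tend to favour flexible transformations such as orderNorm, so leave-one-out CV may be the better choice when the data set is small enough for it to run in reasonable time.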