BestNormalize gives misleading results?
I'm wondering why I get different results every time I run this code:
library(bestNormalize)

# arcsinh transformation
(arcsinh_obj <- arcsinh_x(df$Var1))
# Box Cox's Transformation
(boxcox_obj <- boxcox(df$Var1))
# Yeo-Johnson's Transformation
(yeojohnson_obj <- yeojohnson(df$Var1))
# orderNorm Transformation
(orderNorm_obj <- orderNorm(df$Var1))
# Pick the best one automatically
(BNobject <- bestNormalize(df$Var1))
# Last resort - binarize
(binarize_obj <- binarize(df$Var1))
summary(df$Var1)
xx <- seq(12, 56, length.out = 295)
plot(xx, predict(arcsinh_obj, newdata = xx), type = "l", col = 1, ylim = c(-4, 4),
     xlab = 'df$Var1', ylab = "g(df$Var1)")
lines(xx, predict(boxcox_obj, newdata = xx), col = 2)
lines(xx, predict(yeojohnson_obj, newdata = xx), col = 3)
lines(xx, predict(orderNorm_obj, newdata = xx), col = 4)
legend("bottomright", legend = c("arcsinh", "Box Cox", "Yeo-Johnson", "OrderNorm"),
       col = 1:4, lty = 1, bty = 'n')
par(mfrow = c(2,2))
MASS::truehist(arcsinh_obj$x.t, main = "Arcsinh transformation", nbins = 100)
MASS::truehist(boxcox_obj$x.t, main = "Box Cox transformation", nbins = 100)
MASS::truehist(yeojohnson_obj$x.t, main = "Yeo-Johnson transformation", nbins = 100)
MASS::truehist(orderNorm_obj$x.t, main = "orderNorm transformation", nbins = 100)
par(mfrow = c(1,2))
MASS::truehist(BNobject$x.t,
               main = paste("Best Transformation:",
                            class(BNobject$chosen_transform)[1]), nbins = 100)
plot(xx, predict(BNobject, newdata = xx), type = "l", col = 1,
     main = "Best Normalizing transformation", ylab = "g(x)", xlab = "x")
dev.off()
boxplot(log10(BNobject$oos_preds), yaxt = 'n')
axis(2, at=log10(c(.1,.5, 1, 2, 5, 10)), labels=c(.1,.5, 1, 2, 5, 10))
I even tried running this each time before re-running the analysis, in case it had any effect:
rm(list = ls())
Can you help me?
Thanks,
律
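You are probably getting different results because the bestNormalize() function uses repeated cross-validation (and does not set a seed automatically), so the results will differ slightly from run to run.
Try setting a seed first (e.g. set.seed(3)).
Alternatively, you can tell the function not to perform repeated CV by setting out_of_sample = FALSE, or to use leave-one-out CV by setting loo = TRUE.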
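The three options can be sketched as below. This is only an illustration under stated assumptions: the vector x is simulated stand-in data (the original df$Var1 is not shown), and the object names bn_a, bn_b, bn_in and bn_loo are made up for the example.

```r
library(bestNormalize)

# Hypothetical stand-in for df$Var1, which is not available here
set.seed(1)
x <- rgamma(100, shape = 2, rate = 0.1)

# Option 1: set the same seed right before each call, so the
# cross-validation folds are drawn identically on every run
set.seed(3)
bn_a <- bestNormalize(x)
set.seed(3)
bn_b <- bestNormalize(x)

# Compare the estimated normality statistics of the two runs
all.equal(bn_a$norm_stats, bn_b$norm_stats)

# Option 2: skip repeated CV and use in-sample statistics instead
bn_in <- bestNormalize(x, out_of_sample = FALSE)

# Option 3: use leave-one-out CV, which does not depend on random folds
bn_loo <- bestNormalize(x, loo = TRUE)
```

Note that rm(list = ls()) only clears objects from the workspace; it does not reset the random number generator, which is why it made no difference.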