Request for help understanding an apparent discrepancy between tidybayes::add_predicted_draws and brms::posterior_predict
I am using the Howell1 dataset from the rethinking package.
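library(rethinking)  # provides the Howell1 dataset
library(cmdstanr)
library(brms)
library(tidybayes)
library(dplyr)       # for %>%, filter(), select(), mutate()
data("Howell1")
d <- Howell1
d2 <- d[d$age > 18, ]
# standardize height and weight
d2$hs <- (d2$height - mean(d2$height)) / sd(d2$height)
d2$ws <- (d2$weight - mean(d2$weight)) / sd(d2$weight)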
Build a simple brms model with one numeric predictor and one categorical predictor:
priors <- c(prior(normal(0, 2), class = "Intercept"),
            prior(normal(0, 2), class = "b"),
            prior(cauchy(0, 2), class = "sigma"))
m4.4 <- brm(formula = hs ~ 1 + ws + male, data = d2, family = gaussian,
            backend = "cmdstanr", prior = priors,
            iter = 2000, warmup = 1000, chains = 4, cores = 4)
I am trying to understand how add_fitted_draws and add_predicted_draws work.
Consider add_fitted_draws:
i <- 4 # looking at the results for a particular row of the input dataset
y <- posterior_epred(m4.4)
x <- d2 %>% add_fitted_draws(model = m4.4, value = "epred")
x %>%
  as_tibble() %>%
  filter(.row == i) %>%
  dplyr::select(epred) %>%
  cbind(fitdr = y[, i]) %>%
  mutate(diff = fitdr - epred)
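According to the documentation, add_fitted_draws internally uses posterior_epred (or its equivalent in brms), and the results match exactly.
However, when I do exactly the same comparison between add_predicted_draws and posterior_predict, the results do not match: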
yp <- posterior_predict(m4.4)
xp <- d2 %>% add_predicted_draws(model = m4.4, prediction = "pred")
xp %>%
  as_tibble() %>%
  filter(.row == i) %>%
  dplyr::select(pred) %>%
  cbind(preddr = yp[, i]) %>%
  mutate(diff = preddr - pred)
There must be a gap in my understanding; any guidance is appreciated.
The session information is as follows:
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
other attached packages:
[1] stringr_1.4.0 readr_1.4.0 tibble_3.0.4 tidyverse_1.3.0 MASS_7.3-53 bayesplot_1.8.0 cmdstanr_0.1.3 rethinking_2.13
[9] loo_2.4.1 gganimate_1.0.7 RColorBrewer_1.1-2 ggrepel_0.9.0 brms_2.14.4 Rcpp_1.0.5 rstan_2.21.2 StanHeaders_2.21.0-7
[17] cowplot_1.1.1 ggplot2_3.3.3 tidybayes_2.3.1 ggdist_2.4.0 modelr_0.1.8 tidyr_1.1.2 forcats_0.5.0 purrr_0.3.4
[25] dplyr_1.0.2 magrittr_2.0.1
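PS: I have also posted this question on the Stan Discourse and have not received a reply yet. Please also let me know if it would be better posted on stats.stackexchange. I posted it here because it is more of a tool-based question than a conceptual one.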
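It turns out that the discrepancy was caused by brms::posterior_predict not respecting the seed setting.
After I discussed this on GitHub with the developer of the brms package, he explained the cause as follows: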
If you have set options(mc.cores = <more than 1>), posterior_predict
will evaluate in parallel by default, unless you change the core
argument. On windows, parallel execution is done via
parallel::parLapply and I don't know how that function respects seeds,
if at all. When executing the code in serial (with 1 core) the results
are reproducible.
Once I set mc.cores to 1, I no longer saw a difference between posterior_predict and add_predicted_draws.
I am therefore marking the question as resolved.
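To illustrate why the fitted draws matched exactly while the predicted draws needed a reproducible seed, here is a minimal base R sketch. It does not use brms or the fitted model m4.4: mu_draws, sigma_draws and simulate_pred are hypothetical stand-ins I made up for the posterior draws of one row and for the noise step of a Gaussian model. It is only meant to show the general idea, not how brms or tidybayes implement it.

```r
# Hypothetical stand-ins for posterior draws of one observation;
# these are NOT extracted from the fitted model m4.4.
set.seed(1)
n_draws     <- 4000
mu_draws    <- rnorm(n_draws, mean = 0.5, sd = 0.05)       # role of posterior_epred() output
sigma_draws <- abs(rnorm(n_draws, mean = 0.6, sd = 0.02))  # role of the sigma draws

# Expected-value ("fitted") draws involve no new randomness, so two
# computations from the same parameter draws always agree.
epred_a <- mu_draws
epred_b <- mu_draws
fitted_match <- identical(epred_a, epred_b)

# Predicted draws add fresh Gaussian noise, so they depend on the RNG state.
simulate_pred <- function(mu, sigma) rnorm(length(mu), mean = mu, sd = sigma)

# Without resetting the seed, two calls give different draws.
pred_unseeded_a <- simulate_pred(mu_draws, sigma_draws)
pred_unseeded_b <- simulate_pred(mu_draws, sigma_draws)
unseeded_match  <- identical(pred_unseeded_a, pred_unseeded_b)

# Resetting the seed before each serial call makes the draws identical.
set.seed(123)
pred_seeded_a <- simulate_pred(mu_draws, sigma_draws)
set.seed(123)
pred_seeded_b <- simulate_pred(mu_draws, sigma_draws)
seeded_match  <- identical(pred_seeded_a, pred_seeded_b)

print(c(fitted = fitted_match, unseeded = unseeded_match, seeded = seeded_match))
```

In the actual comparison, the analogous step would presumably be to keep execution serial (mc.cores set to 1, as described above) and call set.seed with the same value immediately before posterior_predict and before add_predicted_draws, since parallel workers may not follow the seed set in the main session.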
The relevant GitHub link is: