如何使用基于 nnet::multinom() 模型的 {ggeffects} 获得预测概率图的置信区间?

How to get confidence intervals for predicted probability plot using {ggeffects} based on nnet::multinom() model?

我想在 R 中绘制多项式模型的预测概率,该模型装有 nnet::multinom() 函数。我有对数刻度的数值预测变量。

即使 {ggeffects} 应该与 multinom() 兼容,该图也不会像线性模型那样显示置信区间。

我是 R 和这个社区的新手,如果这个问题非常基础或遗漏了一些重要的问题,我深表歉意。这是一个小例子:

library(tidyverse)
library(nnet)
library(effects)
library(ggeffects)


df <- data.frame(response = c("1 Better", "1 Better", "1 Better", "2 Medium", "2 Medium", "2 Medium", "3 Worse", "3 Worse", "3 Worse"),
                 count = c(1000, 2000, 4000, 6000, 10000, 3000, 6000, 5000, 11000))

mod1 <- multinom(response ~ log(count), data = df)
summary(mod1)

effects::effect(mod1, term="log(count)", se=TRUE, confidence.level=.95) %>% plot() # Produces CIs.

ggeffects::ggpredict(mod1, terms = "count") %>% plot() + theme_bw() # No confidence intervals.

如果其他人正在寻找 {ggeffects} 的替代方法,我在寻找解决方案时尝试了这些方法:

使用 effects::effect():有效,包括置信区间,但外观不是那么可定制。

组合 {ggeffects}{effects}:请参阅此 post on R Studio Community,其中将来自 effects 包的置信区间与 ggeffects 组合以创建一个图。我收到错误

Error in FUN(X[[i]], ...) : object 'L' not found

但这对那个人有效。

使用 {MNLpred} 包及其 mnl_pred_ova() : 对我不起作用,我想是因为我的预测变量是对数尺度的。我收到以下错误:

Error in eval(parse(text = paste0("data$", xvari))) : attempt to apply non-function

使用 {DAMisc} 中的 mnlAveEffPlot() 函数:有效,但绘图不像我想要的那样可自定义。

您可以使用 ggeffects::ggemmeans() 执行此操作。

library(tidyverse)
library(ggthemes)
library(nnet)
library(ggeffects) # package version used: v0.16.0

df <- data.frame(response = c("1 Better", "1 Better", "1 Better", "2 Medium", "2 Medium", "2 Medium", "3 Worse", "3 Worse", "3 Worse"),
                 count = c(1000, 2000, 4000, 6000, 10000, 3000, 6000, 5000, 11000))

mod1 <- multinom(response ~ log(count),
                 data = df)

ggemmeans(mod1, terms = "count") %>% plot() + ggthemes::theme_tufte()

有关如何使用 {ggeffects} 的更多信息,您可能还想看看 package documentation and especially at the differences between the ggemmeans() and ggpredict() etc. (e.g. here)。

{ggeffects} 包利用了 {effects} 创建的输出,但我相信这就是您正在寻找的,它使使用标准 ggplot 命令自定义绘图变得更加容易。

MNLpred 包无法处理回归函数中的 log(),但在您预先计算对数尺度时可以使用。

# Packages
library(tidyverse)
library(nnet)
library(MASS)
library(MNLpred)
library(scales)
library(ggeffects)
library(ggthemes)

df <- data.frame(response = c("1 Better", "1 Better", "1 Better", 
                              "2 Medium", "2 Medium", "2 Medium", 
                              "3 Worse", "3 Worse", "3 Worse"),
                 count = c(1000, 2000, 4000, 
                           6000, 10000, 3000, 
                           6000, 5000, 11000))

mod1 <- multinom(response ~ log(count), data = df)
summary(mod1)

# Log-scaled
df$count_log <- log(df$count)

# Regression
mod2 <- multinom(response ~ count_log,
                 data = df,
                 Hess = TRUE)

# The models are identical:
coef(mod1) == coef(mod2)

完成此步骤后,您可以使用 mnl_pred_ovamnl_fd2_ova 函数来预测概率或第一个 differences/predicted 边际效应。

# 10 steps for predictions
steps <- (max(df$count_log) - min(df$count_log))/9

pred1 <- mnl_pred_ova(mod2,
                      data = df,
                      by = steps,
                      x = "count_log")

x_breaks <- seq(from = min(df$count_log),
                to = max(df$count_log), 
                length.out = 5)

x_labels <- seq(from = min(df$count),
                to = max(df$count),
                length.out = 5)

pred1$plotdata %>%
  ggplot(aes(x = count_log,
             y = mean,
             ymin = lower,
             ymax = upper)) +
  facet_wrap(. ~ response) +
  geom_line() +
  geom_ribbon(alpha = 0.2) +
  scale_y_continuous(labels = percent_format()) +
  scale_x_continuous(breaks = x_breaks,
                     labels = x_labels) +
  theme_bw()

或预测的边际效应:

pred_fd <- mnl_fd2_ova(model = mod2,
                       x = "count_log",
                       value1 = min(df$count_log),
                       value2 = max(df$count_log),
                       data = df)

pred_fd$plotdata_fd %>%
  ggplot(aes(x = categories,
             y = mean,
             ymin = lower,
             ymax = upper)) +
  geom_pointrange() +
  scale_y_continuous(labels = percent_format()) +
  labs(title = "Predicted effect of Count on responses",
       x = "Categories",
       y = "Predicted marginal effect") +
  theme_bw()