How to get multiple predictions rather than a focal prediction from a multinomial regression model (i.e., split by a factor variable)
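I want to run a multinomial regression to get the mean frequency of each option, split by a factor (gender: male/female).
Background
I want to compare 4 types of cheese to gauge how popular each one is. There are 4 possibilities: cheddar, mozzarella, gouda, and brie. I went out and asked 200 people for their favorite cheese. Each person could choose only one of the 4 types. I also ended up collecting some demographic information: gender, age, and weight.
Once the data was collected, I wanted to see how popular each cheese is (summing to 100%). Since I want to control for gender, age, and weight, I believe a multinomial regression is appropriate here.
But I would also like to know how the results differ between men and women, so I want to include gender as a factor in my model. How can I generate a double prediction from my (multinomial) model, so that I get predicted values for females and males separately and can compare the two gender levels?
Data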
library(truncnorm)
library(tidyverse)

set.seed(999)

cheese_df <-
  tibble(
    age = round(rtruncnorm(
      n = 200,
      a = 20,
      b = 80,
      mean = 25,
      sd = 25.09
    )),
    cheese_response = as_factor(sample(
      c("cheddar", "mozzarella", "gouda", "brie"),
      size = 200,
      replace = TRUE
    )),
    gender = sample(c(0, 1), size = 200, replace = TRUE),
    weight = rtruncnorm(
      n = 200,
      a = 40,
      b = 120,
      mean = 70,
      sd = 25.09
    )
  )
> cheese_df
## # A tibble: 200 x 4
## age cheese_response gender weight
## <dbl> <fct> <dbl> <dbl>
## 1 45 cheddar 0 62.2
## 2 32 cheddar 0 45.0
## 3 58 cheddar 1 87.6
## 4 28 brie 0 68.8
## 5 49 gouda 0 88.2
## 6 29 brie 1 74.5
## 7 49 cheddar 0 74.0
## 8 27 gouda 1 90.3
## 9 28 brie 0 56.5
## 10 48 mozzarella 0 72.9
## # ... with 190 more rows
If I only wanted to run a multinomial regression controlling for age, gender, and weight, without splitting by gender, I could do this:
library(nnet)
library(effects)

fit <- nnet::multinom(cheese_response ~ age + gender + weight, data = cheese_df)

# Values at which the predictors are held when computing the effect
average_person_for_control <-
  c(
    age = 50,
    gender = 0.5,
    weight = 75
  )

prediction <-
  effects::Effect("age",
                  fit,
                  given.values = average_person_for_control,
                  xlevels = list(age = c(45, 90)))

# Keep the first row (age = 45) and reshape to one row per cheese type.
# The literal dot in the regex must be written "\\." inside an R string.
proportions_for_plot <-
  data.frame(prediction$prob, prediction$lower.prob, prediction$upper.prob) %>%
  slice(1) %>%
  pivot_longer(., cols = everything(),
               names_to = c(".value", "response"),
               names_pattern = "(.*)\\.(.*$)") %>%
  rename("lower_ci" = "L.prob",
         "upper_ci" = "U.prob",
         "estimate" = "prob")

ggplot(proportions_for_plot, aes(x = reorder(response, -estimate), y = estimate)) +
  geom_bar(stat = "identity", width = 0.7, fill = "darkgreen") +
  geom_errorbar(aes(ymin = lower_ci, ymax = upper_ci),
                width = 0.2) +
  geom_text(aes(label = paste0(100 * round(estimate, 2), "%")),
            vjust = 1.6,
            color = "white", size = 3) +
  xlab("cheese type") +
  ylab("proportion of people choosing this type")
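However, I would like to produce the same bar plot, except with separate bars for males and females.
This is the kind of plot I am hoping to get
(ignore the values in this demo)
One way would be to subset the data by gender, run the same model on each subset, generate two bar plots, and combine them. However, I would like to include gender as a factor in the model and only then output the split bar plot. This is partly taken care of, since gender is already part of the model: fit <- nnet::multinom(cheese_response ~ age + gender + weight, data = cheese_df).
Still, when it comes to splitting the predictions by gender so that I can compare them side by side in a bar plot, I get stuck. This is because effects::Effect() accepts only a single vector in its given.values argument. Otherwise, I would have done something like the following to supply the values for prediction (as I would when using predict):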
control_by_gender <-
  expand.grid(
    age = 50,
    weight = 75,
    gender = c(0, 1)
  )
> control_by_gender
## age weight gender
## 1 50 75 0
## 2 50 75 1
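Any idea how to get such multiple (rather than focal) predictions when working with a multinomial model object like the one above? My end goal is a bar plot split by gender, like the demo above. I have been using effects::Effect to generate the predictions, but I am open to any alternative that does the multiple-prediction trick.
Why not just lapply over the levels and pass each one into the effects::Effect call: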
# One Effect() call per gender level; keep the first row (age = 45) of each
prediction <- do.call(rbind, lapply(0:1, function(x) {
  eff <- effects::Effect("age",
                         fit,
                         given.values = c(age = 50,
                                          weight = 75,
                                          gender = x),
                         xlevels = list(age = c(45, 90)))
  data.frame(level = x, eff$prob, eff$lower.prob, eff$upper.prob) %>% slice(1)
}))

# The literal dot in the regex must be written "\\." inside an R string
proportions_for_plot <-
  prediction %>%
  pivot_longer(., cols = !level,
               names_to = c(".value", "response"),
               names_pattern = "(.*)\\.(.*$)") %>%
  rename("lower_ci" = "L.prob",
         "upper_ci" = "U.prob",
         "estimate" = "prob")

# Use the same dodge width for bars, error bars, and labels so they line up
ggplot(proportions_for_plot, aes(x = as.factor(response), y = estimate, fill = factor(level))) +
  geom_bar(stat = "identity", width = 0.7, position = position_dodge(.9)) +
  geom_errorbar(aes(ymin = lower_ci, ymax = upper_ci), position = position_dodge(.9),
                width = 0.2) +
  geom_text(aes(label = paste0(100 * round(estimate, 2), "%")),
            vjust = 1.6,
            color = "white", size = 3, position = position_dodge(.9)) +
  xlab("cheese type") +
  ylab("proportion of people choosing this type")
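The reshaping step depends on how names_pattern splits each column name into a value part and a response part. Below is a minimal base-R sketch of that split. It assumes the columns are named like prob.cheddar, L.prob.cheddar, and U.prob.cheddar, which is inferred from the rename() call above rather than from printed output. The vector cols is a made-up stand-in for the real column names.

```r
# Hypothetical column names, modelled on the rename() call in the answer
cols <- c("prob.cheddar", "L.prob.cheddar", "U.prob.cheddar")

# Same pattern as names_pattern; "\\." is a literal dot in an R string
pattern <- "(.*)\\.(.*$)"

# The first group is greedy, so the split happens at the LAST dot
value_part    <- sub(pattern, "\\1", cols)
response_part <- sub(pattern, "\\2", cols)

print(data.frame(cols, value_part, response_part))
```

Because the split happens at the last dot, "L.prob" and "U.prob" survive as value names, which is what the later rename() relies on. A cheese label that itself contained a dot would break this split.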
This answer uses the same idea as @Abdessabour Mtk, just with purrr::map and some refactoring:
# Build a one-row data frame of predicted probabilities for one gender level
make_eff_df <- function(gender, fit) {
  Effect("age", fit, xlevels = list(age = c(45, 90)),
         given.values = c(age = 50, weight = 75, gender = gender)) %>%
    as_tibble() %>%
    mutate(gender = gender) %>%
    select(gender, matches("[a-z\\.]?prob")) %>%
    slice(1)
}

# The literal dot in the regex must be written "\\." inside an R string
map_dfr(0:1, make_eff_df, fit) %>%
  pivot_longer(-gender, names_to = c(".value", "response"),
               names_pattern = "(.+)\\.(.+$)") %>%
  rename(lower_ci = "L.prob", upper_ci = "U.prob", estimate = "prob") %>%
  mutate(across(1:2, as.factor)) %>%
  ggplot(aes(x = reorder(response, -estimate), y = estimate, fill = gender)) +
  geom_bar(stat = "identity", width = 0.7, position = position_dodge(.9)) +
  geom_errorbar(aes(ymin = lower_ci, ymax = upper_ci),
                position = position_dodge(.9),
                width = 0.2) +
  geom_text(aes(label = scales::percent(estimate, accuracy = 1)),
            vjust = 1.6, color = "white", size = 3, position = position_dodge(.9)) +
  labs(x = "cheese type",
       y = "proportion of people choosing this type")
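If only point estimates are needed, the expand.grid idea from the question can be passed straight to predict() on the multinom fit, without going through effects at all. The sketch below is an alternative to the answers above, not a replacement for them: predict() with type = "probs" returns predicted probabilities only, with no confidence intervals. The names toy_df, fit_toy, and pred_df are made up for illustration, and the data are simulated with base R so the block runs on its own.

```r
library(nnet)  # multinom() and its predict() method

set.seed(999)

# Simulated stand-in with the same columns as cheese_df (an assumption,
# used only so this block does not depend on truncnorm or tidyverse)
n <- 200
toy_df <- data.frame(
  age    = round(runif(n, 20, 80)),
  weight = runif(n, 40, 120),
  gender = sample(c(0, 1), n, replace = TRUE),
  cheese_response = factor(sample(
    c("cheddar", "mozzarella", "gouda", "brie"), n, replace = TRUE
  ))
)

fit_toy <- multinom(cheese_response ~ age + gender + weight,
                    data = toy_df, trace = FALSE)

# One row per gender level, other predictors held fixed
control_by_gender <- expand.grid(age = 50, weight = 75, gender = c(0, 1))

# type = "probs" gives one row of class probabilities per row of newdata
probs   <- predict(fit_toy, newdata = control_by_gender, type = "probs")
pred_df <- cbind(control_by_gender, probs)
print(pred_df)
```

Each row of pred_df holds the predicted share of every cheese type for one gender, and each row sums to 1. The error bars in the plots above would still have to come from effects::Effect or some other interval method.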