R nested models: create a column of model formulas
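How can I create a column of formulas (e.g. y ~ x, y ~ log(x), or ...) from a nested data frame of models?
In the attempt below, the model column holds the model with the largest R-squared value. The purpose of the formula column is to identify which model was used in each row.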
library(tidyverse)
library(broom)
# One row per country, with that country's (x, y) data nested in a list column
df <- gapminder::gapminder %>%
  select(country, x = year, y = lifeExp) %>%
  group_by(country) %>%
  nest()
# R-squared of a fitted lm
rsq_f <- function(model) summary(model)$r.squared
# Fit four candidate models and return the one with the largest R-squared
best_model <- function(df) {
  models <- list(
    lm(formula = y ~ x, data = df),
    lm(formula = y ~ log(x), data = df),
    lm(formula = log(y) ~ x, data = df),
    lm(formula = log(y) ~ log(x), data = df)
  )
  R_squared <- map_dbl(models, rsq_f)
  best_model_num <- which.max(R_squared)
  models[[best_model_num]]
}
# fun_call ends up as a list column of formula objects
models <- df %>%
  mutate(
    model = map(data, best_model),
    rsq = map(model, broom::glance) %>% map_dbl("r.squared"),
    fun_call = map(model, formula)
  )
The output is
> models
# A tibble: 142 x 5
country data model rsq fun_call
<fct> <list> <list> <dbl> <list>
1 Afghanistan <tibble [12 x 2]> <S3: lm> 0.949 <S3: formula>
2 Albania <tibble [12 x 2]> <S3: lm> 0.912 <S3: formula>
3 Algeria <tibble [12 x 2]> <S3: lm> 0.986 <S3: formula>
4 Angola <tibble [12 x 2]> <S3: lm> 0.890 <S3: formula>
5 Argentina <tibble [12 x 2]> <S3: lm> 0.996 <S3: formula>
6 Australia <tibble [12 x 2]> <S3: lm> 0.983 <S3: formula>
7 Austria <tibble [12 x 2]> <S3: lm> 0.994 <S3: formula>
8 Bahrain <tibble [12 x 2]> <S3: lm> 0.968 <S3: formula>
9 Bangladesh <tibble [12 x 2]> <S3: lm> 0.997 <S3: formula>
10 Belgium <tibble [12 x 2]> <S3: lm> 0.995 <S3: formula>
# ... with 132 more rows
Instead of <S3: formula>, I would like to actually see the formula used by each model.
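To make this clearer, I am posting an answer with an example. If I understand correctly, you want a column that contains the formula as a string, such as "y ~ x".
Suppose we have a simple lm: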
x <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14)
y <- c(4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69)
my_lm <- lm(y ~ x)
Looking at the terms component gives you the pieces of the formula, just not in the right order:
as.character(my_lm[["terms"]])
# [1] "~" "y" "x"
You only need to swap the first two elements:
paste(as.character(my_lm$terms)[2], as.character(my_lm$terms)[1], as.character(my_lm$terms)[-c(1:2)])
# [1] "y ~ x"
This can be assigned to a column with mutate.
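For comparison, here is a minimal sketch of a shorter route to the same string: formula() extracts the formula from a fitted lm and deparse() turns it into text. The helper name formula_string is made up for illustration, and the paste(..., collapse = " ") step is only a precaution for long formulas that deparse() may split into several strings.

```r
# Same toy data as in the example above
x <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14)
y <- c(4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69)
my_lm <- lm(y ~ x)

# Hypothetical helper: the formula of a fitted model as a single string
formula_string <- function(model) {
  paste(deparse(formula(model)), collapse = " ")
}

f_str <- formula_string(my_lm)
print(f_str)
```

Because the helper returns exactly one string per model, it should also be usable element-wise on a list column of models.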
Following RLave's comment, the answer is simply to add as.character():
models <- df %>%
  mutate(
    model = map(data, best_model),
    rsq = map(model, broom::glance) %>% map_dbl("r.squared"),
    fun_call = map(model, formula) %>% as.character()
  )
This gives:
# A tibble: 142 x 5
country data model rsq fun_call
<fct> <list> <list> <dbl> <chr>
1 Afghanistan <tibble [12 x 2]> <S3: lm> 0.949 y ~ log(x)
2 Albania <tibble [12 x 2]> <S3: lm> 0.912 y ~ log(x)
3 Algeria <tibble [12 x 2]> <S3: lm> 0.986 y ~ log(x)
4 Angola <tibble [12 x 2]> <S3: lm> 0.890 y ~ log(x)
5 Argentina <tibble [12 x 2]> <S3: lm> 0.996 y ~ x
6 Australia <tibble [12 x 2]> <S3: lm> 0.983 log(y) ~ x
7 Austria <tibble [12 x 2]> <S3: lm> 0.994 log(y) ~ x
8 Bahrain <tibble [12 x 2]> <S3: lm> 0.968 y ~ log(x)
9 Bangladesh <tibble [12 x 2]> <S3: lm> 0.997 log(y) ~ x
10 Belgium <tibble [12 x 2]> <S3: lm> 0.995 log(y) ~ x
# ... with 132 more rows
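This works because as.character(), applied to a list, converts each element to a string, so a list of formulas becomes a character vector. The sketch below illustrates that on a plain list. It uses the built-in mtcars data instead of gapminder and base R instead of purrr so that it runs on its own; the variable names are illustrative only.

```r
# Fit one model per cylinder group of the built-in mtcars data
groups <- split(mtcars, mtcars$cyl)
fits <- lapply(groups, function(d) lm(log(mpg) ~ wt, data = d))

# List of formula objects, one per fitted model
formulas <- lapply(fits, formula)

# as.character() on a list turns each formula into a string
via_as_character <- as.character(formulas)

# Element-wise alternative that enforces exactly one string per model
via_deparse <- vapply(
  formulas,
  function(f) paste(deparse(f), collapse = " "),
  character(1)
)

print(via_as_character)
print(via_deparse)
```

Inside the mutate() call, a tidyverse equivalent of the element-wise version would be map_chr(model, ~ deparse(formula(.x))), which holds as long as each formula is short enough for deparse() to return a single string.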