Feeding new data to existing model and using broom::augment to add predictions
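I am using tidyverse, broom and purrr to fit a model to some data by group. I am then trying to use this model to predict some new data, again by group. broom's 'augment' function nicely adds not only the predictions but also other values such as standard errors. However, I can't get the 'augment' function to use the new data instead of the old data, so my two sets of predictions are exactly the same. The question is: how do I get 'augment' to use the new data rather than the old data (the data used to fit the model)?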
Here is a reproducible example:
library(tidyverse)
library(broom)
library(purrr)
# nest the iris dataset by Species and fit a linear model
iris.nest <- nest(iris, data = c(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)) %>%
  mutate(model = map(data, function(df) lm(Sepal.Width ~ Sepal.Length, data = df)))
# create a new dataset where the Sepal.Length is 5x as big
newdata <- iris %>%
  mutate(Sepal.Length = Sepal.Length * 5) %>%
  nest(data = c(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)) %>%
  rename("newdata" = "data")
# join these two nested datasets together
iris.nest.new <- left_join(iris.nest, newdata, by = "Species")
# now form two new columns of predictions -- one using the "old" data that the model was
# initially fit on, and the second using the new data where the Sepal.Length has been increased
iris.nest.new <- iris.nest.new %>%
  mutate(preds = map(model, broom::augment),
         preds.new = map2(model, newdata, broom::augment)) # THIS LINE DOESN'T WORK ****
# unnest the predictions on the "old" data
preds <- select(iris.nest.new, preds) %>%
  unnest(cols = c(preds))
# rename the columns prior to merging
preds <- rename_with(preds, ~ paste0("old", .x), starts_with("."))
# now unnest the predictions on the "new" data
preds.new <- select(iris.nest.new, preds.new) %>%
  unnest(cols = c(preds.new))
#... and also rename columns prior to merging
preds.new <- rename_with(preds.new, ~ paste0("new", .x), starts_with("."))
# merge the two sets of predictions and compare
compare <- bind_cols(preds, preds.new)
# compare
select(compare, old.fitted, new.fitted) %>% View(.) # EXACTLY THE SAME!!!!
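When calling broom::augment, note that the newdata= argument is the third argument: for an lm model, augment dispatches to augment.lm(x, data = model.frame(x), newdata = NULL, ...), so the second slot is data=. When you use purrr::map2, the values you iterate over are passed to the first two arguments by default, and it doesn't matter what you name the list you pass in. Your new data therefore lands in data=, which augment only uses to supply the columns shown next to the fitted values from the original fit, so the predictions never change. You need to place the new data explicitly in the newdata= argument.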
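A minimal base-R sketch of this matching rule, assuming nothing beyond base R: toy_augment() is a hypothetical stand-in that copies only the argument order of augment.lm() (x, then data, then newdata), and Map() plays the role of purrr::map2(), since it likewise passes its second input positionally.

```r
# Hypothetical stand-in with the same argument order as broom's augment.lm():
# x, then data, then newdata. It only reports which argument received the frame.
toy_augment <- function(x, data = NULL, newdata = NULL) {
  if (is.null(newdata)) {
    "used data= (old fitted values)"
  } else {
    "used newdata= (new predictions)"
  }
}

models <- list("m1", "m2")
new_frames <- list(data.frame(a = 1), data.frame(a = 2))

# Map(), like purrr::map2(), passes the second list positionally,
# so each data frame lands in `data`, not `newdata`.
positional <- Map(toy_augment, models, new_frames)

# An anonymous function that names the argument routes it to `newdata`.
named <- Map(function(m, d) toy_augment(m, newdata = d), models, new_frames)

unlist(positional)
unlist(named)
```

The named form is the same idea as the ~ broom::augment(.x, newdata = .y) lambda: the anonymous function decides which formal argument each input is bound to.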
iris.nest.new <- iris.nest.new %>%
  mutate(preds = map(model, broom::augment),
         preds.new = map2(model, newdata, ~ broom::augment(.x, newdata = .y)))
You can see the difference by running these two commands:
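# Positional: the new data goes to data=, so .fitted is unchanged
broom::augment(iris.nest.new$model[[1]], iris.nest.new$newdata[[1]])
# Named: augment predicts on the new data
broom::augment(iris.nest.new$model[[1]], newdata = iris.nest.new$newdata[[1]])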
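For comparison, here is a sketch of the same per-group workflow in base R, assuming you only need the predictions and their standard errors rather than augment's full output. It swaps in split(), Map() and predict() (dispatching to predict.lm) for the nested tibble, map2() and augment; the key point is again that newdata is passed by name.

```r
# Fit one lm per species, mirroring the nested tibble in the question.
by_species <- split(iris, iris$Species)
models <- lapply(by_species, function(df) lm(Sepal.Width ~ Sepal.Length, data = df))

# New data: the same rows with Sepal.Length multiplied by 5.
new_by_species <- lapply(by_species, function(df) transform(df, Sepal.Length = Sepal.Length * 5))

# Predictions on the original data (these equal the model's fitted values).
old_pred <- Map(function(m, d) predict(m, newdata = d), models, by_species)

# Predictions on the new data, with standard errors; newdata is named explicitly.
new_pred <- Map(function(m, d) predict(m, newdata = d, se.fit = TRUE),
                models, new_by_species)

# Side-by-side view of the first few setosa predictions.
head(cbind(old = old_pred$setosa, new = new_pred$setosa$fit))
```

With se.fit = TRUE, predict() returns a list, so the predictions are in $fit and their standard errors in $se.fit.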