How can I apply a loess function and get predictions by group using dplyr in R?

I have this sample dataset:

data.1 <-read.csv(text = "
country,year,response
Austria,2010,34378
Austria,2011,38123
Austria,2012,37126
Austria,2013,42027
Austria,2014,43832
Austria,2015,56895
Austria,2016,49791
Austria,2017,64467
Austria,2018,67620
Austria,2019,69210
Croatia,2010,56456
Croatia,2011,58896
Croatia,2012,54109
Croatia,2013,47156
Croatia,2014,47104
Croatia,2015,88867
Croatia,2016,78614
Croatia,2017,85133
Croatia,2018,77090
Croatia,2019,78330
France,2010,50939
France,2011,41571
France,2012,37367
France,2013,42999
France,2014,75789
France,2015,122529
France,2016,136518
France,2017,141829
France,2018,153850
France,2019,163800
")

I want to fit a loess function by country and get the predicted value for each year in the data frame I provided. The loess smoothing looks like this:

ggplot(data.1, aes(x=year, y=response, color=country)) +
  geom_point(size = 3, alpha=0.3) + 
  #geom_line(aes(x=year, y=area_harvested_ha/1000), size=0.5, alpha= 1) +
  geom_smooth(method = 'loess', span=0.75, na.rm = T, se=F, size = 2)

The plot:

This is the code I used to try to get the predictions:

data.1.with.pred <- data.1 %>% 
  group_by(country) %>% 
  arrange(country, year) %>% 
  mutate(pred.response = stats::predict(stats::loess(response ~ year, span = .75, data=.),
                         data.frame(year = seq(min(year), max(year), 1))))

I get predictions in the data frame, but the grouping by country does not work.

The plot looks like this:

ggplot(data.1.with.pred, aes(x=year, y=pred.response, color=country)) +
  geom_point(aes(x=year, y=response), size = 3, alpha=0.3) + 
  #geom_line(aes(x=year, y=area_harvested_ha/1000), size=0.5, alpha= 1) +
  geom_smooth(method = 'loess', span=0.75, na.rm = T, se=F, size = 2)

The problem I have is that the grouping by country fails. I got this answer from here:

Any advice is much appreciated.

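If you want loess predictions for each country, you might want to use a nest()ed data frame. This lets you set up a column that holds a separate data frame with each country's data, run loess() and predict() on those individual data frames, and then unnest() to bring the results back into a standard format.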

Here is some code that nests the data, runs the analysis for each country, and then pulls the results back into a regular data frame:

library(tidyverse)

data.1.with.pred <- data.1 %>% 
  group_by(country) %>% 
  arrange(country, year) %>% 
  nest() %>%
  mutate(pred.response = purrr::map(data, function(x)stats::loess(response~year, span= 0.75, data = x) %>%
                             stats::predict(data.frame(year = seq(min(x$year), max(x$year), 1))))) %>%
  unnest(cols = c(data, pred.response))

data.1.with.pred %>%
  ggplot() +
  geom_point(aes(x = year, y = response, colour = country)) +
  geom_line(aes(x = year,y=pred.response, colour = country))

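The resulting data frame contains the loess predictions for each year for each country, rather than for all countries combined, and the plot looks like this: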
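If predictions are needed at points other than the observed years, the same nested pattern can return a separate prediction grid per country. This is a minimal sketch, not part of the answer above: the half-year grid and the names grid.pred and pred are assumptions chosen for illustration, and the data is rebuilt compactly so the block runs on its own.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Same values as the dataset in the question, rebuilt compactly
data.1 <- data.frame(
  country  = rep(c("Austria", "Croatia", "France"), each = 10),
  year     = rep(2010:2019, times = 3),
  response = c(34378, 38123, 37126, 42027, 43832, 56895, 49791, 64467, 67620, 69210,
               56456, 58896, 54109, 47156, 47104, 88867, 78614, 85133, 77090, 78330,
               50939, 41571, 37367, 42999, 75789, 122529, 136518, 141829, 153850, 163800)
)

grid.pred <- data.1 %>%
  nest(data = c(year, response)) %>%           # one row per country
  mutate(pred = map(data, function(d) {
    fit  <- loess(response ~ year, span = 0.75, data = d)
    # Hypothetical half-year grid inside each country's observed range
    grid <- data.frame(year = seq(min(d$year), max(d$year), by = 0.5))
    grid$pred.response <- predict(fit, newdata = grid)
    grid
  })) %>%
  select(country, pred) %>%
  unnest(cols = pred)
```

Because the grid has a different number of rows than the original data, the predictions are kept in their own data frame instead of being unnested next to the data column.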

Is this what you are trying to do?

Use the loess function to build a model on a subset of your data, like this:

# Use a loess model on a subset of the data (France)
model <- loess(formula = response ~ year, data = subset(data.1, country == "France"))

# Plot
ggplot() +
  geom_point(data = data.1,
             mapping = aes(x = year, y = response, color = country), size = 3, alpha = 0.3) +
  geom_line(aes(model$x, model$fitted)) +
  geom_smooth(method = 'loess', span = 0.75, na.rm = T, se = F, size = 2)

The fitted values are in model$fitted.
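For values at years other than the observed ones, predict() can be called on the same model. A small sketch, assuming the France subset as above; the names new.years, mid.pred and outside.pred, and the chosen years, are only for illustration.

```r
# Same values as the dataset in the question, rebuilt compactly
data.1 <- data.frame(
  country  = rep(c("Austria", "Croatia", "France"), each = 10),
  year     = rep(2010:2019, times = 3),
  response = c(34378, 38123, 37126, 42027, 43832, 56895, 49791, 64467, 67620, 69210,
               56456, 58896, 54109, 47156, 47104, 88867, 78614, 85133, 77090, 78330,
               50939, 41571, 37367, 42999, 75789, 122529, 136518, 141829, 153850, 163800)
)

model <- loess(formula = response ~ year, data = subset(data.1, country == "France"))

# With no newdata, predict() returns the same values as model$fitted
same.as.fitted <- all.equal(as.numeric(predict(model)), as.numeric(model$fitted))

# Predictions at points inside the observed range of years
new.years <- data.frame(year = c(2012.5, 2016.5))
mid.pred  <- predict(model, newdata = new.years)

# With the default loess settings, predict() does not extrapolate:
# a year well outside 2010-2019 comes back as NA
outside.pred <- predict(model, newdata = data.frame(year = 2025))
```

To get values beyond the observed range, the model would have to be fitted with a different surface setting in loess.control(), which is outside what this answer covers.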

The problem here is that group_by does not work well with the mutate/predict combination.
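One reading of why the original attempt fails: inside mutate(), `.` is the magrittr placeholder for the whole data frame coming through the pipe, so loess() is fitted on all countries at once, while min(year) and max(year) are evaluated per group. Below is a minimal sketch of a grouped mutate() that avoids this, assuming it is acceptable to rebuild a small per-group data frame from the columns. The data.frame(year, response) call is an illustration, not something taken from the question.

```r
library(dplyr)

# Same values as the dataset in the question, rebuilt compactly
data.1 <- data.frame(
  country  = rep(c("Austria", "Croatia", "France"), each = 10),
  year     = rep(2010:2019, times = 3),
  response = c(34378, 38123, 37126, 42027, 43832, 56895, 49791, 64467, 67620, 69210,
               56456, 58896, 54109, 47156, 47104, 88867, 78614, 85133, 77090, 78330,
               50939, 41571, 37367, 42999, 75789, 122529, 136518, 141829, 153850, 163800)
)

data.1.with.pred <- data.1 %>%
  group_by(country) %>%
  arrange(country, year) %>%
  mutate(
    # year and response are the current group's columns here, unlike `.`
    pred.response = predict(
      loess(response ~ year, span = 0.75, data = data.frame(year, response))
    )
  ) %>%
  ungroup()
```

Each country's pred.response then matches what loess gives when fitted on that country's rows alone.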

In this solution, I split the data frame, compute the predictions for each country, then combine the pieces and plot:

#split by country
sdata <-split(data.1, data.1$country)
#calculate the predicted values for each country
data.1.with.pred <- lapply(sdata, function(df){
   df$pred.response  <-stats::predict(stats::loess(response ~ year, span = .75, data=df))
   df
})

#merge back into 1 dataframe
data.1.with.pred <-dplyr::bind_rows(data.1.with.pred )

#data.1.with.pred[order(data.1.with.pred$year),]

ggplot(data.1.with.pred, aes(x=year, y=pred.response, color=country)) +
   geom_point(aes(x=year, y=response), size = 3, alpha=0.3) + 
   #geom_line(aes(x=year, y=area_harvested_ha/1000), size=0.5, alpha= 1) +
   geom_smooth(method = 'loess', span=0.75, na.rm = T, se=F, size = 2)

Similar to Henry Holm's answer:

library(purrr)

model <- data.1 %>% 
  split(f = .$country) %>% 
  map(~stats::loess(response ~ year, span = .75, data=.x))

This creates one model per country. You can now access the fitted values via:
model$Austria$fitted
#>  [1] 35195.78 36149.17 37988.25 40221.17 47372.73 51220.11 55611.14 61368.08 66159.05 70242.01
model$Croatia$fitted
#>  [1] 59333.25 53963.12 49872.81 45156.89 57061.66 76289.39 86357.84 84047.18 81245.77 76487.97
model$France$fitted
#>  [1]  53011.15  37627.29  35403.63  45360.31  78379.48 117055.05 137189.73 146822.95 155585.16 162336.60
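To get from the list of models back to a single data frame with one prediction per row, something along these lines should work. It is a sketch only: the name fitted.df and the use of imap() with bind_rows() are my choices, and the data is rebuilt compactly so the block runs on its own.

```r
library(purrr)
library(dplyr)

# Same values as the dataset in the question, rebuilt compactly
data.1 <- data.frame(
  country  = rep(c("Austria", "Croatia", "France"), each = 10),
  year     = rep(2010:2019, times = 3),
  response = c(34378, 38123, 37126, 42027, 43832, 56895, 49791, 64467, 67620, 69210,
               56456, 58896, 54109, 47156, 47104, 88867, 78614, 85133, 77090, 78330,
               50939, 41571, 37367, 42999, 75789, 122529, 136518, 141829, 153850, 163800)
)

model <- data.1 %>%
  split(f = .$country) %>%
  map(~ stats::loess(response ~ year, span = .75, data = .x))

# imap() passes each model together with its list name (the country)
fitted.df <- imap(model, function(m, name) {
  data.frame(
    country       = name,
    year          = as.vector(m$x),       # m$x is a one-column matrix of years
    pred.response = as.vector(m$fitted)
  )
}) %>%
  bind_rows()
```

fitted.df has one row per country and year, so it can be passed straight to ggplot() or joined back onto data.1 by country and year.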