Linear interpolation of panel data in R over multiple columns

Question

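I have a dataset containing population counts by country, broken down by sex and age group. The data are recorded in 5-year increments. I would like to linearly interpolate the data to get annual values.

My data looks like this:

I figured out how to interpolate a single column. This is the code I used: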

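n <- 1    # index of the first row of the current country block
j <- c()  # accumulates the interpolated values

# each country occupies 5 consecutive rows (2000, 2005, ..., 2020)
while(n < nrow(pop)){
  a <- as.data.frame(approx(x = pop$`5year`[n:(n+4)], y = pop$`Male_0-4`[n:(n+4)], xout = 2000:2020))
  j <- as.matrix(c(j,a$y))  # append this country's yearly values
  n <- n+5                  # move on to the next country
}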

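Here, j ends up as one long vector that represents the corresponding column in the new, interpolated dataset. I would therefore like to generate one such vector for each age-sex combination (that is, for each column in the original dataset) and bind them together.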
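For reference, here is a minimal illustration of what a single approx() call returns for one series. The numbers and the names years and male_0_4 are made up for the example and are not taken from the actual data:

```r
# Made-up values for one country and one column, recorded every 5 years
years <- c(2000, 2005, 2010)
male_0_4 <- c(100, 150, 130)

# approx() returns a list with components x and y;
# y holds the linearly interpolated values at each year in xout
yearly <- approx(x = years, y = male_0_4, xout = 2000:2010)
yearly$y
#> [1] 100 110 120 130 140 150 146 142 138 134 130
```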

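However, I can't figure out how to do this for all of my columns at once. My attempts with functions, loops, and members of the apply family have not worked. Thanks for any input.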

Answer 1

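I would write a function that interpolates an arbitrary variable within country groups, then map() it over all the variables and join the results back together.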

library(tidyverse)
data("population")

# create some data to interpolate
population_5 <- population %>% 
  filter(year %% 5 == 0) %>% 
  mutate(female_pop = population / 2,
         male_pop = population / 2)

interpolate_func <- function(variable, data) {
  data %>% 
    group_by(country) %>% 
    # can't interpolate if only one year
    filter(n() >= 2) %>% 
    group_modify(~as_tibble(approx(.x$year, .x[[variable]], 
                                   xout = min(.x$year):max(.x$year)))) %>% 
    set_names("country", "year", paste0(variable, "_interpolated")) %>% 
    ungroup()
}

vars_to_interpolate <- names(select(population_5, -country, -year))

map(vars_to_interpolate, interpolate_func, 
    data = population_5) %>% 
  reduce(full_join, by = c("country", "year"))

#> # A tibble: 3,395 × 5
#>    country      year population_interpolated female_pop_interp… male_pop_interp…
#>    <chr>       <int>                   <dbl>              <dbl>            <dbl>
#>  1 Afghanistan  1995               17586073            8793036.         8793036.
#>  2 Afghanistan  1996               18187930.           9093965.         9093965.
#>  3 Afghanistan  1997               18789788.           9394894.         9394894.
#>  4 Afghanistan  1998               19391645.           9695823.         9695823.
#>  5 Afghanistan  1999               19993503.           9996751.         9996751.
#>  6 Afghanistan  2000               20595360           10297680         10297680 
#>  7 Afghanistan  2001               21448459           10724230.        10724230.
#>  8 Afghanistan  2002               22301558           11150779         11150779 
#>  9 Afghanistan  2003               23154657           11577328.        11577328.
#> 10 Afghanistan  2004               24007756           12003878         12003878 
#> # … with 3,385 more rows

Created on 2022-06-01 by the reprex package (v2.0.1)
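The same idea can also be sketched in base R against data shaped like the question's, with one row per country and 5-year step and one column per sex and age group. The pop data frame below is a made-up stand-in, and the column name country and the helper interpolate_country are assumptions for illustration, not names from the original data:

```r
# Hypothetical stand-in for the question's data: two countries, 5-year steps
pop <- data.frame(
  country      = rep(c("A", "B"), each = 3),
  `5year`      = rep(c(2000, 2005, 2010), times = 2),
  `Male_0-4`   = c(100, 150, 130, 10, 20, 40),
  `Female_0-4` = c(90, 140, 120, 12, 22, 42),
  check.names  = FALSE  # keep the non-syntactic column names as they are
)

# Every column except the identifiers gets interpolated
value_cols <- setdiff(names(pop), c("country", "5year"))

# Interpolate all value columns of one country's rows to yearly steps
interpolate_country <- function(d) {
  years <- min(d$`5year`):max(d$`5year`)
  out <- data.frame(country = d$country[1], year = years)
  for (col in value_cols) {
    out[[col]] <- approx(x = d$`5year`, y = d[[col]], xout = years)$y
  }
  out
}

# Split by country, interpolate each piece, and stack the results
pop_yearly <- do.call(rbind, lapply(split(pop, pop$country), interpolate_country))
rownames(pop_yearly) <- NULL
head(pop_yearly, 3)
```

Splitting by country first means the interpolation never runs across the boundary between two countries, which is what the fixed 5-row blocks in the question's loop were guarding against.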

