Is there an R function that can lag a variable by one year within each country when some years are missing?

I searched the forum but couldn't find an exact answer to my question. I have a dataset from the World Bank:

library(wbstats)

# Download the Gini index (SI.POV.GINI) for 2005-2020
Gini <- wb(indicator = c("SI.POV.GINI"),
           startdate = 2005, enddate = 2020)
Gini <- Gini[, c("iso3c", "date", "value")]
names(Gini) <- c("iso3c", "date", "Gini")

# Change date to numeric
Gini$date <- as.numeric(Gini$date)

#Tibble:
# A tibble: 1,012 x 3
   iso3c  date  Gini
   <chr> <dbl> <dbl>
 1 ALB    2017  33.2
 2 ALB    2016  33.7
 3 ALB    2015  32.9
 4 ALB    2014  34.6
 5 ALB    2012  29  
 6 ALB    2008  30  
 7 ALB    2005  30.6
 8 DZA    2011  27.6
 9 AGO    2018  51.3
10 AGO    2008  42.7
# … with 1,002 more rows

Then I tried to lag this estimate by one year:

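library(plyr)    # for ddply()
library(tibble)  # for tibble()

# Lag Gini: shift the values down one row within each country
# (this uses row position only, not the actual year)
lg <- function(x) c(NA, x[1:(length(x) - 1)])

Lagged.Gini <- ddply(Gini, ~ iso3c, transform, Gini.lag.1 = lg(Gini))

tibble(Lagged.Gini)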

# A tibble: 1,032 x 4
   iso3c  date  Gini Gini.lag.1
   <chr> <dbl> <dbl>      <dbl>
 1 AGO    2018  51.3       NA  
 2 AGO    2008  42.7       51.3
 3 ALB    2017  33.2       NA  
 4 ALB    2016  33.7       33.2
 5 ALB    2015  32.9       33.7
 6 ALB    2014  34.6       32.9
 7 ALB    2012  29         34.6
 8 ALB    2008  30         29  
 9 ALB    2005  30.6       30  
10 ARE    2014  32.5       NA  

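Unfortunately, when years are missing, the lag does not recognize the gap and simply uses the nearest available year. For example, for country "ALB", the 2012 Gini estimate is not lagged by one year (to 2011); instead it is lagged to the next available year, 2008.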
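To make the problem concrete, here is a minimal base R sketch using the ALB values from the tibble above. It compares the positional shift done by `lg()` with a lookup that matches on the calendar year. It follows the convention of the desired output below, where `Gini.lag.1` in year t holds the value from year t + 1. The object names `alb`, `positional` and `calendar` are just for illustration.

```r
# ALB rows from the tibble above, newest year first as in the question
alb <- data.frame(
  date = c(2017, 2016, 2015, 2014, 2012, 2008, 2005),
  Gini = c(33.2, 33.7, 32.9, 34.6, 29, 30, 30.6)
)

# Positional shift, like lg(): takes the previous row,
# whatever year that row happens to be
positional <- c(NA, head(alb$Gini, -1))

# Calendar-aware lookup: take the value recorded exactly one year later,
# or NA if that year is missing from the data
calendar <- alb$Gini[match(alb$date + 1, alb$date)]

cbind(alb, positional, calendar)
```

For 2012 the positional version picks up 34.6 (the 2014 value) while the calendar lookup correctly gives NA, because 2013 is not in the data.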

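I would like the final data to look the same, but with the edits I made below. Ideally, the method would also work for lags of several years: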

# A tibble: 1,032 x 4
   iso3c  date  Gini Gini.lag.1
   <chr> <dbl> <dbl>      <dbl>
 1 AGO    2018  51.3       NA  
   AGO   2017   NA        51.3
 2 AGO    2008  42.7       NA
   AGO    2007  NA        42.7
 3 ALB    2017  33.2       NA  
 4 ALB    2016  33.7       33.2
 5 ALB    2015  32.9       33.7
 6 ALB    2014  34.6       32.9
   ALB    2013   NA        34.6
 7 ALB    2012  29         NA
 8 ALB    2008  30         NA  
 9 ALB    2005  30.6       NA  
10 ARE    2014  32.5       NA  

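You can create a copy of the original table with every date shifted back by one year. Then merge the two on the `iso3c` and `date` columns to get the final result you want.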

For example:

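# Copy of the data with each year moved back by one
Gini_lagged <- data.frame(
  iso3c = Gini$iso3c,
  date = Gini$date - 1,
  Gini.lag.1 = Gini$Gini)
# Full outer join on country and year; unmatched rows get NA
merge(Gini, Gini_lagged, by = c("iso3c", "date"), all = TRUE)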
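If you want to try this without downloading from the World Bank API, here is a self-contained version on a small subset of the data (the values are copied from the tibble in the question). The final `order()` step only sorts the result by country and descending year, so it matches the layout in the question; `out` is an illustrative name.

```r
# Small subset of the World Bank data (values copied from the question)
Gini <- data.frame(
  iso3c = c("ALB", "ALB", "ALB", "ALB", "ALB", "ALB", "ALB", "DZA", "AGO", "AGO"),
  date  = c(2017, 2016, 2015, 2014, 2012, 2008, 2005, 2011, 2018, 2008),
  Gini  = c(33.2, 33.7, 32.9, 34.6, 29, 30, 30.6, 27.6, 51.3, 42.7)
)

# Copy of the data with every year moved back by one
Gini_lagged <- data.frame(
  iso3c = Gini$iso3c,
  date = Gini$date - 1,
  Gini.lag.1 = Gini$Gini
)

# Full outer join: rows present in only one table are kept and padded with NA
out <- merge(Gini, Gini_lagged, by = c("iso3c", "date"), all = TRUE)

# Sort by country, newest year first
out <- out[order(out$iso3c, -out$date), ]
rownames(out) <- NULL
out
```

The 10 original rows and 10 shifted rows overlap on three ALB years (2014 to 2016), so the result has 17 rows.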

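Using dplyr and tidyr from the tidyverse, you can mutate row by row, looking up the value for the same country whose year equals the current row's year minus 1. Note that this is a conventional backward lag (the value from `date - 1`), and it does not add rows for the missing years.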

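library(tidyverse)

# Lookup copy: inside mutate(), `Gini` refers to the column, not the data frame
ref <- Gini

Gini %>%
  rowwise() %>%
  # same country, previous year; an empty match becomes NA after unnest()
  mutate(Gini.lag.1 = list(ref$Gini[ref$iso3c == iso3c & ref$date == date - 1])) %>%
  ungroup() %>%
  unnest(c(Gini.lag.1), keep_empty = TRUE)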
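The row-by-row lookup can also be done in a single vectorized step with `match()` on a combined country-year key. Here is a minimal base R sketch on the sample data from the question; `key` and `next_key` are illustrative helper names. It takes the value from the following year (t + 1), which is the convention in the desired output; use `Gini$date - 1` instead for a conventional backward lag. Like the rowwise version, it does not add rows for missing years.

```r
# Sample data (values copied from the question)
Gini <- data.frame(
  iso3c = c("ALB", "ALB", "ALB", "ALB", "ALB", "ALB", "ALB", "DZA", "AGO", "AGO"),
  date  = c(2017, 2016, 2015, 2014, 2012, 2008, 2005, 2011, 2018, 2008),
  Gini  = c(33.2, 33.7, 32.9, 34.6, 29, 30, 30.6, 27.6, 51.3, 42.7)
)

# Country-year key of every row, and the key of the year we want to look up
key      <- paste(Gini$iso3c, Gini$date)
next_key <- paste(Gini$iso3c, Gini$date + 1)

# match() returns NA when that country-year is absent,
# so a gap gives NA instead of a value from a different year
Gini$Gini.lag.1 <- Gini$Gini[match(next_key, key)]
Gini
```

Because the key includes the country code, values can never leak from one country into another.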

pseudospin's answer is a great base R solution. Since you are using tibbles, here is a tidyverse version with the same effect:

Gini <- readr::read_table("
iso3c  date  Gini
ALB    2017  33.2
ALB    2016  33.7
ALB    2015  32.9
ALB    2014  34.6
ALB    2012  29  
ALB    2008  30  
ALB    2005  30.6
DZA    2011  27.6
AGO    2018  51.3
AGO    2008  42.7")

library(dplyr)
Gini %>%
  transmute(iso3c, date = date - 1, Gini.lag.1 = Gini) %>%
  full_join(Gini, ., by = c("iso3c", "date")) %>%
  arrange(iso3c, desc(date))
# # A tibble: 17 x 4
#    iso3c  date  Gini Gini.lag.1
#    <chr> <dbl> <dbl>      <dbl>
#  1 AGO    2018  51.3       NA  
#  2 AGO    2017  NA         51.3
#  3 AGO    2008  42.7       NA  
#  4 AGO    2007  NA         42.7
#  5 ALB    2017  33.2       NA  
#  6 ALB    2016  33.7       33.2
#  7 ALB    2015  32.9       33.7
#  8 ALB    2014  34.6       32.9
#  9 ALB    2013  NA         34.6
# 10 ALB    2012  29         NA  
# 11 ALB    2011  NA         29  
# 12 ALB    2008  30         NA  
# 13 ALB    2007  NA         30  
# 14 ALB    2005  30.6       NA  
# 15 ALB    2004  NA         30.6
# 16 DZA    2011  27.6       NA  
# 17 DZA    2010  NA         27.6

If you need to do this n times (one shift per lag), you can extend it programmatically:

Ginilags <- lapply(1:3, function(lg) {
  z <- transmute(Gini, iso3c, date = date - lg, Gini)
  names(z)[3] <- paste0("Gini.lag.", lg)
  z
})
Reduce(function(a,b) full_join(a, b, by = c("iso3c", "date")),
       c(list(Gini), Ginilags)) %>%
  arrange(iso3c, desc(date))
# # A tibble: 28 x 6
#    iso3c  date  Gini Gini.lag.1 Gini.lag.2 Gini.lag.3
#    <chr> <dbl> <dbl>      <dbl>      <dbl>      <dbl>
#  1 AGO    2018  51.3       NA         NA         NA  
#  2 AGO    2017  NA         51.3       NA         NA  
#  3 AGO    2016  NA         NA         51.3       NA  
#  4 AGO    2015  NA         NA         NA         51.3
#  5 AGO    2008  42.7       NA         NA         NA  
#  6 AGO    2007  NA         42.7       NA         NA  
#  7 AGO    2006  NA         NA         42.7       NA  
#  8 AGO    2005  NA         NA         NA         42.7
#  9 ALB    2017  33.2       NA         NA         NA  
# 10 ALB    2016  33.7       33.2       NA         NA  
# # ... with 18 more rows
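If you prefer to avoid the dplyr dependency, the same multi-lag idea can be wrapped in a small base R function. `add_year_lags()` below is a hypothetical helper, not part of any package; it assumes the country and year columns are called `iso3c` and `date` unless told otherwise, and it keeps the convention used above (lag k in year t holds the value from year t + k).

```r
# Hypothetical helper: adds <var>.lag.1 ... <var>.lag.k columns,
# where lag k in year t holds the value observed in year t + k
add_year_lags <- function(df, var, lags, id = "iso3c", time = "date") {
  base <- df[, c(id, time, var)]
  shifted <- lapply(lags, function(k) {
    z <- base
    z[[time]] <- z[[time]] - k            # move each observation back k years
    names(z)[3] <- paste0(var, ".lag.", k)
    z
  })
  # Successive full outer joins on country and year
  out <- Reduce(function(a, b) merge(a, b, by = c(id, time), all = TRUE),
                c(list(base), shifted))
  out <- out[order(out[[id]], -out[[time]]), ]  # country, newest year first
  rownames(out) <- NULL
  out
}

# Sample data (values copied from the question)
Gini <- data.frame(
  iso3c = c("ALB", "ALB", "ALB", "ALB", "ALB", "ALB", "ALB", "DZA", "AGO", "AGO"),
  date  = c(2017, 2016, 2015, 2014, 2012, 2008, 2005, 2011, 2018, 2008),
  Gini  = c(33.2, 33.7, 32.9, 34.6, 29, 30, 30.6, 27.6, 51.3, 42.7)
)

res <- add_year_lags(Gini, "Gini", 1:3)
head(res, 10)
```

On this sample it should give the same 28 rows and six columns as the `Reduce()`/`full_join()` version.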