Expand rows with data.table across years

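I'm looking for a data.table solution that expands my dataset to include the missing years and sets the value for those years to zero. In the following example: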

> df <- data.table(firm = rep(c("A","B"),each=4), 
                   year = rep(c(2005,2007,2008,2011),2), var="var")
> df
   firm year var
1:    A 2005 var
2:    A 2007 var
3:    A 2008 var
4:    A 2011 var
5:    B 2005 var
6:    B 2007 var
7:    B 2008 var
8:    B 2011 var

The desired output (solution 1) is:

> df
    firm year var
 1:    A 2005 var
 2:    A 2006   0
 3:    A 2007 var
 4:    A 2008 var
 5:    A 2009   0
 6:    A 2010   0
 7:    A 2011 var
 8:    B 2005 var
 9:    B 2006   0
10:    B 2007 var
11:    B 2008 var
12:    B 2009   0
13:    B 2010   0
14:    B 2011 var

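In addition, is there a way to add years before or after those in my initial dataset, again assigning zero to the other variables? For example, the years in my initial dataset range from 2005 to 2011, and I would like to extend that to 2003-2012, with the following output (solution 2):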

> df
    firm year var
 1:    A 2003   0
 2:    A 2004   0
 3:    A 2005 var
 4:    A 2006   0
 5:    A 2007 var
 6:    A 2008 var
 7:    A 2009   0
 8:    A 2010   0
 9:    A 2011 var
10:    A 2012   0
11:    B 2003   0
12:    B 2004   0
13:    B 2005 var
14:    B 2006   0
15:    B 2007 var
16:    B 2008 var
17:    B 2009   0
18:    B 2010   0
19:    B 2011 var
20:    B 2012   0

You can use expand.grid to generate all possible combinations of firm and year, then join the original data onto them:

library(data.table)

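# every (year, firm) pair in the target range
all <- data.table(expand.grid(year = 2003:2012, firm = unique(df$firm)))

# join df onto the full grid; pairs with no match get var = NA, which becomes "0"
df[all, .(firm, year, var = fifelse(is.na(var), "0", var)), on = .(year = year, firm = firm)]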

    firm year var
 1:    A 2003   0
 2:    A 2004   0
 3:    A 2005 var
 4:    A 2006   0
 5:    A 2007 var
 6:    A 2008 var
 7:    A 2009   0
 8:    A 2010   0
 9:    A 2011 var
10:    A 2012   0
11:    B 2003   0
12:    B 2004   0
13:    B 2005 var
14:    B 2006   0
15:    B 2007 var
16:    B 2008 var
17:    B 2009   0
18:    B 2010   0
19:    B 2011 var
20:    B 2012   0
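
As a variation on the same idea, data.table's own CJ() (cross join) can build the grid instead of expand.grid. The sketch below is only one possible way to write it; the names yrs, res1 and res2 are made up for illustration. It covers solution 1 by taking the year range from the data itself, and solution 2 by hard-coding 2003:2012.

```r
library(data.table)

df <- data.table(firm = rep(c("A", "B"), each = 4),
                 year = rep(c(2005, 2007, 2008, 2011), 2),
                 var  = "var")

# Solution 1: fill only the gaps inside the observed range (2005-2011)
yrs  <- seq(min(df$year), max(df$year))   # hypothetical helper name
res1 <- df[CJ(firm = unique(df$firm), year = yrs), on = .(firm, year)]
res1[is.na(var), var := "0"]              # unmatched rows come back as NA

# Solution 2: extend the range beyond the observed years
res2 <- df[CJ(firm = unique(df$firm), year = 2003:2012), on = .(firm, year)]
res2[is.na(var), var := "0"]
```

CJ() returns its result sorted by firm and then year, so the rows should come out in the order shown in the question. The NA replacement uses := and therefore modifies res1 and res2 in place.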

We can use crossing from tidyr:

library(dplyr)
library(tidyr)
crossing(year = 2003:2012, firm = unique(df$firm)) %>%
     left_join(df, by = c('year', 'firm')) %>%
     mutate(var = ifelse(is.na(var), "0", var))

Another option is group_by/complete:

df %>% 
  group_by(firm) %>%
  # var is a character column, so the fill value is the string "0"
  complete(year = 2003:2012, fill = list(var = "0")) %>%
  ungroup()

If the tidyverse is an option, this can be done within complete, using nesting and fill as arguments:

df %>% complete(year = 2003:2012, nesting(firm), fill = list(var = "0"))

# A tibble: 20 x 3
    year firm  var  
   <dbl> <chr> <chr>
 1  2003 A     0    
 2  2003 B     0    
 3  2004 A     0    
 4  2004 B     0    
 5  2005 A     var  
 6  2005 B     var  
 7  2006 A     0    
 8  2006 B     0    
 9  2007 A     var  
10  2007 B     var  
11  2008 A     var  
12  2008 B     var  
13  2009 A     0    
14  2009 B     0    
15  2010 A     0    
16  2010 B     0    
17  2011 A     var  
18  2011 B     var  
19  2012 A     0    
20  2012 B     0
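
The tidyverse answers above all hard-code 2003:2012, which matches solution 2. For solution 1 (filling only the gaps inside the observed range), one possibility is tidyr's full_seq(), which builds the complete sequence between the minimum and maximum of a column. This is a minimal sketch; df_tbl and res are illustrative names, and the data is rebuilt as a tibble so the block runs on its own.

```r
library(dplyr)
library(tidyr)

df_tbl <- tibble(firm = rep(c("A", "B"), each = 4),
                 year = rep(c(2005, 2007, 2008, 2011), 2),
                 var  = "var")

# full_seq(year, period = 1) expands to every year between min(year) and max(year)
res <- df_tbl %>%
  complete(firm, year = full_seq(year, period = 1), fill = list(var = "0"))
```

Because the range comes from the data, the same call keeps working if later years are added to the input.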