Expand rows with data.table across years
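I am looking for a data.table solution that expands my dataset to include the missing years and assigns zero to the value for each missing year.
In the following example: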
> df <- data.table(firm = rep(c("A","B"),each=4),
year = rep(c(2005,2007,2008,2011),2), var="var")
> df
firm year var
1: A 2005 var
2: A 2007 var
3: A 2008 var
4: A 2011 var
5: B 2005 var
6: B 2007 var
7: B 2008 var
8: B 2011 var
The desired output (solution 1) is:
> df
firm year var
1: A 2005 var
2: A 2006 0
3: A 2007 var
4: A 2008 var
5: A 2009 0
6: A 2010 0
7: A 2011 var
8: B 2005 var
9: B 2006 0
10: B 2007 var
11: B 2008 var
12: B 2009 0
13: B 2010 0
14: B 2011 var
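In addition, is there a way to add earlier or later years that are not in my initial dataset, again assigning zero to the other variables? For example, the years in my initial dataset range from 2005 to 2011, and I would like to extend that range to 2003-2012, with the following output (solution 2):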
> df
firm year var
1: A 2003 0
2: A 2004 0
3: A 2005 var
4: A 2006 0
5: A 2007 var
6: A 2008 var
7: A 2009 0
8: A 2010 0
9: A 2011 var
10: A 2012 0
11: B 2003 0
12: B 2004 0
13: B 2005 var
14: B 2006 0
15: B 2007 var
16: B 2008 var
17: B 2009 0
18: B 2010 0
19: B 2011 var
20: B 2012 0
You can use expand.grid to generate all possible combinations:
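library(data.table)
# Every firm/year combination over the target range
all <- data.table(expand.grid(year = 2003:2012, firm = unique(df$firm)))
# Join df onto the grid; combinations with no match in df get var = "0"
df[all, .(firm, year, var = fifelse(is.na(var), "0", var)), on = .(year, firm)]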
firm year var
1: A 2003 0
2: A 2004 0
3: A 2005 var
4: A 2006 0
5: A 2007 var
6: A 2008 var
7: A 2009 0
8: A 2010 0
9: A 2011 var
10: A 2012 0
11: B 2003 0
12: B 2004 0
13: B 2005 var
14: B 2006 0
15: B 2007 var
16: B 2008 var
17: B 2009 0
18: B 2010 0
19: B 2011 var
20: B 2012 0
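As a possible alternative that stays within data.table, the grid can be built with CJ() instead of expand.grid. The sketch below is not from the original answer: the helper name expand_years is hypothetical, and it assumes the columns are named firm, year, and var as in the question. Passing the observed range covers solution 1, and passing 2003:2012 covers solution 2.

```r
library(data.table)

# Same example data as in the question
df <- data.table(firm = rep(c("A", "B"), each = 4),
                 year = rep(c(2005, 2007, 2008, 2011), 2),
                 var  = "var")

# Hypothetical helper (an assumption, not part of the original posts):
# build the firm/year grid with CJ(), join df onto it, and set var to "0"
# for every grid row that has no match in df.
expand_years <- function(dt, years) {
  grid <- CJ(firm = unique(dt$firm), year = as.numeric(years))
  out <- dt[grid, on = .(firm, year)]
  out[is.na(var), var := "0"]
  out[]
}

sol1 <- expand_years(df, min(df$year):max(df$year))  # interior gaps only: 2005-2011
sol2 <- expand_years(df, 2003:2012)                  # padded range: 2003-2012
```

CJ() returns a data.table directly and keeps firm as character, so there is no factor conversion to think about, unlike expand.grid with its default stringsAsFactors = TRUE.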
We can use crossing
library(dplyr)
library(tidyr)
crossing(year = 2003:2012, firm = unique(df$firm)) %>%
left_join(df, by = c('year', 'firm')) %>%
mutate(var = ifelse(is.na(var), "0", var))
Another option is group_by/complete
df %>%
  group_by(firm) %>%
  complete(year = 2003:2012, fill = list(var = "0")) %>%
  ungroup()
If the tidyverse is an option, this can be done within complete, using nesting and fill as arguments:
df %>% complete(year = 2003:2012, nesting(firm), fill = list(var = "0"))
# A tibble: 20 x 3
year firm var
<dbl> <chr> <chr>
1 2003 A 0
2 2003 B 0
3 2004 A 0
4 2004 B 0
5 2005 A var
6 2005 B var
7 2006 A 0
8 2006 B 0
9 2007 A var
10 2007 B var
11 2008 A var
12 2008 B var
13 2009 A 0
14 2009 B 0
15 2010 A 0
16 2010 B 0
17 2011 A var
18 2011 B var
19 2012 A 0
20 2012 B 0
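The output above corresponds to solution 2. For solution 1, where only the gaps inside the observed range are filled, one option might be tidyr's full_seq(), which spans from the minimum to the maximum of year in steps of 1. This is a minimal sketch under the assumption that the data look like the question's example; it is not taken from the original answers.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question, built as a tibble
df <- tibble(firm = rep(c("A", "B"), each = 4),
             year = rep(c(2005, 2007, 2008, 2011), 2),
             var  = "var")

# full_seq(year, 1) yields 2005, 2006, ..., 2011, so no years are added
# before 2005 or after 2011; missing combinations get var = "0"
sol1 <- df %>%
  complete(firm, year = full_seq(year, 1), fill = list(var = "0"))
```

The fill value is the string "0" because var is a character column.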