Computing the change in a variable between t and t-1 in panel data (R)
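I have a balanced panel in which each ID (cnpjcei) is recorded every year, together with the total number of employees of the given firm. My goal is to compute, for every year (t) in the database, the difference between the employees in t and the employees in t-1 (just to be clear, empreg(t) - empreg(t-1)).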
# A tibble: 386,763 x 3
ano cnpjcei empreg
<dbl> <chr> <dbl>
1 2006 1000786001505 10
2 2007 1000786001505 12
3 2008 1000786001505 16
4 2009 1000786001505 19
5 2010 1000786001505 7
6 2011 1000786001505 7
7 2012 1000786001505 7
8 2013 1000786001505 7
9 2014 1000786001505 8
10 2015 1000786001505 9
# ... with 386,753 more rows
The result should look like this:
# A tibble: 386,763 x 4
ano cnpjcei empreg variation_empreg
<dbl> <chr> <dbl> <dbl>
1 2006 1000786001505 10 NA
2 2007 1000786001505 12 2
3 2008 1000786001505 16 4
4 2009 1000786001505 19 3
5 2010 1000786001505 7 -12
6 2011 1000786001505 7 0
7 2012 1000786001505 7 0
8 2013 1000786001505 7 0
9 2014 1000786001505 8 1
10 2015 1000786001505 9 1
# ... with 386,753 more rows
Does anyone have any ideas? Thanks :)
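You can use diff. It returns the differences between consecutive elements, so its result is one element shorter than the input; prepending NA fills the first row, which has no previous year: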
library(dplyr)
df %>% mutate(variation_empreg = c(NA, diff(empreg)))
#> ano cnpjcei empreg variation_empreg
#> 1 2006 1000786001505 10 NA
#> 2 2007 1000786001505 12 2
#> 3 2008 1000786001505 16 4
#> 4 2009 1000786001505 19 3
#> 5 2010 1000786001505 7 -12
#> 6 2011 1000786001505 7 0
#> 7 2012 1000786001505 7 0
#> 8 2013 1000786001505 7 0
#> 9 2014 1000786001505 8 1
#> 10 2015 1000786001505 9 1
Data
df <- structure(list(ano = 2006:2015, cnpjcei = c("1000786001505",
"1000786001505", "1000786001505", "1000786001505", "1000786001505",
"1000786001505", "1000786001505", "1000786001505", "1000786001505",
"1000786001505"), empreg = c(10L, 12L, 16L, 19L, 7L, 7L, 7L,
7L, 8L, 9L)), row.names = c("1", "2", "3", "4", "5", "6", "7",
"8", "9", "10"), class = "data.frame")
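The sample above contains a single firm, so a plain diff is enough there. In the full panel with many cnpjcei values, the difference would presumably need to restart at each firm, otherwise the first year of one firm is compared with the last year of the previous one. A minimal sketch of that, assuming the rows should be ordered by ano within each firm; the second firm ID and its employee counts are made up for illustration only:

```r
library(dplyr)

# Toy panel: the first firm ID comes from the question; the second ID and
# all values for it are hypothetical, added only to show the grouping.
panel <- data.frame(
  ano     = rep(2006:2008, times = 2),
  cnpjcei = rep(c("1000786001505", "9999999999999"), each = 3),
  empreg  = c(10, 12, 16, 50, 45, 47)
)

result <- panel %>%
  arrange(cnpjcei, ano) %>%   # put years in order within each firm
  group_by(cnpjcei) %>%       # compute the lag separately for each firm
  mutate(variation_empreg = empreg - dplyr::lag(empreg)) %>%
  ungroup()

result$variation_empreg
#> [1] NA  2  4 NA -5  2
```

Here empreg - dplyr::lag(empreg) is equivalent to c(NA, diff(empreg)) within each group; the first year of every firm gets NA because it has no t-1 observation.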