Coding Changes in Variables in R

I am working with data that looks like this:

   ID Year Variable_of_Interest
1   a 2000                   0
2   a 2001                   0
3   a 2002                   0
4   a 2003                   0
5   a 2004                   0
6   a 2005                   1
7   a 2006                   1
8   a 2007                   1
9   a 2008                   1
10  a 2009                   1
11  b 2000                   0
12  b 2001                   0
13  b 2002                   0
14  b 2003                   1
15  b 2004                   1
16  b 2005                   1
17  b 2006                   1
18  b 2007                   1
19  b 2008                   1
20  b 2009                   1
21  c 2000                   0
22  c 2001                   0
23  c 2002                   0
24  c 2003                   0
25  c 2004                   0
26  c 2005                   0
27  c 2006                   1
28  c 2007                   1
29  c 2008                   1
30  c 2009                   1
31  d 2000                   0
32  d 2001                   0
33  d 2002                   1
34  d 2003                   1
35  d 2004                   1
36  d 2005                   1
37  d 2006                   0
38  d 2007                   0
39  d 2008                   0
40  d 2009                   0

The unit of analysis is ID. Each ID is repeated for every year in the data. The Variable_of_Interest column records a change to the ID: it is 0 in some years and 1 in others.

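I would like to create an additional column that codes the years before and after a change in Variable_of_Interest (defined as going from 0 to 1), while ignoring changes from 1 to 0 (as seen when ID equals "d").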

Any code that can help me achieve this solution would be greatly appreciated!

Preferably, I would like the data to look like this:

   ID Year Variable_of_Interest Solution
1   a 2000                   0       -5
2   a 2001                   0       -4
3   a 2002                   0       -3
4   a 2003                   0       -2
5   a 2004                   0       -1
6   a 2005                   1        0
7   a 2006                   1        1
8   a 2007                   1        2
9   a 2008                   1        3
10  a 2009                   1        4
11  b 2000                   0       -3
12  b 2001                   0       -2
13  b 2002                   0       -1
14  b 2003                   1        0
15  b 2004                   1        1
16  b 2005                   1        2
17  b 2006                   1        3
18  b 2007                   1        4
19  b 2008                   1        5
20  b 2009                   1        6
21  c 2000                   0       -6
22  c 2001                   0       -5
23  c 2002                   0       -4
24  c 2003                   0       -3
25  c 2004                   0       -2
26  c 2005                   0       -1
27  c 2006                   1        0
28  c 2007                   1        1
29  c 2008                   1        2
30  c 2009                   1        3
31  d 2000                   0       -2
32  d 2001                   0       -1
33  d 2002                   1        0
34  d 2003                   1        1
35  d 2004                   1        2
36  d 2005                   1        3
37  d 2006                   0       NA
38  d 2007                   0       NA
39  d 2008                   0       NA
40  d 2009                   0       NA

Replication code is as follows:

ID <- c(rep("a",10), rep("b", 10), rep("c", 10), rep("d", 10)); length(ID)
Year <- rep(seq(2000,2009, 1), 4)
Variable_of_Interest <- c(rep(0,5), rep(1, 5), 
                         rep(0,3), rep(1, 7), 
                         rep(0,6), rep(1, 4),
                         rep(0,2), rep(1, 4), rep(0,4))


df <- data.frame(ID, Year, Variable_of_Interest)

Thank you for your help!

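We can create a function:

library(dplyr)

get_sequence <- function(x) {
  # Position of the first 0 -> 1 change (dplyr::lag gives the previous value)
  inds <- which(x == 1 & lag(x) == 0)[1]
  # Offset of every row relative to that change
  vals <- seq_along(x) - inds
  # Position of the first 1 -> 0 change, if there is one
  inds <- which(x == 0 & lag(x) == 1)[1]
  # Everything from that reversal onward is set to NA
  if(!is.na(inds))  vals[inds:length(x)] <- NA
  return(vals)
}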

and apply it to each ID:

df %>% group_by(ID) %>% mutate(Solution = get_sequence(Variable_of_Interest)) 

#   ID Year Variable_of_Interest Solution
#1   a 2000                    0       -5
#2   a 2001                    0       -4
#3   a 2002                    0       -3
#4   a 2003                    0       -2
#5   a 2004                    0       -1
#6   a 2005                    1        0
#7   a 2006                    1        1
#8   a 2007                    1        2
#9   a 2008                    1        3
#10  a 2009                    1        4
#11  b 2000                    0       -3
#...
#...
#33  d 2002                    1        0
#34  d 2003                    1        1
#35  d 2004                    1        2
#36  d 2005                    1        3
#37  d 2006                    0       NA
#38  d 2007                    0       NA
#39  d 2008                    0       NA
#40  d 2009                    0       NA
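
If you would rather avoid the dplyr dependency, the same idea can be sketched in base R with ave(). This is only a rough equivalent, not a tested drop-in replacement: the name get_sequence_base is made up for illustration, and it assumes the rows are already sorted by Year within each ID. Like the function above, it keys on the first 0 to 1 change and the first 1 to 0 change in each group.

```r
# Rebuild the example data from the question
df <- data.frame(
  ID = rep(c("a", "b", "c", "d"), each = 10),
  Year = rep(2000:2009, times = 4),
  Variable_of_Interest = c(rep(0, 5), rep(1, 5),
                           rep(0, 3), rep(1, 7),
                           rep(0, 6), rep(1, 4),
                           rep(0, 2), rep(1, 4), rep(0, 4))
)

# Hypothetical helper: base R version of the same logic
get_sequence_base <- function(x) {
  n <- length(x)
  prev <- c(NA, x[-n])                          # previous value within the group
  first_up <- which(x == 1 & prev == 0)[1]      # first 0 -> 1 change
  vals <- seq_len(n) - first_up                 # offset relative to that change
  first_down <- which(x == 0 & prev == 1)[1]    # first 1 -> 0 change, if any
  if (!is.na(first_down)) vals[first_down:n] <- NA
  vals
}

# ave() applies the function within each ID and keeps the original row order
df$Solution <- ave(df$Variable_of_Interest, df$ID, FUN = get_sequence_base)
```

For the example data this should give the same Solution column as the dplyr version, including the NA values for ID "d" from 2006 onward.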