Coding Changes in Variables in R
I am working with data that looks like this:
ID Year Variable_of_Interest
1 a 2000 0
2 a 2001 0
3 a 2002 0
4 a 2003 0
5 a 2004 0
6 a 2005 1
7 a 2006 1
8 a 2007 1
9 a 2008 1
10 a 2009 1
11 b 2000 0
12 b 2001 0
13 b 2002 0
14 b 2003 1
15 b 2004 1
16 b 2005 1
17 b 2006 1
18 b 2007 1
19 b 2008 1
20 b 2009 1
21 c 2000 0
22 c 2001 0
23 c 2002 0
24 c 2003 0
25 c 2004 0
26 c 2005 0
27 c 2006 1
28 c 2007 1
29 c 2008 1
30 c 2009 1
31 d 2000 0
32 d 2001 0
33 d 2002 1
34 d 2003 1
35 d 2004 1
36 d 2005 1
37 d 2006 0
38 d 2007 0
39 d 2008 0
40 d 2009 0
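The unit of analysis is ID. Each ID is repeated for every year in the data. The Variable_of_Interest column records a change to the ID: it is 0 in some years and 1 in others.
I want to create an additional column that codes each year relative to the change in Variable_of_Interest (defined as going from 0 to 1), while ignoring changes from 1 to 0 (as seen when ID equals "d").
Any code that can help me arrive at this solution would be greatly appreciated!
Preferably, I would like the data to look like this: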
ID Year Variable_of_Interest Solution
1 a 2000 0 -5
2 a 2001 0 -4
3 a 2002 0 -3
4 a 2003 0 -2
5 a 2004 0 -1
6 a 2005 1 0
7 a 2006 1 1
8 a 2007 1 2
9 a 2008 1 3
10 a 2009 1 4
11 b 2000 0 -3
12 b 2001 0 -2
13 b 2002 0 -1
14 b 2003 1 0
15 b 2004 1 1
16 b 2005 1 2
17 b 2006 1 3
18 b 2007 1 4
19 b 2008 1 5
20 b 2009 1 6
21 c 2000 0 -6
22 c 2001 0 -5
23 c 2002 0 -4
24 c 2003 0 -3
25 c 2004 0 -2
26 c 2005 0 -1
27 c 2006 1 0
28 c 2007 1 1
29 c 2008 1 2
30 c 2009 1 3
31 d 2000 0 -2
32 d 2001 0 -1
33 d 2002 1 0
34 d 2003 1 1
35 d 2004 1 2
36 d 2005 1 3
37 d 2006 0 NA
38 d 2007 0 NA
39 d 2008 0 NA
40 d 2009 0 NA
Replication code is below:
ID <- c(rep("a", 10), rep("b", 10), rep("c", 10), rep("d", 10))
Year <- rep(seq(2000, 2009, 1), 4)
Variable_of_Interest <- c(rep(0, 5), rep(1, 5),
                          rep(0, 3), rep(1, 7),
                          rep(0, 6), rep(1, 4),
                          rep(0, 2), rep(1, 4), rep(0, 4))
# Store the result as df so it can be used in later code
df <- data.frame(ID, Year, Variable_of_Interest)
Thank you for your help!
We can create a function:
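library(dplyr)  # lag() below is dplyr::lag(), not stats::lag()

get_sequence <- function(x) {
  # Position of the first 0 -> 1 change
  inds <- which(x == 1 & lag(x) == 0)[1]
  # Offset of every row relative to that change
  vals <- seq_along(x) - inds
  # Position of the first 1 -> 0 change, if there is one
  inds <- which(x == 0 & lag(x) == 1)[1]
  # Everything from the 1 -> 0 change onward becomes NA
  if (!is.na(inds)) vals[inds:length(x)] <- NA
  return(vals)
}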
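To make the two `which()` calls concrete, here is a small sketch of the same steps run by hand on the values for ID "d". It uses a base R stand-in for `dplyr::lag()`, and the names `prev`, `first_up` and `first_down` are my own, chosen only for illustration.

```r
# Values of Variable_of_Interest for ID "d" (taken from the question's data)
x <- c(0, 0, 1, 1, 1, 1, 0, 0, 0, 0)

# Base R stand-in for dplyr::lag(): shift by one and pad the front with NA
prev <- c(NA, head(x, -1))

# First position where the value goes from 0 to 1
first_up <- which(x == 1 & prev == 0)[1]
# First position where the value goes from 1 back to 0 (NA if it never does)
first_down <- which(x == 0 & prev == 1)[1]

# Offsets relative to the 0 -> 1 change
vals <- seq_along(x) - first_up
# Blank out everything from the 1 -> 0 change onward
if (!is.na(first_down)) vals[first_down:length(x)] <- NA

first_up    # 3
first_down  # 7
vals        # -2 -1  0  1  2  3 NA NA NA NA
```

The leading `NA` in `prev` is harmless here because `which()` drops `NA` results, so the first row can never be reported as a change.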
Then apply it to each ID:
df %>% group_by(ID) %>% mutate(Solution = get_sequence(Variable_of_Interest))
# ID Year Variable_of_Interest Solution
#1 a 2000 0 -5
#2 a 2001 0 -4
#3 a 2002 0 -3
#4 a 2003 0 -2
#5 a 2004 0 -1
#6 a 2005 1 0
#7 a 2006 1 1
#8 a 2007 1 2
#9 a 2008 1 3
#10 a 2009 1 4
#11 b 2000 0 -3
#...
#...
#33 d 2002 1 0
#34 d 2003 1 1
#35 d 2004 1 2
#36 d 2005 1 3
#37 d 2006 0 NA
#38 d 2007 0 NA
#39 d 2008 0 NA
#40 d 2009 0 NA
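If you would rather not depend on dplyr, the same idea can probably be expressed in base R with `ave()`. This is only a sketch of an alternative, not the approach shown above: `get_sequence_base` is a hypothetical helper name, and it assumes, as the dplyr version does, that the rows are already sorted by Year within each ID.

```r
# Rebuild the example data from the question
df <- data.frame(
  ID = rep(c("a", "b", "c", "d"), each = 10),
  Year = rep(2000:2009, times = 4),
  Variable_of_Interest = c(rep(0, 5), rep(1, 5),
                           rep(0, 3), rep(1, 7),
                           rep(0, 6), rep(1, 4),
                           rep(0, 2), rep(1, 4), rep(0, 4))
)

# Hypothetical base R version of get_sequence()
get_sequence_base <- function(x) {
  prev <- c(NA, head(x, -1))            # previous value, NA for the first row
  up <- which(x == 1 & prev == 0)[1]    # first 0 -> 1 change
  down <- which(x == 0 & prev == 1)[1]  # first 1 -> 0 change, NA if none
  vals <- seq_along(x) - up             # offset from the 0 -> 1 change
  if (!is.na(down)) vals[down:length(x)] <- NA
  vals
}

# ave() applies the function within each ID and keeps the original row order
df$Solution <- ave(df$Variable_of_Interest, df$ID, FUN = get_sequence_base)

df[df$ID == "d", ]
```

An ID that never goes from 0 to 1 would get `NA` in every row, since there is no change to measure from.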