How to find observations whose dummy variable changes from 1 to 0 (and not vice versa) in a df in R

Question

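I have a survey of n individuals; each individual appears in the survey more than once (panel data). I have a variable pens, a dummy that takes the value 1 if the individual has invested in a complementary pension form. For example: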

df <- data.frame(year=c(2002,2002,2004,2004,2006,2008), id=c(1,2,1,2,3,3), y.b=c(1950,1943,1950,1943,1966,1966), sex=c("F", "M", "F", "M", "M", "M"), income=c(100000,55000,88000,66000,12000,24000), pens=c(0,1,1,0,1,1))

year  id  y.b   sex   income   pens   
2002  1   1950   F    100000     0     
2002  2   1943   M    55000      1    
2004  1   1950   F    88000      1    
2004  2   1943   M    66000      0    
2006  3   1966   M    12000      1    
2008  3   1966   M    24000      1

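where id is the individual, y.b is the year of birth, and pens is the dummy variable for the complementary pension.

I would like to know whether there are individuals who had invested in a complementary pension in year t but no longer hold one in year t+2 (the survey is conducted every two years). In this way I want to find out how many people had a complementary pension form but redeemed or abandoned it before retirement (for example, for economic reasons).

I tried this command: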

df$x <- (ave(df$pens, df$id, FUN = function(x)length(unique(x)))==1)*1
which(df$x=="0")

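In fact, I have some individuals whose pens variable changes over time (the command checks whether the variable stays constant over time). As a result, I find both individuals whose pens variable goes from 0 in year t (no complementary pension) to 1 in year t+2 and individuals for whom it goes the other way; but I am only interested in individuals whose pens variable is 1 in year t (they have a complementary pension) and 0 in year t+2.

If I use this command on df, I get x equal to 0 for both id 1 and id 2 (their pens variable is not constant), but I need a way to get only id 2 (whose pens variable goes from 1 to 0).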

df$x <- (ave(df$pens, df$id, FUN = function(x)length(unique(x)))==1)*1
which(df$x=="0")

  year id pens x
1 2002  1    0 0
2 2002  2    1 0
3 2004  1    1 0
4 2004  2    0 0
5 2006  3    1 1
6 2008  3    1 1

(I omitted the other variables for brevity.)

So the desired output is:

  year id pens x
1 2002  1    0 1
2 2002  2    1 0
3 2004  1    1 1
4 2004  2    0 0
5 2006  3    1 1
6 2008  3    1 1

Only id 2 has x=0, because its pens variable goes from 1 to 0.

Thanks in advance.

Answer 1

This assigns 1 to the ids for which pens decreases and 0 otherwise.

transform(d.d, x = ave(pens, id, FUN = function(x) any(diff(x) < 0)))

giving:

  year id  y.b sex income pens   x
1 2002  1 1950   F 100000    0   0
2 2002  2 1943   M  55000    1   1
3 2004  1 1950   F  88000    1   0
4 2004  2 1943   M  66000    0   1
5 2006  3 1966   M  12000    1   0
6 2008  3 1966   M  24000    1   0
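Note that the result above codes the ids whose pens fell as 1, while the question asks for x = 0 for those ids and 1 otherwise. A minimal sketch of flipping the coding and listing the affected ids, assuming the rows are already in chronological order within each id (the names dropped and dropped_ids are only illustrative):

```r
# Same data as in the question, reduced to the columns that matter here
d.d <- data.frame(year = c(2002, 2002, 2004, 2004, 2006, 2008),
                  id   = c(1, 2, 1, 2, 3, 3),
                  pens = c(0, 1, 1, 0, 1, 1))

# 1 for every row of an id whose pens decreases at some point, else 0
dropped <- ave(d.d$pens, d.d$id, FUN = function(x) any(diff(x) < 0))

# Flip to the coding requested in the question: 0 for a 1 -> 0 change, else 1
d.d$x <- 1 - dropped

# Ids that gave up the complementary pension
dropped_ids <- unique(d.d$id[dropped == 1])

print(d.d)
print(dropped_ids)
```

On the sample data this should leave only id 2 with x equal to 0.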

This should work even if there are more than 2 rows per id, but if we know there are always exactly 2 rows then we can omit any, simplifying it to:

transform(d.d, x = ave(pens, id, FUN = diff) < 0)
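Both one-liners rely on diff, so they implicitly assume that the rows of each id are in chronological order. If that is not guaranteed, one possible variant (a sketch, not part of the original answer) sorts by id and year first and only counts a drop between two waves that are exactly two years apart. The rows below are the question's data deliberately shuffled for illustration:

```r
# Question's data with rows shuffled so that id 2 appears out of order
d.d <- data.frame(
  year = c(2004, 2002, 2002, 2004, 2008, 2006),
  id   = c(2, 1, 2, 1, 3, 3),
  pens = c(0, 0, 1, 1, 1, 1)
)

# Put each id's rows in chronological order before differencing
d.d <- d.d[order(d.d$id, d.d$year), ]

# Work on row indices so that both pens and year are available per id;
# flag an id if pens falls between two consecutive waves 2 years apart
d.d$x <- ave(seq_len(nrow(d.d)), d.d$id, FUN = function(i) {
  any(diff(d.d$pens[i]) < 0 & diff(d.d$year[i]) == 2)
})

print(d.d)
```

Ids observed only once get 0 here, because any() of an empty comparison is FALSE. Whether a gap longer than two years should also count as a drop is a modelling decision that this sketch does not settle.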

Note: The input in reproducible form is:

Lines <- "year  id  y.b   sex   income   pens   
2002  1   1950   F    100000     0     
2002  2   1943   M    55000      1    
2004  1   1950   F    88000      1    
2004  2   1943   M    66000      0    
2006  3   1966   M    12000      1    
2008  3   1966   M    24000      1"

d.d <- read.table(text = Lines, header = TRUE, check.names = FALSE)
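With the input loaded this way, the same idea can also be applied at row level, to see in which wave the drop is first observed rather than only which id is affected. The following is a rough sketch under the assumption that consecutive rows of an id are consecutive survey waves; the column name dropped_here and the helper vector prev are hypothetical:

```r
# Reduced version of the reproducible input (only the columns used here)
Lines <- "year id pens
2002 1 0
2002 2 1
2004 1 1
2004 2 0
2006 3 1
2008 3 1"
d.d <- read.table(text = Lines, header = TRUE)

# Order by id and year so that "previous row" means "previous wave"
d.d <- d.d[order(d.d$id, d.d$year), ]

# pens in the previous wave of the same id (NA for the first wave)
prev <- ave(d.d$pens, d.d$id, FUN = function(x) c(NA, x[-length(x)]))

# TRUE only on the row where pens goes from 1 to 0
d.d$dropped_here <- !is.na(prev) & prev == 1 & d.d$pens == 0

print(d.d[d.d$dropped_here, c("year", "id")])
```

Counting the distinct ids among the flagged rows then gives the number of people who gave up the complementary pension.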

time

r

dummy-variable