Count number of individuals with a condition (dummy) in panel data

Due to privacy concerns, I can't share the original dataset or my original code, so I created an example.

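Let's say I want to count how many individuals have obtained a higher education degree. That means I want to know how many individuals have HEdummy == 0. I'm struggling with this... In the example below, the correct answer is 2. I tried creating a table and using the count/unique functions, but I don't know how to distinguish the individuals without summing all the "1"s.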

df <- data.frame(
  Individual = c("1", "1", "1", "1", "2", "2", "2", "3", "4", "4", "4", "4"),
  Time = c("2011", "2012", "2013", "2014", "2011", "2012", "2012", "2017", "2014", "2015", "2016", "2017"),
  HigherEducationDummy = c("1", "1", "1", "1", "0", "0", "0", "1", "0", "0", "0", "0")
)

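I'm not sure why the value you're after is 0, but going by the rest of the description, it looks like you can summarize each individual across their years.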

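library(dplyr)

df %>%
  group_by(Individual) %>%
  # TRUE when the individual never has HigherEducationDummy == "1" in any year
  summarize(neverHE = !any(HigherEducationDummy == "1")) %>%
  # pull() extracts the logical column as a vector, so sum() counts the TRUEs
  pull(neverHE) %>%
  sum()
# [1] 2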

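This tells you how many individuals never had higher education in any of their years. You can also replace sum with table to get the counts for all categories.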
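As a rough sketch of the table variant mentioned above, the per-individual flag can be stored first and then tabulated. The names `per_person` and `never_he` are my own illustrative choices, not part of the question, and the data frame is simply the example data repeated so the snippet runs on its own.

```r
library(dplyr)

# Example data from the question, repeated so this snippet is self-contained
df <- data.frame(
  Individual = c("1", "1", "1", "1", "2", "2", "2", "3", "4", "4", "4", "4"),
  Time = c("2011", "2012", "2013", "2014", "2011", "2012", "2012", "2017", "2014", "2015", "2016", "2017"),
  HigherEducationDummy = c("1", "1", "1", "1", "0", "0", "0", "1", "0", "0", "0", "0")
)

# One row per individual; never_he (hypothetical name) is TRUE when
# the dummy is never "1" for that individual
per_person <- df %>%
  group_by(Individual) %>%
  summarize(never_he = !any(HigherEducationDummy == "1"))

# Tabulate both categories instead of only counting the TRUEs
counts <- table(per_person$never_he)
print(counts)
```

With the example data this should show two individuals in each category (individuals 2 and 4 never have the dummy set; individuals 1 and 3 do).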

Use all inside tapply, then sum the result. This counts how many individuals have the dummy equal to zero in every one of their years.

sum(with(df, tapply(HigherEducationDummy == 0, Individual, all)))
# [1] 2
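To see what the one-liner is summing, it may help to look at the intermediate tapply result. This is only an illustrative breakdown of the same call; `all_zero` is a name I introduced, and the data frame is the question's example repeated so the snippet is self-contained.

```r
# Example data from the question, repeated so this snippet is self-contained
df <- data.frame(
  Individual = c("1", "1", "1", "1", "2", "2", "2", "3", "4", "4", "4", "4"),
  Time = c("2011", "2012", "2013", "2014", "2011", "2012", "2012", "2017", "2014", "2015", "2016", "2017"),
  HigherEducationDummy = c("1", "1", "1", "1", "0", "0", "0", "1", "0", "0", "0", "0")
)

# For each individual, check whether the dummy is 0 in all of their rows.
# The result is a logical vector named by Individual.
all_zero <- with(df, tapply(HigherEducationDummy == 0, Individual, all))
print(all_zero)

# Names of the individuals that satisfy the condition
print(names(which(all_zero)))

# Summing the logical vector counts the TRUE entries
print(sum(all_zero))
```

Keeping the intermediate vector around also lets you list which individuals met the condition, not just how many.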