Count number of individuals with a condition (dummy) in panel data
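Due to privacy concerns, I can't share the original dataset or my original code, so I created an example.
Suppose I want to count how many individuals obtained a higher education degree. That means I want to know how many individuals have HEdummy == 0 (HigherEducationDummy in the example). I'm struggling with this... In the example below, the correct answer is 2. I tried to create a table and use the count/unique functions, but I don't know how to distinguish individuals without summing all the "1"s.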
df <- data.frame(
  Individual = c("1", "1", "1", "1", "2", "2", "2", "3", "4", "4", "4", "4"),
  Time = c("2011", "2012", "2013", "2014", "2011", "2012", "2012", "2017", "2014", "2015", "2016", "2017"),
  HigherEducationDummy = c("1", "1", "1", "1", "0", "0", "0", "1", "0", "0", "0", "0")
)
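I'm not sure why the value of interest is 0, but going by the rest of your description, it looks like you can summarize each individual across their years.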
library(dplyr)
df %>%
  group_by(Individual) %>%
  # TRUE when the individual has no "1" in any year
  summarize(hasHE = !any(HigherEducationDummy == "1")) %>%
  select(hasHE) %>%
  sum()
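This tells you how many individuals never had a "1" for higher education in any year. You can also replace sum() with table() to get the counts for all categories.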
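As a rough sketch of the table() variant, the per-individual summary can be stored first and then tabulated. The names per_person, neverHE, and counts are illustrative choices, not part of the question, and the Time column is left out because it is not needed for the count.

```r
library(dplyr)

# Example data from the question (Time omitted, it is not used here)
df <- data.frame(
  Individual = c("1", "1", "1", "1", "2", "2", "2", "3", "4", "4", "4", "4"),
  HigherEducationDummy = c("1", "1", "1", "1", "0", "0", "0", "1", "0", "0", "0", "0")
)

# One row per individual: TRUE if the dummy is never "1" for that individual
per_person <- df %>%
  group_by(Individual) %>%
  summarize(neverHE = !any(HigherEducationDummy == "1"))

# Tabulate both categories instead of summing only the TRUE values
counts <- table(per_person$neverHE)
counts
```

With the example data, individuals 2 and 4 fall in the TRUE group and individuals 1 and 3 in the FALSE group.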
Use all() inside tapply(), then sum() the result. This counts how many individuals have the dummy equal to zero in all of their years.
sum(with(df, tapply(HigherEducationDummy == 0, Individual, all)))
# [1] 2
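If it also helps to see which individuals are being counted, the same tapply() result can be kept and its names inspected. This is only a sketch on the example data; all_zero and ids are hypothetical names introduced here.

```r
# Example data from the question (Time omitted, it is not used here)
df <- data.frame(
  Individual = c("1", "1", "1", "1", "2", "2", "2", "3", "4", "4", "4", "4"),
  HigherEducationDummy = c("1", "1", "1", "1", "0", "0", "0", "1", "0", "0", "0", "0")
)

# Named logical vector: one entry per individual, TRUE if every year is "0"
all_zero <- with(df, tapply(HigherEducationDummy == "0", Individual, all))

# IDs of the individuals whose dummy is zero in all years
ids <- names(all_zero)[all_zero]
ids
# [1] "2" "4"
```

Summing all_zero gives the same count of 2 as the one-liner above.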