Selecting panel data using respondent number because the panel data is unspecified

I have a data frame that is partly panel data. It looks like this:

respnr country country-year year     a     b
1      France  France2000   2000       NA    NA 
3      France  France2001   2001     1000  1000  
2      France  France2002   2002       NA    NA
2      France  France2003   2003     1600  2200
3      France  France2004   2004       NA    NA
6      UK          UK2000   2000     1000  1000  
6      UK          UK2001   2001       NA    NA
8      UK          UK2002   2002     1000  1000  
9      UK          UK2003   2003       NA    NA
6      UK          UK2004   2004       NA    NA
11     Germany     UK2000   2000       NA    NA 
11     Germany     UK2001   2001       NA    NA
12     Germany     UK2002   2002       NA    NA  
14     Germany     UK2003   2003       NA    NA
12     Germany     UK2004   2004       NA    NA

I tried to extract the panel data using the respondent number as follows:

df$panel <- duplicated(df$respnr)
dfp<- subset(df, df$panel == TRUE)

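But I realised that this does not extract every instance of a respondent number, because duplicated() never flags the first occurrence, so it does not produce the panel data.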

Expected output:

respnr country country-year year     a     b
3      France  France2001   2001     1000  1000  
2      France  France2002   2002       NA    NA
2      France  France2003   2003     1600  2200
3      France  France2004   2004       NA    NA
6      UK          UK2000   2000     1000  1000  
6      UK          UK2001   2001       NA    NA
6      UK          UK2004   2004       NA    NA
11     Germany     UK2000   2000       NA    NA 
11     Germany     UK2001   2001       NA    NA
12     Germany     UK2002   2002       NA    NA  
12     Germany     UK2004   2004       NA    NA

Is there a way to do this?

We can use table:

subset(df, df$respnr %in% names(table(df$respnr))[table(df$respnr) >= 2])
#   respnr country country.year year    a    b
#2       3  France   France2001 2001 1000 1000
#3       2  France   France2002 2002   NA   NA
#4       2  France   France2003 2003 1600 2200
#5       3  France   France2004 2004   NA   NA
#6       6      UK       UK2000 2000 1000 1000
#7       6      UK       UK2001 2001   NA   NA
#10      6      UK       UK2004 2004   NA   NA
#11     11 Germany       UK2000 2000   NA   NA
#12     11 Germany       UK2001 2001   NA   NA
#13     12 Germany       UK2002 2002   NA   NA
#15     12 Germany       UK2004 2004   NA   NA

table(df$respnr) returns a named vector with the number of observations for each respondent number:

# 1  2  3  6  8  9 11 12 14 
# 1  2  2  3  1  1  2  2  1

The OP only wants to keep respondents with 2 (or more?) observations, so we filter for those values:

names(table(df$respnr))[table(df$respnr) >= 2]
#[1] "2"  "3"  "6"  "11" "12"

Finally, we create a logical vector to subset the data:

df$respnr %in% names(table(df$respnr))[table(df$respnr) >= 2]
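The steps above can be put together in one runnable sketch. It assumes a cut-down copy of the question's data (only respnr and year are rebuilt here), and the names counts, keep, dfp and dfp2 are made up for illustration. Computing the table once avoids calling table() twice, and ave() is shown as one possible alternative that skips the names lookup.

```r
# Sample data rebuilt from the question (only the columns needed here)
df <- data.frame(
  respnr = c(1, 3, 2, 2, 3, 6, 6, 8, 9, 6, 11, 11, 12, 14, 12),
  year   = rep(2000:2004, times = 3)
)

# Count the observations per respondent once and reuse the result
counts <- table(df$respnr)
keep   <- names(counts)[counts >= 2]

# %in% compares respnr with the (character) names of the table
dfp <- df[df$respnr %in% keep, ]

# Possible alternative without table(): ave() gives each row its group size
group_size <- ave(seq_along(df$respnr), df$respnr, FUN = length)
dfp2 <- df[group_size >= 2, ]
```

Both dfp and dfp2 should contain the same rows; which form to prefer is mostly a matter of taste.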

With dplyr:

library(dplyr)
df <- df %>% 
       group_by(respnr) %>%
       #drops any group which only has one observation
       filter(n() != 1)
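As a base R aside, the duplicated() attempt from the question can probably be repaired as well. duplicated() only flags the second and later occurrences of a value, so a minimal sketch is to scan from both ends and combine the results. The sample data is a cut-down copy of the question's data, and is_panel and dfp are illustrative names.

```r
# Sample data rebuilt from the question (only the columns needed here)
df <- data.frame(
  respnr = c(1, 3, 2, 2, 3, 6, 6, 8, 9, 6, 11, 11, 12, 14, 12),
  year   = rep(2000:2004, times = 3)
)

# duplicated() skips the first occurrence of each value;
# scanning from the end as well (fromLast = TRUE) catches it
is_panel <- duplicated(df$respnr) | duplicated(df$respnr, fromLast = TRUE)

# Keep every row whose respondent number appears more than once
dfp <- df[is_panel, ]
```

This keeps all rows of respondents that occur at least twice, which is the same condition as the counts >= 2 filter.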