Selecting panel data using respondent number because the panel data is unspecified
I have a data frame that is partly panel data and looks like this:
respnr country country-year year a b
1 France France2000 2000 NA NA
3 France France2001 2001 1000 1000
2 France France2002 2002 NA NA
2 France France2003 2003 1600 2200
3 France France2004 2004 NA NA
6 UK UK2000 2000 1000 1000
6 UK UK2001 2001 NA NA
8 UK UK2002 2002 1000 1000
9 UK UK2003 2003 NA NA
6 UK UK2004 2004 NA NA
11 Germany UK2000 2000 NA NA
11 Germany UK2001 2001 NA NA
12 Germany UK2002 2002 NA NA
14 Germany UK2003 2003 NA NA
12 Germany UK2004 2004 NA NA
I tried to extract the panel data using the respondent number as follows:
df$panel <- duplicated(df$respnr)
dfp<- subset(df, df$panel == TRUE)
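But I realised that this only extracts one instance per respondent number (duplicated() does not flag the first occurrence), so it does not produce the panel data.
Expected output: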
respnr country country-year year a b
3 France France2001 2001 1000 1000
2 France France2002 2002 NA NA
2 France France2003 2003 1600 2200
3 France France2004 2004 NA NA
6 UK UK2000 2000 1000 1000
6 UK UK2001 2001 NA NA
6 UK UK2004 2004 NA NA
11 Germany UK2000 2000 NA NA
11 Germany UK2001 2001 NA NA
12 Germany UK2002 2002 NA NA
12 Germany UK2004 2004 NA NA
Is there a solution for this?
We can use table:
subset(df, df$respnr %in% names(table(df$respnr))[table(df$respnr) >= 2])
# respnr country country.year year a b
#2 3 France France2001 2001 1000 1000
#3 2 France France2002 2002 NA NA
#4 2 France France2003 2003 1600 2200
#5 3 France France2004 2004 NA NA
#6 6 UK UK2000 2000 1000 1000
#7 6 UK UK2001 2001 NA NA
#10 6 UK UK2004 2004 NA NA
#11 11 Germany UK2000 2000 NA NA
#12 11 Germany UK2001 2001 NA NA
#13 12 Germany UK2002 2002 NA NA
#15 12 Germany UK2004 2004 NA NA
table(df$respnr) returns the number of rows per respnr, named by the respnr values:
# 1 2 3 6 8 9 11 12 14
# 1 2 2 3 1 1 2 2 1
The OP only wants to keep respondents with 2 (or more?) observations, so we filter for those values:
names(table(df$respnr))[table(df$respnr) >= 2]
#[1] "2" "3" "6" "11" "12"
Finally, we create a logical vector to subset the data:
df$respnr %in% names(table(df$respnr))[table(df$respnr) >= 2]
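The same row selection can probably also be written without table. The following is only a minimal sketch, assuming a reduced copy of the question's data that keeps just respnr and year. The names is_panel, group_size, dfp and dfp2 are made up for illustration. It shows why duplicated() alone is not enough (it skips the first occurrence) and two base R ways around that:

```r
# Reduced copy of the question's data (assumption: only respnr and year are kept).
df <- data.frame(
  respnr = c(1, 3, 2, 2, 3, 6, 6, 8, 9, 6, 11, 11, 12, 14, 12),
  year   = rep(2000:2004, times = 3)
)

# duplicated() flags only the second and later occurrences of a value,
# so scan from both ends and combine to flag every row of a repeated respnr.
is_panel <- duplicated(df$respnr) | duplicated(df$respnr, fromLast = TRUE)
dfp <- df[is_panel, ]

# Alternative: ave() gives, for each row, the number of rows sharing its respnr.
group_size <- ave(seq_along(df$respnr), df$respnr, FUN = length)
dfp2 <- df[group_size >= 2, ]
```

Both dfp and dfp2 keep the original row order and row names, so they should line up with the subset() result above.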
In dplyr:
library(dplyr)
df <- df %>%
group_by(respnr) %>%
#drops any group which only has one observation
filter(n() != 1)
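One thing to keep in mind is that the result of a grouped filter is still grouped by respnr, which can surprise later summarise() or mutate() calls. A minimal sketch of a variant that drops the grouping afterwards, assuming the dplyr package is installed and using a reduced copy of the question's data (only respnr and year; dfp is a made-up name):

```r
library(dplyr)

# Reduced copy of the question's data (assumption: only respnr and year are kept).
df <- data.frame(
  respnr = c(1, 3, 2, 2, 3, 6, 6, 8, 9, 6, 11, 11, 12, 14, 12),
  year   = rep(2000:2004, times = 3)
)

dfp <- df %>%
  group_by(respnr) %>%
  # keep only respondents observed at least twice
  filter(n() >= 2) %>%
  # remove the grouping so later verbs operate on the whole table
  ungroup()
```

Writing the condition as n() >= 2 rather than n() != 1 is just a readability choice. For this data both keep the same rows.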