Build a balanced panel keeping only observations that are repeated every year
I want to keep only the observations that are complete across all years. How should I proceed?
I have the following example:
df <- structure(list(variable = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5,
5, 5), Year = c(2010, 2011, 2012, 2010, 2012, 2010, 2011, 2012,
2011, 2012, 2010, 2011, 2012)), class = "data.frame", row.names = c(NA,
-13L))
I would like to get:
structure(list(variable = c(1, 1, 1, 3, 3, 3, 5, 5, 5), Year = c(2010,
2011, 2012, 2010, 2011, 2012, 2010, 2011, 2012)), row.names = c(1L,
2L, 3L, 6L, 7L, 8L, 11L, 12L, 13L), class = "data.frame")
This example is simple, but I need to do this for a huge data set in order to build a balanced panel. Thanks for your help.
In base R, we can use subset and table:
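# all years present in the data
yr <- unique(df$Year)
# keep the 'variable' values whose row count equals the number of years
subset(df, variable %in% names(which(table(variable[Year %in% yr]) ==
    length(yr))))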
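Note that table counts rows, so the approach above assumes each variable/Year pair appears at most once. A minimal sketch of a base R variant that counts distinct years per 'variable' instead, using ave (the names n_years, years_per_unit and balanced are illustrative only, not part of the original answer):

```r
# Example data from the question
df <- data.frame(
  variable = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5),
  Year = c(2010, 2011, 2012, 2010, 2012, 2010, 2011, 2012,
           2011, 2012, 2010, 2011, 2012)
)

# Number of distinct years in the whole data set
n_years <- length(unique(df$Year))

# For every row, the number of distinct years observed for its 'variable'
years_per_unit <- ave(df$Year, df$variable,
                      FUN = function(y) length(unique(y)))

# Keep only the rows whose 'variable' is observed in every year
balanced <- df[years_per_unit == n_years, ]
balanced
```

Because ave returns a vector aligned with the rows of df, the result keeps the original row names, as in the desired output.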
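Or with dplyr: group by 'variable' and filter to keep the groups whose number of distinct 'Year' values (n_distinct) equals the number of distinct years in the whole data set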
library(dplyr)
df %>%
  group_by(variable) %>%
  # keep groups that cover every distinct Year in the full data
  filter(n_distinct(Year) == n_distinct(.$Year)) %>%
  ungroup()
# A tibble: 9 x 2
  variable  Year
     <dbl> <dbl>
1        1  2010
2        1  2011
3        1  2012
4        3  2010
5        3  2011
6        3  2012
7        5  2010
8        5  2011
9        5  2012
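On a large data set it can help to confirm that the filtered result really is balanced. One possible check, sketched here with a hypothetical helper is_balanced (not from the original answer), cross-tabulates 'variable' against 'Year' and requires every cell to be exactly 1:

```r
# Example data from the question
df <- data.frame(
  variable = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5),
  Year = c(2010, 2011, 2012, 2010, 2012, 2010, 2011, 2012,
           2011, 2012, 2010, 2011, 2012)
)

# Hypothetical helper: TRUE when every id/time combination occurs exactly once
is_balanced <- function(d, id = "variable", time = "Year") {
  counts <- table(d[[id]], d[[time]])
  all(counts == 1)
}

# Balanced subset, built with base R as a stand-in for either solution above
keep <- names(which(tapply(df$Year, df$variable,
                           function(y) length(unique(y))) ==
                    length(unique(df$Year))))
panel <- df[df$variable %in% keep, ]

is_balanced(df)     # the raw data has gaps
is_balanced(panel)  # the filtered panel should have none
```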