Filter a dataframe with dplyr
I have this data.frame:
df <- data.frame(
id = c("x1", "x2", "x3", "x4", "x5", "x1", "x2", "x6", "x7", "x8", "x7", "x8" ),
age = c(rep("juvenile", 5), rep("adult", 7))
)
df
id age
1 x1 juvenile
2 x2 juvenile
3 x3 juvenile
4 x4 juvenile
5 x5 juvenile
6 x1 adult
7 x2 adult
8 x6 adult
9 x7 adult
10 x8 adult
11 x7 adult
12 x8 adult
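Each row represents one sighting of an individual. I want to pull out all rows for individuals that were seen as juveniles and then seen again as adults. I do not want the rows for individuals that were first seen as adults and then seen again as adults (ids x7 and x8). So the resulting data.frame should look like this: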
id age
1 x1 juvenile
2 x2 juvenile
3 x1 adult
4 x2 adult
I would specifically like a dplyr solution.
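Hey, I think this is what you are looking for. I broke it into steps for illustration, but I'm sure you could make it more compact by not assigning the result of each filter call to its own variable.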
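library(dplyr)

# split the rows by age class
kids <- df %>%
  filter(age == "juvenile")
adults <- df %>%
  filter(age == "adult")
# keep only ids present in both subsets; the result has columns id, age.x, age.y
repeat_offender <- inner_join(kids, adults, by = "id")
repeat_offender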
To actually return the requested answer, stack the two age columns back into one:
library(tidyr)  # provides gather()
this_solution_sucks <- gather(repeat_offender, agex, age, -id) %>% select(-agex)
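As a rough sketch of the more compact variant hinted at above, the join and the reshaping step could probably be replaced by two `semi_join()` calls, which keep the original rows of `df` instead of widening and re-stacking them. The names `juvenile_ids`, `adult_ids`, and `result` are made up for this illustration, and `stringsAsFactors = FALSE` is set here only so the example behaves the same on older R versions.

```r
library(dplyr)

df <- data.frame(
  id  = c("x1", "x2", "x3", "x4", "x5", "x1", "x2", "x6", "x7", "x8", "x7", "x8"),
  age = c(rep("juvenile", 5), rep("adult", 7)),
  stringsAsFactors = FALSE
)

# ids seen at least once in each age class (hypothetical helper tables)
juvenile_ids <- df %>% filter(age == "juvenile") %>% distinct(id)
adult_ids    <- df %>% filter(age == "adult") %>% distinct(id)

# semi_join() keeps rows of df whose id has a match, without adding columns
result <- df %>%
  semi_join(juvenile_ids, by = "id") %>%
  semi_join(adult_ids, by = "id")

result
```

Because `semi_join()` only filters, the rows come back in their original order and no `gather()` step is needed.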
You can group by id and keep only those groups that contain both 'juvenile' and 'adult':
df %>%
group_by(id) %>%
filter(all(c('juvenile','adult') %in% age))
#Source: local data frame [4 x 2]
#Groups: id
#
# id age
#1 x1 juvenile
#2 x2 juvenile
#3 x1 adult
#4 x2 adult
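To see why the grouped filter works, it may help to look at the condition on its own. The sketch below wraps it in a function called `has_both`, a name invented for this illustration, and applies it to the kind of `age` vector each group would hold.

```r
# hypothetical helper: the same condition used inside filter(), as a plain function
has_both <- function(age) all(c("juvenile", "adult") %in% age)

# %in% returns one logical per element on the left-hand side
c("juvenile", "adult") %in% c("adult", "adult")

has_both(c("juvenile", "adult"))  # a group like x1 or x2
has_both(c("adult", "adult"))     # a group like x7 or x8
has_both("juvenile")              # a group like x3
```

Inside a grouped `filter()`, a single TRUE or FALSE per group is recycled to every row of that group, so a group is kept or dropped as a whole.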
Here is a dplyr solution that may be useful when you need more specific thresholds:
df %>%
group_by(id) %>%
filter(sum(age == 'juvenile') >= 1 & sum(age == 'adult') >= 1)
# Source: local data frame [4 x 2]
# Groups: id
#
# id age
# 1 x1 juvenile
# 2 x2 juvenile
# 3 x1 adult
# 4 x2 adult
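As an illustration of what a different threshold might look like, the sketch below keeps individuals that were seen as adults at least twice. The threshold of 2 and the name `seen_twice_as_adult` are assumptions chosen for this example and are not part of the original question.

```r
library(dplyr)

df <- data.frame(
  id  = c("x1", "x2", "x3", "x4", "x5", "x1", "x2", "x6", "x7", "x8", "x7", "x8"),
  age = c(rep("juvenile", 5), rep("adult", 7)),
  stringsAsFactors = FALSE
)

# count adult sightings per id and keep groups that reach the threshold
seen_twice_as_adult <- df %>%
  group_by(id) %>%
  filter(sum(age == "adult") >= 2) %>%
  ungroup()

seen_twice_as_adult
```

With the sample data this selects the rows for x7 and x8, the individuals the question wanted to exclude, which shows how changing the counts changes which groups pass.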