Conditional matching between variables in dplyr
I'm trying to find observations in one column that have some or all of the possible values in another column. Take this example:
library(dplyr)  # tibble() and %>% are re-exported by dplyr
parties <- tibble(class = c("R","R","R","R","R","K","K","K","K","K","K",
                            "L","L","L","L"),
                  name = c("Party1", "Party2","Party3","Party4","Party5",
                           "Party2", "Party4", "Party6","Party7","Party8","Party9",
                           "Party2","Party3","Party4","Party10"))
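I want to find all parties that are in all three classes "R", "K" and "L", or, more generally, the parties in class "X" or "Y". I managed to find a solution by using group_split(class), then extracting each table from the list, and finally doing two semi_joins. This is for the case where I want the parties that are in all three classes: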
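# group_split() returns one tibble per class, ordered alphabetically: K, L, R
parties_split <- parties %>%
  group_split(class)
parties_K <- parties_split[[1]]
parties_L <- parties_split[[2]]
parties_R <- parties_split[[3]]
# Keep the K rows whose name also appears in L, then also in R
semi_join(parties_K, parties_L, by = "name") %>%
  semi_join(parties_R, by = "name") %>%
  select(-class)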
name
<chr>
Party2
Party4
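This works here, but it is not efficient, especially when the number of classes (or observations) to match is much larger than three. I'm specifically looking for a tidyverse solution. Any ideas? Thanks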
Does this work?
library(dplyr)
parties %>%
  group_by(name) %>% mutate(cnt = n_distinct(class)) %>%               # distinct classes per party
  group_by(class) %>% mutate(grpno = cur_group_id()) %>% ungroup() %>%  # class index 1..k
  filter(cnt >= max(grpno)) %>%                                        # max(grpno) = number of classes
  select(name) %>% distinct()
# A tibble: 2 x 1
name
<chr>
1 Party2
2 Party4
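The row-count plus group-index trick above is, in effect, checking whether a party appears in as many distinct classes as exist in the data. A minimal sketch of that comparison done directly, assuming dplyr 1.0 or later (n_classes and in_all are illustrative names, not from the answer):

```r
library(dplyr)

# Example data from the question
parties <- tibble(
  class = c("R","R","R","R","R","K","K","K","K","K","K","L","L","L","L"),
  name  = c("Party1","Party2","Party3","Party4","Party5",
            "Party2","Party4","Party6","Party7","Party8","Party9",
            "Party2","Party3","Party4","Party10")
)

# Total number of distinct classes in the data (3 here: R, K, L)
n_classes <- n_distinct(parties$class)

# Count distinct classes per party and keep those present in every class.
# n_distinct() ignores repeated rows, so a party listed twice in the same
# class is not double-counted.
in_all <- parties %>%
  group_by(name) %>%
  summarise(n_cls = n_distinct(class)) %>%
  filter(n_cls == n_classes) %>%
  pull(name)

print(in_all)  # "Party2" "Party4"
```

This drops the second group_by() and the group index entirely; the only number that matters is how many classes exist.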
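Try this:
parties %>%
  group_by(name) %>%
  # keep a party only if its rows include each of the three classes
  filter("K" %in% class,
         "R" %in% class,
         "L" %in% class) %>%
  summarise()  # one row per remaining party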
# A tibble: 2 x 1
name
<chr>
1 Party2
2 Party4
Edit: if you want to work with more than three classes, you can also use:
mask <- c("K", "R", "L")  # the classes a party must belong to
parties %>%
  group_by(name) %>%
  filter(all(mask %in% class)) %>%
  summarise()
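The mask version can also cover the "class X or Y" case from the question by swapping all() for any(). A minimal sketch; parties_in() and its require_all argument are hypothetical names for illustration, not part of dplyr:

```r
library(dplyr)

# Example data from the question
parties <- tibble(
  class = c("R","R","R","R","R","K","K","K","K","K","K","L","L","L","L"),
  name  = c("Party1","Party2","Party3","Party4","Party5",
            "Party2","Party4","Party6","Party7","Party8","Party9",
            "Party2","Party3","Party4","Party10")
)

# Hypothetical helper (not a dplyr function): returns the parties whose classes
# include all of `mask` (require_all = TRUE) or at least one of them (FALSE).
parties_in <- function(df, mask, require_all = TRUE) {
  agg <- if (require_all) all else any
  df %>%
    group_by(name) %>%
    filter(agg(mask %in% class)) %>%
    summarise() %>%
    pull(name)
}

parties_in(parties, c("K", "R", "L"))                  # "Party2" "Party4"
parties_in(parties, c("R", "L"))                       # "Party2" "Party3" "Party4"
parties_in(parties, c("R", "L"), require_all = FALSE)  # parties in R or L
```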
To make this work for many groups, you can use purrr::reduce:
library(dplyr)
parties %>%
  group_split(class) %>%                 # one tibble per class
  purrr::reduce(semi_join, by = "name") %>%  # successively keep names found in every class
  select(name)
# name
# <chr>
#1 Party2
#2 Party4
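The same reduce idea extends to any subset of classes if you filter first, and it can work on plain character vectors instead of joining tibbles. A sketch assuming purrr is installed; wanted is an illustrative choice of classes. intersect() gives the parties in every wanted class, union() the parties in at least one of them:

```r
library(dplyr)
library(purrr)

# Example data from the question
parties <- tibble(
  class = c("R","R","R","R","R","K","K","K","K","K","K","L","L","L","L"),
  name  = c("Party1","Party2","Party3","Party4","Party5",
            "Party2","Party4","Party6","Party7","Party8","Party9",
            "Party2","Party3","Party4","Party10")
)

# Illustrative subset of classes to compare; change as needed
wanted <- c("R", "L")

# Named list with one character vector of party names per wanted class
names_by_class <- parties %>%
  filter(class %in% wanted) %>%
  with(split(name, class))

in_all_wanted <- reduce(names_by_class, intersect)  # "Party2" "Party3" "Party4"
in_any_wanted <- reduce(names_by_class, union)      # parties in R or L
```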
Another solution:
library(tidyverse)
parties %>%
  distinct() %>%
  mutate(id = 1) %>%
  # one row per party, one column per class; NA where the party is absent
  pivot_wider(id_cols = name, names_from = class, values_from = id) %>%
  rowwise() %>%
  # the sum is NA if any class column is NA
  filter(!is.na(sum(c_across(where(is.numeric))))) %>%
  select(name) %>%
  ungroup()
#> # A tibble: 2 x 1
#> name
#> <chr>
#> 1 Party2
#> 2 Party4
Created on 2020-12-09 by the reprex package (v0.3.0)
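The wide-table approach can avoid rowwise() by filling absent classes with FALSE and using if_all(). A sketch assuming dplyr 1.0.4 or later (for if_all()) and a recent tidyr (for a scalar values_fill); wide, in_all and in_r_and_l are illustrative names:

```r
library(dplyr)
library(tidyr)

# Example data from the question
parties <- tibble(
  class = c("R","R","R","R","R","K","K","K","K","K","K","L","L","L","L"),
  name  = c("Party1","Party2","Party3","Party4","Party5",
            "Party2","Party4","Party6","Party7","Party8","Party9",
            "Party2","Party3","Party4","Party10")
)

# One row per party, one logical column per class (TRUE = party is in it)
wide <- parties %>%
  distinct() %>%
  mutate(present = TRUE) %>%
  pivot_wider(id_cols = name, names_from = class,
              values_from = present, values_fill = FALSE)

# Parties in every class: if_all() combines the class columns with &
in_all <- wide %>% filter(if_all(-name, identity)) %>% pull(name)

# Parties in a chosen subset, e.g. both R and L (filter() ANDs its conditions)
in_r_and_l <- wide %>% filter(R, L) %>% pull(name)
```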