Filter group to include column-wise condition with dplyr

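I want to subset or filter some grouped data in dplyr so that it includes only the groups that have both levels of a two-level categorical variable. My data looks like this:

I would like my output to include only the health_facility values whose season column contains both "malaria" and "non-malaria".

I tried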

multi_hf %>%
group_by(health_facility) %>%
filter(season == "malaria" & season == "non-malaria") 

But all I get is NA values.

Any help is much appreciated! Data:

structure(list(season = c("malaria", "malaria", "malaria", "malaria", 
"malaria", "malaria", "malaria", "malaria", "malaria", "malaria", 
"malaria", "malaria", "malaria", "malaria", "malaria", "malaria", 
"malaria", "malaria", "malaria", "malaria", "malaria", "malaria", 
"malaria", "malaria", "malaria", "malaria", "non-malaria", "non-malaria", 
"non-malaria", "non-malaria", "non-malaria", "non-malaria", "non-malaria", 
"non-malaria", "non-malaria", "non-malaria", "non-malaria", "non-malaria", 
"non-malaria", "non-malaria", "non-malaria", "non-malaria", "non-malaria", 
"non-malaria", "non-malaria", "non-malaria", "non-malaria", "non-malaria", 
"non-malaria", "non-malaria", "non-malaria", "non-malaria", "non-malaria", 
"non-malaria", "non-malaria", "non-malaria", "non-malaria"), 
    health_facility = c("Hospital Agostinho Neto", "Hospital Baptista de Sousa", 
    "Health Delegation São Miguel", "Health Center Chã de Alecrim", 
    "Health Center Fonte Inês", "Health Delegation Maio", "Health Delegation Sao Vincente", 
    "Health Delegation Sao Vincente", "Hospital Ribeira Grande", 
    "Health Delegation Ribeira Brava", "Health Delegation Santa Cruz", 
    "Health Delegation Paul", "Center Delegation Santa Catarina", 
    "Regional Hospital Fogo e Brava", "Health Delegation São Filipe", 
    "Health Center Cidade Velha", "Health Delegation Tarrafal Santiago", 
    "Health Delegation Tarrafal Santiago", "Health Delegation Tarrafal Santiago", 
    "Health Center Sao Salvador do Mundo – Picos", "Health Delegation Tarrafal Santiago", 
    "Health Delegation São Lourenço dos Orgaos", "Health Delegation Ribeira Grande", 
    "Health Delegation of Praia", "Center Delegation Santa Catarina", 
    "Regional Hospital Santiago Norte", "Health Delegation Ribeira Brava", 
    "Health Delegation Ribeira Brava", "Hospital Baptista de Sousa", 
    "Health Delegation Paul", "Health Delegation Ribeira Brava", 
    "Health Center Sao Salvador do Mundo – Picos", "Health Delegation Sao Vincente", 
    "Health Delegation São Miguel", "Health Delegation Tarrafal Santiago", 
    "Regional Hospital Santiago Norte", "Regional Hospital Santiago Norte", 
    "Regional Hospital Santiago Norte", "Regional Hospital Santiago Norte", 
    "Health Delegation Sao Vincente", "Regional Hospital Fogo e Brava", 
    "Center Delegation Santa Catarina", "Health Center Chã de Alecrim", 
    "Hospital Agostinho Neto", "Hospital Ribeira Grande", "Health Delegation São Lourenço dos Orgaos", 
    "Health Delegation São Lourenço dos Orgaos", "Health Delegation São Filipe", 
    "Health Center Fonte Inês", "Hospital Agostinho Neto", "Regional Hospital Fogo e Brava", 
    "Health Delegation of Praia", "Health Delegation Maio", "Health Delegation Ribeira Grande", 
    "Health Delegation São Lourenço dos Orgaos", "Health Delegation Santa Cruz", 
    "Health Center Cidade Velha")), class = c("data.table", "data.frame"
), row.names = c(NA, -57L))

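filter(season == "malaria" & season == "non-malaria") selects rows where season is both "malaria" and "non-malaria" at the same time, which is impossible because a row can hold only one value. That is why you get 0 rows with the sample data you shared. The output for the sample data has no NA rows, but only because the sample data contains no NA values. Comparing an NA value with == returns NA, whereas %in% returns FALSE, so using %in% should help.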
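
Both points can be checked on a small vector. This is a minimal base R sketch; the vector and its NA entry are made up for illustration and are not taken from the question's data.

```r
# Hypothetical stand-in for the season column, with one missing value
season <- c("malaria", "non-malaria", NA)

# No single element can equal both strings, so this is never TRUE
both_at_once <- season == "malaria" & season == "non-malaria"

# == propagates NA, while %in% always returns TRUE or FALSE
eq_result <- season == "malaria"
in_result <- season %in% "malaria"

print(both_at_once)
print(eq_result)
print(in_result)
```

The first result contains no TRUE at all, which is why the row-wise condition can never keep a row, whatever the grouping.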

So you probably want to select each health_facility that has both values, which can be done as follows:

library(dplyr)

multi_hf %>%
  arrange(health_facility) %>%
  group_by(health_facility) %>%
  filter(all(c("malaria", "non-malaria") %in% season)) %>%
  ungroup()

Personally, I prefer a cleaner solution. n_distinct is a good fit here:

multi_hf %>%
  group_by(health_facility) %>%
  filter(n_distinct(season) == 2) %>%
  ungroup()
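
The two approaches give the same result only when season takes no values other than "malaria" and "non-malaria". The sketch below compares them on made-up data; the facility names "A", "B", "C" and the "unknown" season are assumptions for illustration and do not appear in the question's data.

```r
library(dplyr)

# Hypothetical toy data: facility "C" has two distinct seasons,
# but one of them is not "non-malaria"
toy <- data.frame(
  health_facility = c("A", "A", "B", "C", "C"),
  season = c("malaria", "non-malaria", "malaria", "malaria", "unknown"),
  stringsAsFactors = FALSE
)

# Keep groups that contain both required values
by_membership <- toy %>%
  group_by(health_facility) %>%
  filter(all(c("malaria", "non-malaria") %in% season)) %>%
  ungroup()

# Keep groups that contain exactly two distinct values, whatever they are
by_count <- toy %>%
  group_by(health_facility) %>%
  filter(n_distinct(season) == 2) %>%
  ungroup()

print(unique(by_membership$health_facility))
print(unique(by_count$health_facility))
```

Here the membership test keeps only "A", while the count test also keeps "C". With the question's data, where season has just the two values, either version should select the same facilities.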