Create categorical variable/data subset with case matching

I have a dataset like this:

structure(list(year = c(2019, 2019, 2019, 2019, 2019, 2019), 
    venue = c("Z", "Z", "Z", "Z", "O", "D"), HO = c("X", "Y", 
    "X", "Y", "W", "J"), AW = c("Y", "X", "W", "T", "T", "X")), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

I need to create a categorical variable that returns "yes" or "no", so that the result looks like this:

structure(list(year = c(2019, 2019, 2019, 2019, 2019, 2019), 
    venue = c("Z", "Z", "Z", "Z", "O", "D"), HO = c("X", "Y", 
    "X", "Y", "W", "J"), AW = c("Y", "X", "W", "T", "T", "X"), 
    Cat = c("yes", "yes", "no", "no", "no", "no")), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

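Note that the first two rows are the "yes" cases. They have the same values in "year" and "venue", and the same pair of values in "HO" and "AW", regardless of the order in which the pair appears.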

Thanks.

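You can first use rowwise() to treat the data frame row by row, then paste the HO and AW values together into a single key. This step identifies which rows have the same HO and AW values, regardless of order. Then group_by() year, venue, and that key: any group with more than one row is a match.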

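library(dplyr)

df %>% 
  rowwise() %>% 
  # Build an order-independent key for the HO/AW pair: X-Y and Y-X both give "X,Y".
  # Sorting the two values directly also works when they have more than one character.
  mutate(paste_col = paste(sort(c(HO, AW)), collapse = ",")) %>% 
  # Rows that share year, venue, and the same pair end up in the same group
  group_by(year, venue, paste_col) %>% 
  mutate(Cat = ifelse(n() > 1, "Yes", "No")) %>% 
  ungroup() %>% 
  # Drop the helper column
  select(-paste_col)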

# A tibble: 6 × 5
   year venue HO    AW    Cat  
  <dbl> <chr> <chr> <chr> <chr>
1  2019 Z     X     Y     Yes  
2  2019 Z     Y     X     Yes  
3  2019 Z     X     W     No   
4  2019 Z     Y     T     No   
5  2019 O     W     T     No   
6  2019 D     J     X     No
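
If the row-wise step becomes slow on a larger table, the same idea can probably be written without rowwise(). The following is only a sketch, not part of the answer above: it assumes HO and AW are character columns, uses pmin()/pmax() to put each pair into a fixed order, and uses lowercase "yes"/"no" to match the expected output in the question. The helper column names lo, hi, and n_pair are made up for illustration.

```r
library(dplyr)

# Sample data from the question
df <- tibble(
  year  = rep(2019, 6),
  venue = c("Z", "Z", "Z", "Z", "O", "D"),
  HO    = c("X", "Y", "X", "Y", "W", "J"),
  AW    = c("Y", "X", "W", "T", "T", "X")
)

result <- df %>%
  # Element-wise smaller and larger value of each pair, so X-Y and Y-X look identical
  mutate(lo = pmin(HO, AW), hi = pmax(HO, AW)) %>%
  # Count how many rows share year, venue, and the ordered pair
  add_count(year, venue, lo, hi, name = "n_pair") %>%
  mutate(Cat = if_else(n_pair > 1, "yes", "no")) %>%
  # Drop the helper columns (hypothetical names, used only in this sketch)
  select(-lo, -hi, -n_pair)

print(result)
```

On the sample data this should mark only the first two rows as "yes". Unlike the grouped version, it never changes the grouping of the data frame, so no ungroup() is needed at the end.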