How do I filter for data that has partially matching pairs using filter and str_detect in R?

Question

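I am trying to filter data so that only observations with a matching counterpart are kept. Observations that have no match should be removed.

For example, suppose I have this dataset: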

#  condition   group     type
#1   apple_1       B    small
#2   apple_1       A    small
#3   apple_1       A    small
#4   apple_2       A    small
#5   apple_2       A    small
#6   apple_3       A    small
#7    pear_1       A    small
#8    pear_1       A    small
#9    pear_1       A    small
#10   pear_2       A    small
#11   pear_3       A    small
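
For anyone who wants to reproduce this, the table above could be rebuilt roughly as follows. This is only a sketch: it assumes the columns are plain character vectors and that row 3 is meant to read apple_1, and it uses the name df because that is what the answer refers to.

```r
# Rebuild the example data from the table above.
# Assumption: row 3 is "apple_1" and all columns are character vectors.
df <- data.frame(
  condition = c("apple_1", "apple_1", "apple_1", "apple_2", "apple_2",
                "apple_3", "pear_1", "pear_1", "pear_1", "pear_2", "pear_3"),
  group = c("B", rep("A", 10)),  # only the first row is in group B
  type = "small",                # recycled to all 11 rows
  stringsAsFactors = FALSE
)
```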

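Here I have decided that each apple observation must be paired with a pear observation that has the same number (for example, apple_3 should be paired with pear_3). There is only one pear_2 observation but two apple_2 observations, so one of the apple_2 observations should be removed. Also, the first apple_1 is in group B, so it does not match any pear and should be removed. Finally, one pear_1 observation should be removed because it is left without a pair.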

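The problem is that the observations are named with an underscore, so I need to work on the names somehow with str_detect, and the groups also have to match, so I need filter as well. I feel this kind of filtering can be done with dplyr, but I'm not sure.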
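
To make the mismatch concrete, one possible first step is to split each name into its fruit and its number and count the rows per combination. The sketch below uses a small hypothetical subset of the data. The column names fruit and num are illustrative and not part of the original data.

```r
library(dplyr)
library(stringr)

# Hypothetical subset of the example data, for illustration only.
df <- data.frame(
  condition = c("apple_1", "apple_2", "apple_2", "pear_1", "pear_2"),
  group = c("B", "A", "A", "A", "A"),
  stringsAsFactors = FALSE
)

counts <- df %>%
  mutate(
    fruit = str_remove(condition, "_\\d+$"),  # text before the trailing number
    num = str_extract(condition, "\\d+$")     # the trailing number, as a string
  ) %>%
  count(group, num, fruit)                    # rows per group/number/fruit

counts
```

Any group/number combination where the two fruits have different counts, or where one fruit is missing entirely, contains observations that cannot be paired.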

The expected result I am looking for is:

#  condition   group     type
#1   apple_1       A    small
#2   apple_1       A    small
#3   apple_2       A    small
#4   apple_3       A    small
#5    pear_1       A    small
#6    pear_1       A    small
#7    pear_2       A    small
#8    pear_3       A    small

That way, every apple with a given number has a matching pear with the same number.

Answer 1

You can do it like this:

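library(dplyr)
library(stringr)

# Returns a logical vector marking which elements of x to keep, so that both
# kinds of condition in a group end up with the same number of rows.
# Note: diff(b) assumes at most two distinct values of x per group.
vec_drop <- function(x){
  b <- table(x)                        # rows per condition in this group
  if(length(b)<2) return(FALSE)        # only one kind present: no pair, drop all
  a <- split(!logical(length(x)), x)   # start with TRUE for every row
  if (length(unique(b))>1)             # counts differ: drop the surplus rows
    a[[names(which.max(b))]][seq(abs(diff(b)))] <- FALSE
  unsplit(a, x)
}

# Group by `group` and by the number after the underscore, then filter.
# The backslash must be doubled inside an R string ("\\w", not "\w").
df %>%
  group_by(group, cond = str_remove(condition, "\\w+_"))%>%
  filter(vec_drop(condition))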


  condition group type  cond 
  <chr>     <chr> <chr> <chr>
1 apple_1   A     small 1    
2 apple_1   A     small 1    
3 apple_2   A     small 2    
4 apple_3   A     small 3    
5 pear_1    A     small 1    
6 pear_1    A     small 1    
7 pear_2    A     small 2    
8 pear_3    A     small 3    
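
As an alternative sketch, and not the method used above, the pairing can also be expressed by numbering the rows within each group/number/fruit combination and keeping only the positions where both fruits are present. It assumes exactly two fruits and names of the form fruit_number. The helper columns fruit, num and idx are made up for illustration.

```r
library(dplyr)
library(stringr)

# Example data, assuming row 3 of the question's table is "apple_1".
df <- data.frame(
  condition = c("apple_1", "apple_1", "apple_1", "apple_2", "apple_2",
                "apple_3", "pear_1", "pear_1", "pear_1", "pear_2", "pear_3"),
  group = c("B", rep("A", 10)),
  type = "small",
  stringsAsFactors = FALSE
)

paired <- df %>%
  mutate(
    fruit = str_remove(condition, "_\\d+$"),  # "apple" or "pear"
    num = str_extract(condition, "\\d+$")     # trailing number as a string
  ) %>%
  group_by(group, num, fruit) %>%
  mutate(idx = row_number()) %>%              # 1st, 2nd, ... row of this fruit
  group_by(group, num, idx) %>%
  filter(n_distinct(fruit) == 2) %>%          # keep positions that have both fruits
  ungroup() %>%
  select(condition, group, type)

paired
```

Unlike vec_drop, which removes the first surplus rows, this version removes the last surplus rows in each combination. With the example data the surplus rows are identical, so the two approaches should give the same eight rows.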
