根据唯一 ID 过滤非重复值

Filter for non-repeated values based on Unique ID

我有以下table

S/N Unique ID Code
1 111 YES
2 111 YES
3 111 NO
4 111 YES
5 222 YES
6 222 YES
7 222 YES
8 222 YES
9 333 NO
10 333 NO
11 333 YES
12 333 YES

如何根据以下条件得出以下 table: 对于每个唯一 ID,如果 YES 重复出现,则保留第一个 YES。如果出现NO,则保留下面的YES。我尝试使用 mutate,它给了我各种各样的错误。

S/N Unique ID Code
1 111 YES
4 111 YES
5 222 YES
11 333 YES

谢谢!

基础 R

ind <- ave(dat$Code == "YES", dat$`Unique ID`,
           FUN = function(z) z & c(TRUE, !z[-length(z)]))
dat[ind,]
#    S/N Unique ID Code
# 1    1       111  YES
# 4    4       111  YES
# 5    5       222  YES
# 11  11       333  YES

dplyr

library(dplyr)
dat %>%
  group_by(`Unique ID`) %>%
  filter(Code == "YES" & lag(Code == "NO", default = TRUE)) %>%
  ungroup()
# # A tibble: 4 x 3
#   `S/N` `Unique ID` Code 
#   <int>       <int> <chr>
# 1     1         111 YES  
# 2     4         111 YES  
# 3     5         222 YES  
# 4    11         333 YES  

data.table

library(data.table)
as.data.table(dat)[, .SD[Code == "YES" & shift(Code == "NO", fill = TRUE),], by = `Unique ID`]
#    Unique ID   S/N   Code
#        <int> <int> <char>
# 1:       111     1    YES
# 2:       111     4    YES
# 3:       222     5    YES
# 4:       333    11    YES

数据

dat <- structure(list("S/N" = 1:12, "Unique ID" = c(111L, 111L, 111L, 111L, 222L, 222L, 222L, 222L, 333L, 333L, 333L, 333L), Code = c("YES", "YES", "NO", "YES", "YES", "YES", "YES", "YES", "NO", "NO", "YES", "YES")), class = "data.frame", row.names = c(NA, -12L))