In R, conditionally remove duplicate rows within ID, Date, and Event

Background

I have d, a data frame:

d <- data.frame(ID = c("a","a","a","a", "b","b"),
                event = c("G12","G12","O99","O99","B4","B2"),
                date = as.Date(c("2011-01-01","2011-01-01","2011-12-23","2011-12-23","2011-01-01","2011-07-12")),
                stringsAsFactors=FALSE)

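As you can see, ID a has 4 rows, 2 of which are duplicates based on event and date (rows 2 and 4 are the duplicates).

Problem and desired output

I'd like to remove these duplicate rows by telling R to drop rows within an ID that share the same event and date. In other words, I want something like this: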

d <- data.frame(ID = c("a","a", "b","b"),
                event = c("G12","O99","B4","B2"),
                date = as.Date(c("2011-01-01","2011-12-23", "2011-01-01","2011-07-12")),
                stringsAsFactors=FALSE) 

What I've tried

I've tried this, but it doesn't get me there:

d2 <- subset(d, duplicated(d$ID, d$event))

Any thoughts?

One option is unique(), which drops rows that are identical across all columns:

unique(d)
#>   ID event       date
#> 1  a   G12 2011-01-01
#> 3  a   O99 2011-12-23
#> 5  b    B4 2011-01-01
#> 6  b    B2 2011-07-12
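
Note that unique(d) compares every column, so it only works here because ID, event, and date are the only columns. If the real data carries other columns that should not count toward a duplicate, the check can be limited to the key columns with duplicated(). The sketch below is base R; the note column is a hypothetical addition for illustration and is not part of the original data.

```r
# Data from the question, plus a hypothetical `note` column that
# differs between otherwise duplicated rows
d <- data.frame(ID = c("a","a","a","a","b","b"),
                event = c("G12","G12","O99","O99","B4","B2"),
                date = as.Date(c("2011-01-01","2011-01-01","2011-12-23",
                                 "2011-12-23","2011-01-01","2011-07-12")),
                note = c("x","y","x","y","x","y"),  # hypothetical extra column
                stringsAsFactors = FALSE)

# duplicated() on a data frame flags rows that repeat an earlier row;
# subsetting to the key columns limits the comparison to them
key <- c("ID", "event", "date")
d_dedup <- d[!duplicated(d[key]), ]
d_dedup
```

This also shows why the attempt in the question falls short: the second argument of duplicated() is incomparables, not a second column to compare, and without the ! negation subset() keeps the duplicated rows rather than dropping them.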

Using data.table:

library(data.table)

dt <- data.table(d)

unique(dt[, .(event, date), by = ID])
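
As a variant, unique() for a data.table also accepts a by argument, so the whole table can be kept while only the key columns are compared. This is a minimal sketch, assuming the data.table package is installed; dt_dedup is just an illustrative name.

```r
library(data.table)

# Data from the question
d <- data.frame(ID = c("a","a","a","a","b","b"),
                event = c("G12","G12","O99","O99","B4","B2"),
                date = as.Date(c("2011-01-01","2011-01-01","2011-12-23",
                                 "2011-12-23","2011-01-01","2011-07-12")),
                stringsAsFactors = FALSE)
dt <- as.data.table(d)

# Keep the first row of each ID/event/date combination;
# any other columns in dt would be retained as-is
dt_dedup <- unique(dt, by = c("ID", "event", "date"))
dt_dedup
```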

Created on 2021-11-23 by the reprex package (v2.0.1)