In R, conditionally remove duplicate rows within ID, Date, and Event
Background
I have d, a data frame:
d <- data.frame(ID = c("a","a","a","a", "b","b"),
event = c("G12","G12","O99","O99","B4","B2"),
date = as.Date(c("2011-01-01","2011-01-01","2011-12-23","2011-12-23","2011-01-01","2011-07-12")),
stringsAsFactors=FALSE)
As you can see, ID a has 4 rows, 2 of which are duplicates based on event and date (rows 2 and 4 are the duplicates).
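Problem and desired output
I'd like to remove these duplicates by asking R to drop rows that share the same event and date within an ID. In other words, I want something like this: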
d <- data.frame(ID = c("a","a", "b","b"),
event = c("G12","O99","B4","B2"),
date = as.Date(c("2011-01-01","2011-12-23", "2011-01-01","2011-07-12")),
stringsAsFactors=FALSE)
What I've tried
I've tried this, but it doesn't do the job:
d2 <- subset(d, duplicated(d$ID, d$event))
Any ideas?
One option is to use unique, which drops rows that are identical across all columns:
unique(d)
#> ID event date
#> 1 a G12 2011-01-01
#> 3 a O99 2011-12-23
#> 5 b B4 2011-01-01
#> 6 b B2 2011-07-12
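Note that unique(d) compares every column, so it only works here because d contains nothing besides ID, event, and date. If the data frame had further columns, a rough base R sketch would be to test for duplicates on the key columns only. The note column below is hypothetical and added purely for illustration:

```r
d <- data.frame(ID = c("a","a","a","a","b","b"),
                event = c("G12","G12","O99","O99","B4","B2"),
                date = as.Date(c("2011-01-01","2011-01-01","2011-12-23",
                                 "2011-12-23","2011-01-01","2011-07-12")),
                stringsAsFactors = FALSE)

# Hypothetical extra column (not in the original question): every row differs,
# so unique() on the whole data frame would no longer drop anything.
d_extra <- transform(d, note = paste0("n", 1:6))

# Flag rows whose (ID, event, date) combination has already appeared,
# then keep only the first occurrence of each combination.
key_cols <- c("ID", "event", "date")
d_dedup <- d_extra[!duplicated(d_extra[key_cols]), ]
```

This keeps the first row of each ID/event/date combination together with whatever values the other columns hold on that row.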
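Another option is data.table, taking the unique event/date pairs within each ID:
library(data.table)
dt <- data.table(d)
# select event and date per ID, then drop repeated rows
unique(dt[, .(event, date), by = ID])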
Created on 2021-11-23 by the reprex package (v2.0.1)
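As for why the attempt in the question falls short, two things appear to be going on. First, duplicated() takes a single object, and its second positional argument is incomparables, so d$event is not treated as a second key. Second, subset() keeps the rows where the condition is TRUE, so without a negation it retains the flagged rows instead of dropping them. A minimal corrected sketch, staying close to the original call:

```r
d <- data.frame(ID = c("a","a","a","a","b","b"),
                event = c("G12","G12","O99","O99","B4","B2"),
                date = as.Date(c("2011-01-01","2011-01-01","2011-12-23",
                                 "2011-12-23","2011-01-01","2011-07-12")),
                stringsAsFactors = FALSE)

# Pass all key columns as ONE data frame so rows are compared on the combination.
is_dup <- duplicated(d[, c("ID", "event", "date")])

# Negate the flag: subset() keeps rows where the condition is TRUE.
d2 <- subset(d, !is_dup)
```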