Select first row per run by group
I have data with a grouping variable (ID) and some values (type):
ID <- c("1", "1", "1", "1", "2", "2", "2", "2", "3", "3", "3", "3")
type <- c("1", "3", "3", "2", "3", "3", "1", "1", "1", "2", "2", "1")
dat <- data.frame(ID,type)
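Within each ID, I want to remove repeated values. I do not want to keep only the unique values; I want to drop any value that is the same as the one immediately before it. I have annotated some examples: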
#    ID type
# 1   1    1
# 2   1    3 # first value in a run of 3s within ID 1: keep
# 3   1    3 # 2nd value: remove
# 4   1    2
# 5   2    3
# 6   2    3
# 7   2    1
# 8   2    1
# 9   3    1
# 10  3    2 # first value in a run of 2s within ID 3: keep
# 11  3    2 # 2nd value: remove
# 12  3    1
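For example, ID 3 has the value sequence 1, 2, 2, 1. The third value is the same as the second, so it should be removed, giving 1, 2, 1.
So the desired output is: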
data.frame(ID = c("1", "1", "1", "2", "2", "3", "3", "3"),
           type = c("1", "3", "2", "3", "1", "1", "2", "1"))
  ID type
1  1    1
2  1    3
3  1    2
4  2    3
5  2    1
6  3    1
7  3    2
8  3    1
I tried
dat[!duplicated(dat), ]
but what I get is
ID <- c("1", "1", "1", "2", "2", "3", "3")
type<- c("1", "3", "2", "3", "1", "1", "2")
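I understand that duplicated keeps only the first occurrence of each unique row, so the final (3, 1) row is dropped because the same ID/type pair already appears earlier. How can I get the values I want?
Thanks in advance for your help!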
Using rleid and duplicated from data.table:
library(data.table)
setDT(dat)[!duplicated(rleid(ID, type))]
#   ID type
#1:  1    1
#2:  1    3
#3:  1    2
#4:  2    3
#5:  2    1
#6:  3    1
#7:  3    2
#8:  3    1
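To see why this works, it may help to look at the run ids themselves. The following is only a small sketch on the question's data; the names run_id and keep are introduced here for illustration and are not part of the answer above.

```r
library(data.table)

ID   <- c("1", "1", "1", "1", "2", "2", "2", "2", "3", "3", "3", "3")
type <- c("1", "3", "3", "2", "3", "3", "1", "1", "1", "2", "2", "1")

# rleid() assigns one integer per run: the id increases whenever
# ID or type differs from the previous row.
run_id <- rleid(ID, type)
run_id
# [1] 1 2 2 3 4 4 5 5 6 7 7 8

# The first row of each run is the first time its run id appears.
keep <- !duplicated(run_id)
which(keep)
# [1]  1  2  4  5  7  9 10 12
```

Because ID is passed to rleid along with type, a change of ID always starts a new run, so no explicit grouping by ID is needed.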
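Improved the answer to incorporate @Henrik's suggestion.
A base R approach, if you only want to eliminate consecutive duplicate rows (8 rows of output):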
ID <- c("1", "1", "1", "1", "2", "2", "2", "2", "3", "3", "3", "3")
type<- c("1", "3", "3", "2", "3", "3", "1", "1", "1", "2", "2", "1")
dat <- data.frame(ID,type)
subset(dat, !duplicated(with(rle(paste(dat$ID, dat$type)), rep(seq_len(length(lengths)), lengths))))
#>    ID type
#> 1   1    1
#> 2   1    3
#> 4   1    2
#> 5   2    3
#> 7   2    1
#> 9   3    1
#> 10  3    2
#> 12  3    1
Created on 2021-05-22 by the reprex package (v2.0.0)
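A possibly easier-to-read base R variant compares each row with the one directly before it instead of going through rle. This is a sketch under the assumption that neither column contains NA; the names n, keep and result are introduced here for illustration only.

```r
ID   <- c("1", "1", "1", "1", "2", "2", "2", "2", "3", "3", "3", "3")
type <- c("1", "3", "3", "2", "3", "3", "1", "1", "1", "2", "2", "1")
dat  <- data.frame(ID, type)

n <- nrow(dat)

# Always keep the first row; keep any later row whose ID or type
# differs from the row immediately before it.
keep <- c(TRUE,
          dat$ID[-1]   != dat$ID[-n] |
          dat$type[-1] != dat$type[-n])

result <- dat[keep, ]
rownames(result)
# [1] "1"  "2"  "4"  "5"  "7"  "9"  "10" "12"
```

Comparing the two columns separately also avoids pasting them into a single key, which could in principle make two different ID/type pairs look identical if the values themselves contain spaces.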
Does this work:
library(dplyr)
dat %>% group_by(ID) %>%
mutate(flag = case_when(type == lag(type) ~ TRUE, TRUE ~ FALSE)) %>%
filter(!flag) %>% select(-flag)
# A tibble: 8 x 2
# Groups:   ID [3]
  ID    type
  <chr> <chr>
1 1     1
2 1     3
3 1     2
4 2     3
5 2     1
6 3     1
7 3     2
8 3     1
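The same idea can probably be written without the temporary flag column. This is a sketch, assuming type contains no NA values; the name result is introduced here for illustration.

```r
library(dplyr)

dat <- data.frame(
  ID   = c("1", "1", "1", "1", "2", "2", "2", "2", "3", "3", "3", "3"),
  type = c("1", "3", "3", "2", "3", "3", "1", "1", "1", "2", "2", "1")
)

result <- dat %>%
  group_by(ID) %>%
  # Keep the first row of each ID, plus every row whose type differs
  # from the previous row within the same ID.
  filter(row_number() == 1 | type != lag(type)) %>%
  ungroup()

result$type
# [1] "1" "3" "2" "3" "1" "1" "2" "1"
```

The row_number() == 1 test is there because lag(type) is NA on the first row of each group, and filter drops rows whose condition evaluates to NA.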