Remove duplicates based on next row

Question

I'm new to R. I want to remove rows from a data frame where df$x == "String" AND the next row contains the same string.

So say I have this column:

1. String - remove
2. String
3. A
4. A
5. A
6. String - remove
7. String - remove
8. String
9. A
10. A

The result I want is:

2. String
3. A
4. A
5. A
8. String
9. A
10. A

Answer 1

We can use lead from dplyr and remove the rows where both the current row and the next row are "String".

library(dplyr)

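# Drop a row when it and the next row are both "String".
# default = "" gives the last row a non-NA "next" value, so a final
# "String" is kept instead of being dropped by filter() as NA.
df %>%
  filter(!(V1 == "String" & lead(V1, default = "") == "String"))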

#      V1
#1 String
#2      A
#3      A
#4 String
#5      A

With base R, we can do:

# "" stands in for the (nonexistent) row after the last one,
# so the index never contains NA
df[!(df$V1 == "String" & c(df$V1[-1], "") == "String"), , drop = FALSE]

#      V1
#2 String
#3      A
#4      A
#7 String
#8      A

Data

df <- structure(list(V1 = c("String", "String", "A", "A", "String", 
"String", "String", "A")), .Names = "V1", row.names = c(NA, -8L
 ), class = "data.frame")
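To check the same rule against the question's own 10-row column, the base R logic can be wrapped in a small function. This is a minimal sketch: the function name drop_target_runs and its target argument are hypothetical, not part of either answer. It uses %in% so that NA values compare as FALSE instead of putting NA into the row index.

```r
# Hypothetical helper (not from the answers above): drop each row whose
# value in `col` equals `target` when the next row has the same value,
# so only the last row of each run of `target` is kept.
drop_target_runs <- function(df, col, target = "String") {
  # %in% returns FALSE (not NA) for missing values
  is_target <- df[[col]] %in% target
  # Shift left by one; the last row has no next row, so use FALSE
  next_is_target <- c(is_target[-1], FALSE)
  df[!(is_target & next_is_target), , drop = FALSE]
}

# The column from the question, rows 1 to 10
q <- data.frame(
  V1 = c("String", "String", "A", "A", "A",
         "String", "String", "String", "A", "A"),
  stringsAsFactors = FALSE
)

res <- drop_target_runs(q, "V1")
print(res)
```

Rows 1, 6 and 7 are removed and rows 2, 3, 4, 5, 8, 9 and 10 remain, which is the result asked for in the question.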

Answer 2

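We can use duplicated and rleid to create a logical index for subsetting the rows. rleid gives each run of identical consecutive values its own id, so duplicated() is TRUE for every row of a run except the first; combined with V1 == 'String', this drops all but the first row of each run of "String". It keeps the first row of each run rather than the last, which yields the same V1 values.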

library(data.table)
setDT(df)[!(duplicated(rleid(V1)) & V1 == 'String')]
#       V1
#1: String
#2:      A
#3:      A
#4: String
#5:      A

Data

df <- structure(list(V1 = c("String", "String", "A", "A", "String", 
"String", "String", "A")), row.names = c(NA, -8L), class = "data.frame")
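If data.table is not installed, the run ids that rleid() produces can be rebuilt in base R by starting a new id wherever a value differs from the one before it. A sketch assuming V1 is a character column without NA values; run_id is an illustrative name, not part of the answer above.

```r
df <- data.frame(
  V1 = c("String", "String", "A", "A", "String",
         "String", "String", "A"),
  stringsAsFactors = FALSE
)

x <- df$V1
# TRUE marks the first row of each run of identical consecutive values;
# cumsum() turns those markers into run ids (1 1 2 2 3 3 3 4 here)
run_id <- cumsum(c(TRUE, x[-1] != x[-length(x)]))

# Same index as the data.table version: repeated run id AND value "String"
res <- df[!(duplicated(run_id) & x == "String"), , drop = FALSE]
print(res)
```

Rows 1, 3, 4, 5 and 8 are kept, the same rows as the data.table version.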
