Replace duplicate characters in strings

I am trying to remove duplicate characters from the strings in a character vector.

dput(test)
c("APAAAAAAAAAAAPAAPPAPAPAAAAAAAAAAAAAAAAAAAAAAAAPPAPAAAAAAPPAPAAAPAPAAAAP", 
"AAA", "P", "P", "A", "P", "P", "APPPPPA", "A", "P", "AA", "PP", 
"PPA", "P", "P", "A", "P", "APAP", "P", "PA")

I created a function to sort the characters within each string:

strSort <- function(x)
  sapply(lapply(strsplit(x, NULL), sort), paste, collapse="")

Then I used gsub to remove consecutive repeated characters:

gsub("(.)\\1{2,}", "\\1", strSort(test))

This outputs:

gsub("(.)\\1{2,}", "\\1", strSort(test))
 [1] "AP"   "A"    "P"    "P"    "A"    "P"    "P"    "AAP"  "A"    "P"    "AA"   "PP"   "APP"  "P"    "P"    "A"    "P"    "AAPP" "P"    "AP"

The output should contain only a single A and/or a single P per string.

On the strsplit output, we need to apply unique to the sorted elements:

sapply(strsplit(test, ""), function(x) 
       paste(unique(sort(x)), collapse=""))
#[1] "AP" "A"  "P"  "P"  "A"  "P"  "P"  "AP" "A"  "P"  "A"  "P"  "AP" "P"  "P"  "A"  "P"  "AP" "P"  "AP"
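
As a side note, the gsub attempt in the question leaves pairs such as "AA" untouched because `\\1{2,}` requires the captured character to be followed by at least two more copies, so only runs of three or more are collapsed. A minimal sketch of that fix, assuming the strSort helper from the question and a small sample `x` chosen here for illustration:

```r
# Helper from the question: sort the characters within each string
strSort <- function(x)
  sapply(lapply(strsplit(x, NULL), sort), paste, collapse = "")

# Illustrative sample taken from the test vector
x <- c("APAP", "APPPPPA", "AA", "P")

# "\\1+" matches one or more repeats of the captured character,
# so runs of length two are collapsed as well
res <- gsub("(.)\\1+", "\\1", strSort(x))
res
# [1] "AP" "AP" "A"  "P"
```

This only works because the characters are sorted first, which places all copies of a character next to each other.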

Here is another option using utf8ToInt + intToUtf8:

> sapply(test, function(x) intToUtf8(sort(unique(utf8ToInt(x)))), USE.NAMES = FALSE)
 [1] "AP" "A"  "P"  "P"  "A"  "P"  "P"  "AP" "A"  "P"  "A"  "P"  "AP" "P"  "P" 
[16] "A"  "P"  "AP" "P"  "AP"
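
To see why this works, it may help to step through a single string. The sketch below uses "APPPPPA" from the test vector; the intermediate variable names are only for illustration. Note that utf8ToInt accepts a single string, which is why the answer wraps it in sapply.

```r
s <- "APPPPPA"

codes <- utf8ToInt(s)         # integer code points: 65 80 80 80 80 80 65
uniq  <- sort(unique(codes))  # duplicates dropped, then sorted: 65 80
out   <- intToUtf8(uniq)      # code points pasted back into one string
out
# [1] "AP"
```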

Using a regular expression you could do:

gsub('(?:(.)(?=(.*)\\1))', '', test, perl = TRUE)

#[1] "AP" "A"  "P"  "P"  "A"  "P"  "P"  "PA" "A"  "P"  "A"  "P"  "PA"
#[14] "P"  "P"  "A"  "P"  "AP" "P"  "PA"
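
The lookahead removes every character that occurs again later in the string, so the last occurrence of each character is the one that survives. That is why some results read "PA" rather than "AP". If first-occurrence order is preferred, one possible alternative is unique without sort. The sketch below compares the two on a small sample `x` picked here for illustration:

```r
x <- c("APPPPPA", "PPA", "APAP")

# Lookahead version: drops each character that occurs again later,
# so the LAST occurrence of each character is kept
last_kept <- gsub("(.)(?=.*\\1)", "", x, perl = TRUE)
last_kept
# [1] "PA" "PA" "AP"

# unique() without sort(): keeps the FIRST occurrence of each character
first_kept <- sapply(strsplit(x, ""),
                     function(ch) paste(unique(ch), collapse = ""))
first_kept
# [1] "AP" "PA" "AP"
```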

The regular expression is taken from