Rename strings with identical names

Context

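When importing columns with identical names from spreadsheet software, readxl renames the duplicates using the following syntax: "Col1","Col1" becomes "Col1","Col1...2". I would like them to become "Col1","Col1A" instead.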

Here is a reproducible example:

Example

# Original strings:
library(stringr)
string <- c("G01","G01...2","G02","G03","G04","G04...6","G05","G05...8")
# Desired result:
result <- c("G01","G01A","G02","G03","G04","G04A","G05","G05A")

# This line successfully detects the wrongful entries:
str_detect(string, pattern = "[:alpha:][:digit:][:digit:]...[:digit:]")

# This line fails to address the issue correctly:
str_replace(string, "[:alpha:][:digit:][:digit:]...[:digit:]", "[:alpha:][:digit:][:digit:]A")
# Output:
# [1] "G01"                          "[:alpha:][:digit:][:digit:]A" "G02"
# [4] "G03"                          "G04"                          "[:alpha:][:digit:][:digit:]A"
# [7] "G05"                          "[:alpha:][:digit:][:digit:]A"
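In the str_replace() call above, the replacement argument is inserted as plain text (apart from backreferences such as \\1), which is why "[:alpha:][:digit:][:digit:]A" shows up verbatim in the output. A minimal sketch of the usual fix is to capture the part to keep in parentheses and refer back to it. The pattern assumes every name is one letter followed by two digits, as in the example; the variable name `fixed` is illustrative.

```r
library(stringr)

string <- c("G01","G01...2","G02","G03","G04","G04...6","G05","G05...8")

# Capture "letter + two digits", match the trailing "...<n>" suffix,
# and replace the whole name with the captured text followed by a literal "A".
# Names without the suffix do not match and are returned unchanged.
fixed <- str_replace(string,
                     "^([[:alpha:]][[:digit:]]{2})\\.{3}[[:digit:]]+$",
                     "\\1A")
fixed
# [1] "G01"  "G01A" "G02"  "G03"  "G04"  "G04A" "G05"  "G05A"
```

Note that every duplicate gets the same "A", so a third copy such as "G01...3" would also become "G01A" and collide; the answers below number the repeats instead.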

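We can use str_remove to delete the substring that starts with one or more "." followed by any other characters, and then use make.unique to change the duplicates by appending ".1", ".2", and so on.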

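library(stringr)
# Drop everything from the first "." onwards, then let make.unique() number the repeats
make.unique(str_remove(string, "\\.+.*"))
# [1] "G01"   "G01.1" "G02"   "G03"   "G04"   "G04.1" "G05"   "G05.1"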

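If we need to append LETTERS instead, the catch is that there are only 26 letters, so at most 26 duplicates of a name can be labelled.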
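One possible way to get letters while keeping the make.unique() approach (a sketch, not necessarily what the answer intended) is to pull out the numeric counter that make.unique() appends and use it to index LETTERS. The variable names `base`, `uniq`, `k` and `renamed` are illustrative.

```r
library(stringr)

string <- c("G01","G01...2","G02","G03","G04","G04...6","G05","G05...8")

base <- str_remove(string, "\\.+.*")   # "G01...2" -> "G01"
uniq <- make.unique(base)              # "G01", "G01.1", "G02", ...

# The counter make.unique() appended (".1", ".2", ...), or NA when there is none
k <- as.integer(str_match(uniq, "\\.([0-9]+)$")[, 2])

# LETTERS only has 26 entries, so refuse rather than produce NA suffixes
if (any(k > 26, na.rm = TRUE)) stop("more than 26 duplicates of a single name")

renamed <- ifelse(is.na(k), base, paste0(base, LETTERS[k]))
renamed
# [1] "G01"  "G01A" "G02"  "G03"  "G04"  "G04A" "G05"  "G05A"
```

This relies on the base names containing no "." of their own, which holds here because str_remove() has already cut everything from the first dot.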

Assuming there are no more than 26 duplicates, you could do:

# Keep the part before "...", then label repeats within each name with "", "A", "B", ...
nm <- sapply(strsplit(string, "\\.{3}"), function(x) x[1])
paste0(nm, ave(nm, nm, FUN = function(x) c("", LETTERS)[seq_along(x)]))
# [1] "G01"  "G01A" "G02"  "G03"  "G04"  "G04A" "G05"  "G05A"
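With more than 26 duplicates, c("", LETTERS)[seq_along(x)] indexes past its 27 entries and yields NA, so paste0() silently produces names such as "G01NA". A small wrapper can fail loudly instead. `dedupe_with_letters()` is a hypothetical helper, not part of any package; it swaps strsplit() for sub() and counts positions with ave() on integer indices, but labels repeats the same way as the answer above.

```r
# Hypothetical helper: relabel repeated names as "G01", "G01A", "G01B", ...
# and stop once a single name is repeated more than 26 times.
dedupe_with_letters <- function(x) {
  # Strip a readxl-style "...<n>" suffix; names without one are left unchanged
  nm <- sub("\\.{3}[0-9]+$", "", x)
  # Running position of each element within its group of identical names
  idx <- ave(seq_along(nm), nm, FUN = seq_along)
  # c("", LETTERS) has 27 entries, so position 28 would index NA
  if (any(idx > 27)) stop("a name is repeated more than 26 times")
  paste0(nm, c("", LETTERS)[idx])
}

string <- c("G01","G01...2","G02","G03","G04","G04...6","G05","G05...8")
dedupe_with_letters(string)
# [1] "G01"  "G01A" "G02"  "G03"  "G04"  "G04A" "G05"  "G05A"
```

For an imported data frame, names(df) <- dedupe_with_letters(names(df)) applies it afterwards. readxl::read_excel() also has a .name_repair argument that accepts a function, so passing the helper there might rename the columns at import time; that route is untested here, and the names it receives may be the raw duplicates ("G01", "G01") rather than the "...2" form, which the helper handles as well.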