Rename strings with identical names
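Context
When importing columns with identical names from spreadsheet software, readxl renames the duplicates using the following syntax: "Col1","Col1" becomes "Col1","Col1...2". I would like to convert these to "Col1","Col1A" instead.
Here is a reproducible example:
Example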
# Original string :
library(stringr)
string <- c("G01","G01...2","G02","G03","G04","G04...6","G05","G05...8")
# Desired result
result <- c("G01","G01A","G02","G03","G04","G04A","G05","G05A")
# this line successfully detects the wrongful entries :
str_detect(string,pattern = "[:alpha:][:digit:][:digit:]...[:digit:]")
# this line fails to address the issue correctly :
str_replace(string,"[:alpha:][:digit:][:digit:]...[:digit:]", "[:alpha:][:digit:][:digit:]A")
#output :
[1] "G01" "[:alpha:][:digit:][:digit:]A" "G02"
[4] "G03" "G04" "[:alpha:][:digit:][:digit:]A"
[7] "G05" "[:alpha:][:digit:][:digit:]A"
We can use str_remove to delete the substring that starts with one or more . followed by any other characters, then use make.unique to rename the duplicates by appending .1, .2, etc.
library(stringr)
# the backslash must be doubled inside an R string literal
make.unique(str_remove(string, "\\.+.*"))
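For readers who prefer to avoid the stringr dependency, the same idea can be sketched in base R. This is only an illustration of the approach above, using the sample vector from the question; the variable names base_names and deduped are made up for the example.

```r
# Sample vector from the question
string <- c("G01", "G01...2", "G02", "G03", "G04", "G04...6", "G05", "G05...8")

# Remove everything from the first run of dots onward (the "...<n>" suffix)
base_names <- sub("\\.+.*", "", string)

# make.unique() appends .1, .2, ... to repeated names
deduped <- make.unique(base_names)
print(deduped)
# [1] "G01"   "G01.1" "G02"   "G03"   "G04"   "G04.1" "G05"   "G05.1"
```

Note that this yields numeric suffixes (.1, .2), not the letter suffixes requested in the question.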
If we need to append LETTERS instead, the catch is that only 26 duplicates can be labelled.
Assuming there are no more than 26 duplicates, you can do this:
nm = sapply(strsplit(string, "\\.{3}"), function(x) x[1])
paste0(nm, ave(nm, nm, FUN = function(x) c("", LETTERS)[seq_along(x)]))
# [1] "G01" "G01A" "G02" "G03" "G04" "G04A" "G05" "G05A"
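The 26-duplicate assumption can be made explicit by wrapping the logic in a small function that fails loudly when a name repeats too often. The following is a minimal sketch, not part of readxl or stringr; the function name letter_suffix is hypothetical, and it assumes the suffix added on import always has the form "..." followed by digits at the end of the name.

```r
# Hypothetical helper: replace readxl-style "...<n>" suffixes with A, B, C, ...
letter_suffix <- function(x) {
  # Drop a trailing "...<n>" suffix, if present
  nm <- sub("\\.{3}[0-9]+$", "", x)
  # Position of each element within its group of identical names
  idx <- ave(seq_along(nm), nm, FUN = seq_along)
  # c("", LETTERS) has 27 entries: the first occurrence plus 26 duplicates
  if (any(idx > 27)) stop("more than 26 duplicates of a single name")
  paste0(nm, c("", LETTERS)[idx])
}

letter_suffix(c("G01", "G01...2", "G01...3", "G02"))
# [1] "G01"  "G01A" "G01B" "G02"
```

Because the suffix is chosen by position within each group, a name that appears three times gets A and B in turn instead of two identical labels.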