How to make string replacements case-insensitive
I'm working on speech transcriptions that contain phonologically reduced forms:
reduced <- c("in n it", "du n no", "dun n it", "wan na", "gon na", "got ta")
I need to replace these forms with contiguous strings made of the same letters but without the whitespace:
reduced_replacements <- setNames(c("innit", "dunno", "dunnit", "wanna", "gonna", "gotta"), # new forms
c("in n it", "du n no", "dun n it", "wan na", "gon na", "got ta")) # old forms
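The problem is that the case of the reduced forms may vary, so the replacement needs to be case-insensitive. I tried to make the regex pattern case-insensitive by including (?i):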
# pattern:
reduced_pattern <- paste0("(?i)\\b(", paste0(reduced, collapse = "|"), ")\\b")
But obviously this does not do the trick:
# test:
tst <- c("Wan na go ? well du n no. come on", "i do n't know really",
"will be great in n it, ", "it matters Dun n it",
"Looks awesome. Dun n it?", "Gon na be terrific!")
library(stringr)
ifelse(grepl(reduced_pattern, tst, perl = TRUE),
       str_replace_all(tst[grepl(reduced_pattern, tst)], reduced_replacements),
       tst)
[1] "Wan na go ? well dunno. come on" "i do n't know really" "it matters Dun n it"
[4] "Looks awesome. Dun n it?" "Gon na be terrific!" "Wan na go ? well dunno. come on"
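None of the upper-case reduced forms gets replaced. How can this be done efficiently, that is, other than by enumerating the upper-case forms in reduced and reduced_replacements and converting everything to lower case with tolower?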
The correct result would be:
[1] "Wanna go ? well dunno. come on" "i do n't know really" "it matters Dunnit"
[4] "Looks awesome. Dunnit?" "Gonna be terrific!" "Wanna go ? well dunno. come on"
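You can make tst and reduced_replacements agree in case with tolower, or use ignore_case = TRUE in regex. The code below takes the second route; note that the inserted forms are the lower-case replacements, so an initial capital such as the one in "Dun n it" is not preserved.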
library(stringr)
str_replace_all(tst, regex(reduced_replacements, ignore_case = TRUE))
#[1] "wanna go ? well dunno. come on" "i do n't know really" "will be great innit, "
#[4] "it matters dunnit" "Looks awesome. dunnit?" "gonna be terrific!"
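As a rough sketch of why the original attempt left the capitalised forms untouched: when str_replace_all receives a named vector, the names serve as the patterns, and those names carried no case-insensitivity flag. The (?i) flag sat only in reduced_pattern, which was used for grepl. Assuming that behaviour, one alternative to regex(..., ignore_case = TRUE) is to move the inline flag into the names themselves. The name ci_replacements is hypothetical and introduced only for this illustration.

```r
library(stringr)

tst <- c("Wan na go ? well du n no. come on", "i do n't know really",
         "will be great in n it, ", "it matters Dun n it",
         "Looks awesome. Dun n it?", "Gon na be terrific!")

reduced_replacements <- c("in n it" = "innit", "du n no" = "dunno",
                          "dun n it" = "dunnit", "wan na" = "wanna",
                          "gon na" = "gonna", "got ta" = "gotta")

# Names are the patterns; without a flag they match case-sensitively,
# so "Wan na", "Dun n it" and "Gon na" are left alone.
result_cs <- str_replace_all(tst, reduced_replacements)

# Hypothetical helper vector: same replacements, but every pattern now
# carries the inline (?i) flag and word boundaries.
ci_replacements <- setNames(
  unname(reduced_replacements),
  paste0("(?i)\\b", names(reduced_replacements), "\\b")
)

# Case-insensitive matching; the inserted forms are the lower-case
# replacements, e.g. "Wan na" becomes "wanna".
result <- str_replace_all(tst, ci_replacements)
result
```

As with regex(..., ignore_case = TRUE), this variant does not keep the capitalisation of the matched text.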
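You can use stringr::str_replace_all with a function as the replacement argument, in which you simply remove all the whitespace from each match. Because only the whitespace is removed, the original capitalisation is kept.
See the R demo: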
library(stringr)
tst <- c("Wan na go ? well du n no. come on", "i do n't know really",
"will be great in n it, ", "it matters Dun n it",
"Looks awesome. Dun n it?", "Gon na be terrific!")
reduced <- c("in n it", "du n no", "dun n it", "wan na", "gon na", "got ta")
reduced_pattern <- paste0("(?i)\\b(?:", paste0(reduced, collapse = "|"), ")\\b")
str_replace_all(tst, reduced_pattern, function(x) str_replace_all(x, "\\s+", ""))
## => [1] "Wanna go ? well dunno. come on" "i do n't know really"
## [3] "will be great innit, " "it matters Dunnit"
## [5] "Looks awesome. Dunnit?" "Gonna be terrific!"
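The same idea can be sketched without stringr, assuming base R's gregexpr and the regmatches<- replacement function: locate every case-insensitive match, strip the whitespace inside each match, and write the pieces back. This is an illustrative variant, not part of the answer above; the name out is introduced only for this example.

```r
tst <- c("Wan na go ? well du n no. come on", "i do n't know really",
         "will be great in n it, ", "it matters Dun n it",
         "Looks awesome. Dun n it?", "Gon na be terrific!")
reduced <- c("in n it", "du n no", "dun n it", "wan na", "gon na", "got ta")

# Same pattern as above: inline (?i) flag, word boundaries, non-capturing group.
reduced_pattern <- paste0("(?i)\\b(?:", paste0(reduced, collapse = "|"), ")\\b")

out <- tst
# Positions of all matches in every string (PCRE, so (?i) and \\b are supported).
m <- gregexpr(reduced_pattern, out, perl = TRUE)

# Remove the whitespace inside each matched form and write it back in place;
# strings without a match yield character(0) and stay unchanged.
regmatches(out, m) <- lapply(regmatches(out, m),
                             function(x) gsub("\\s+", "", x))
out
```

Since the matched text itself is edited rather than swapped for a fixed replacement, "Dun n it" comes out as "Dunnit" and "du n no" as "dunno".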