How to make string replacements case-insensitive

I am working with phonetic transcriptions of phonologically reduced forms:

reduced <- c("in n it", "du n no", "dun n it", "wan na", "gon na", "got ta")

I need to replace these forms with contiguous strings made of the same letters but without spaces:

reduced_replacements <- setNames(c("innit", "dunno", "dunnit", "wanna", "gonna", "gotta"),            # new forms
                                 c("in n it", "du n no", "dun n it", "wan na", "gon na", "got ta"))   # old forms
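Because every new form is just the old form with its spaces removed, the named vector can also be derived from reduced instead of typing both lists by hand. A minimal sketch, assuming no form ever needs anything beyond space removal:

```r
reduced <- c("in n it", "du n no", "dun n it", "wan na", "gon na", "got ta")

# Values (new forms) are the old forms with every space removed;
# names (old forms) are the original strings, as str_replace_all() expects
reduced_replacements <- setNames(gsub(" ", "", reduced, fixed = TRUE), reduced)
reduced_replacements
```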

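The problem is that the case of the reduced forms can vary; that is, the replacement needs to be case-insensitive. I tried to make the regex pattern case-insensitive by including (?i):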
# pattern:
reduced_pattern <- paste0("(?i)\\b(", paste0(reduced, collapse = "|"), ")\\b")

But apparently this does not solve the problem:

# test:
 tst <- c("Wan na go ? well du n no. come on", "i do n't know really",
          "will be great in n it, ", "it matters Dun n it",
          "Looks awesome. Dun n it?", "Gon na be terrific!")
 library(stringr)
 ifelse(grepl(reduced_pattern, tst, perl = T),
        str_replace_all(tst[grepl(reduced_pattern, tst)], reduced_replacements),
        tst)
[1] "Wan na go ? well dunno. come on" "i do n't know really"            "it matters Dun n it"            
[4] "Looks awesome. Dun n it?"        "Gon na be terrific!"             "Wan na go ? well dunno. come on"

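None of the capitalized reduced forms is replaced. How can this be done efficiently, that is, without enumerating the capitalized forms in reduced and reduced_replacements and without converting everything to lower case with tolower?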

The correct result should be:

[1] "Wanna go ? well dunno. come on" "i do n't know really"           "will be great innit, "
[4] "it matters Dunnit"              "Looks awesome. Dunnit?"         "Gonna be terrific!"
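Separately from the case issue, the ifelse() call in the test above scrambles the order of the output: its yes argument is computed only on the matching subset of tst, so it is shorter than tst and gets recycled. That is why element 6 repeats element 1 and "will be great in n it, " disappears. Since str_replace_all() returns strings without a match unchanged, the filtering is not needed at all. A minimal sketch, still case-sensitive, that only shows the order being preserved:

```r
library(stringr)

tst <- c("Wan na go ? well du n no. come on", "i do n't know really",
         "will be great in n it, ", "it matters Dun n it",
         "Looks awesome. Dun n it?", "Gon na be terrific!")
reduced <- c("in n it", "du n no", "dun n it", "wan na", "gon na", "got ta")
reduced_replacements <- setNames(gsub(" ", "", reduced, fixed = TRUE), reduced)

# Apply the replacements to the whole vector: non-matching strings pass through,
# so the result keeps the length and order of tst (capitalized forms still untouched)
out <- str_replace_all(tst, reduced_replacements)
out
```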

您可以通过使用 tolower 使 tstreduced_replacements 相同,并在 regex 中使用 ignore_case = TRUE

library(stringr)
str_replace_all(tst, regex(reduced_replacements, ignore_case = TRUE))

#[1] "wanna go ? well dunno. come on" "i do n't know really"           "will be great innit, "         
#[4] "it matters dunnit"              "Looks awesome. dunnit?"         "gonna be terrific!"    
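A quick comparison with the plain tolower route, assuming lower-case output is acceptable: regex() keeps the names of reduced_replacements, so str_replace_all() still treats it as a pattern = replacement vector and only the matched parts are lower-cased, whereas calling tolower() on tst first lower-cases the entire string (note "looks" below).

```r
library(stringr)

tst <- c("Wan na go ? well du n no. come on", "i do n't know really",
         "will be great in n it, ", "it matters Dun n it",
         "Looks awesome. Dun n it?", "Gon na be terrific!")
reduced_replacements <- setNames(c("innit", "dunno", "dunnit", "wanna", "gonna", "gotta"),
                                 c("in n it", "du n no", "dun n it", "wan na", "gon na", "got ta"))

# Case-insensitive matching: only the matched forms change case
via_regex <- str_replace_all(tst, regex(reduced_replacements, ignore_case = TRUE))

# Lower-casing the input first: the whole string ends up in lower case
via_tolower <- str_replace_all(tolower(tst), reduced_replacements)
```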

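You can use stringr::str_replace_all with a function as the replacement argument; the function simply removes all whitespace from each match, so the original capitalization is preserved.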

See the R demo:

library(stringr)
tst <- c("Wan na go ? well du n no. come on", "i do n't know really",
          "will be great in n it, ", "it matters Dun n it",
          "Looks awesome. Dun n it?", "Gon na be terrific!")
reduced <- c("in n it", "du n no", "dun n it", "wan na", "gon na", "got ta")
reduced_pattern <- paste0("(?i)\\b(?:", paste0(reduced, collapse = "|"), ")\\b")
str_replace_all(tst, reduced_pattern, function(x) str_replace_all(x, "\\s+", ""))
## => [1] "Wanna go ? well dunno. come on" "i do n't know really"          
##    [3] "will be great innit, "          "it matters Dunnit"             
##    [5] "Looks awesome. Dunnit?"         "Gonna be terrific!"
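For comparison, a minimal base R sketch of the same idea, assuming stringr is not available: gregexpr() with perl = TRUE finds every match (PCRE understands the inline (?i) flag), and the regmatches<- replacement form rewrites each matched substring with its whitespace removed, so the capitalization of the input survives.

```r
tst <- c("Wan na go ? well du n no. come on", "i do n't know really",
         "will be great in n it, ", "it matters Dun n it",
         "Looks awesome. Dun n it?", "Gon na be terrific!")
reduced <- c("in n it", "du n no", "dun n it", "wan na", "gon na", "got ta")
reduced_pattern <- paste0("(?i)\\b(?:", paste0(reduced, collapse = "|"), ")\\b")

# Positions of all case-insensitive matches in every string
m <- gregexpr(reduced_pattern, tst, perl = TRUE)

# Replace each matched substring with itself minus whitespace;
# strings without a match get character(0) and stay unchanged
regmatches(tst, m) <- lapply(regmatches(tst, m), function(x) gsub("\\s+", "", x))
tst
```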