String replacements: how to deal with similar strings and spaces

Question

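Context: translating a table from French to English using a lookup table that contains the corresponding replacements.

Problem: the strings are sometimes similar, and when whitespace is involved str_replace_all() does not match against the whole string.

Reproducible example: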

library(stringr)  #needed for the str_replace_all() function

#datasets

# test is the table indicating corresponding strings
test = data.frame(fr = as.character(c("Autre", "Autres", "Autre encore")),
                  en = as.character(c("Other", "Others", "Other again")),
                  stringsAsFactors = FALSE)
# test1 is the table I want to translate
test1 = data.frame(totrans = as.character(c("Autre", "Autres", "Autre encore")),
                   stringsAsFactors = FALSE)

# here is a function to translate
test2 = str_replace_all(test1$totrans, setNames(test$en, test$fr))

Output:

I get:

> test2
[1] "Other"        "Others"       "Other encore"

Expected result:

> testexpected
[1] "Other"       "Others"      "Other again"

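As you can see, when two strings start the same way but contain no whitespace, the replacement succeeds (see "Autre" and "Autres"), but when there is whitespace it fails ("Autre encore" is replaced by "Other encore" instead of "Other again").

I feel the answer is obvious, but I just can't find a way to solve it... Any suggestion is welcome.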
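The output above can be reproduced step by step. The following is a minimal sketch, assuming that str_replace_all() applies the entries of a named vector one after another, in the order given. The names step1 and step2 are illustrative and not part of the original post.

```r
library(stringr)

# Step 1: apply only the first lookup entry ("Autre" -> "Other").
# It matches the leading "Autre" inside every input string.
step1 <- str_replace_all(c("Autre", "Autres", "Autre encore"), "Autre", "Other")

# Step 2: apply the remaining entries to the already modified strings.
# Neither "Autres" nor "Autre encore" is present any more, so nothing changes.
step2 <- str_replace_all(step1, c("Autres" = "Others", "Autre encore" = "Other again"))

print(step1)
print(step2)
```

Under that assumption, "Autres" only appears to be translated correctly: it becomes "Other" plus the leftover "s", while "Autre encore" keeps its untranslated second word.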

Answer 1

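I think you just need word boundaries (i.e. "\\b") around your lookups. It's simple to add them with a paste0 call inside str_replace_all.

Note that you don't need to load the whole tidyverse for this; the str_replace_all function is part of the stringr package, which is just one of several packages loaded when you call library(tidyverse):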

library(stringr) 

test = data.frame(fr = as.character(c("Autre", "Autres", "Autre encore")),
                  en = as.character(c("Other", "Others", "Other again")),
                  stringsAsFactors = FALSE)

test1 = data.frame(totrans = as.character(c("Autre", "Autres", "Autre encore")),
                   stringsAsFactors = FALSE)

str_replace_all(test1$totrans, paste0("\\b", test$fr, "\\b"), test$en)
#> [1] "Other"       "Others"      "Other again"

Created on 2020-05-14 by the reprex package (v0.3.0)
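One thing to keep in mind: when pattern and replacement are passed as separate vectors, str_replace_all() pairs the i-th pattern with the i-th string, so the call above relies on test1$totrans being in the same order as test$fr. For a table whose rows come in arbitrary order, a possible variant is sketched below. It assumes each cell should be translated as a whole, and that the French strings contain no regular-expression metacharacters. The names totrans, lookup and translated are illustrative.

```r
library(stringr)

test <- data.frame(fr = c("Autre", "Autres", "Autre encore"),
                   en = c("Other", "Others", "Other again"),
                   stringsAsFactors = FALSE)

# Input in a different order than the lookup table, with a repeated value.
totrans <- c("Autre encore", "Autre", "Autres", "Autre")

# Anchor every French string with ^ and $ so that it can only match a whole
# cell, then pass the lookup as a named vector (pattern = replacement).
lookup <- setNames(test$en, paste0("^", test$fr, "$"))
translated <- str_replace_all(totrans, lookup)

print(translated)
```

Because of the anchors, the short pattern for "Autre" can no longer match inside "Autres" or "Autre encore", so the order of the lookup entries should not matter here.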
