Stringr str_replace_all misses repeated terms

I'm having a problem with the stringr::str_replace_all function. I'm trying to replace every instance of iv with insuredvehicle, but the function only seems to catch the first one.

library(data.table)

temp_data <- data.table(text = 'the driver of the 1st vehicle hit the iv iv at a stop')
temp_data[, new_text := stringr::str_replace_all(pattern = ' iv ', replacement = ' insuredvehicle ', string = text)]

The result looks like this, missing the second iv:

1: the driver of the 1st vehicle hit the insuredvehicle iv at a stop

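I think the problem is that the two instances share a space, and that space is part of the search pattern. I wrote the pattern that way because I want to replace the standalone term iv, not the iv inside driver.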

I don't want to simply collapse the duplicates into one. I want the result to look like this:

1: the driver of the 1st vehicle hit the insuredvehicle insuredvehicle at a stop

Any help getting this to work would be greatly appreciated!

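Perhaps you could include word boundaries in the regex instead of matching the surrounding whitespace? They are ideal when you want a complete word that matches the pattern rather than part of a word, and they sidestep these whitespace problems. \b (written '\\b' inside an R string) seems to do the trick: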

temp_data[, new_text := stringr::str_replace_all(pattern = '\\biv\\b', replacement = 'insuredvehicle', string = text)]

new_text

1: the driver of the 1st vehicle hit the insuredvehicle insuredvehicle at a stop
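
To see why the original pattern fell short, here is a minimal sketch that counts matches on the same sentence. It assumes stringr is installed and uses a plain character vector (text, a name chosen here for illustration) rather than the data.table from the question.

```r
library(stringr)

# Same sentence as in the question, held in a plain character vector.
text <- 'the driver of the 1st vehicle hit the iv iv at a stop'

# ' iv ' consumes the space on each side, so after the first match the
# second 'iv' has no leading space left to match and is skipped.
n_space <- str_count(text, ' iv ')

# \b is zero-width: it asserts a word boundary without consuming characters,
# so both standalone 'iv' tokens match and the 'iv' inside 'driver' is ignored.
n_word <- str_count(text, '\\biv\\b')

fixed <- str_replace_all(text, '\\biv\\b', 'insuredvehicle')

print(c(space_pattern = n_space, word_boundary = n_word))
print(fixed)
```

Because the boundary itself matches no characters, the replacement string no longer needs the padding spaces either.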

You can use lookarounds:

temp_data[, new_text := stringr::str_replace_all(pattern = '(?<= )iv(?= )', replacement = 'insuredvehicle', string = text)]

Output:

"the driver of the 1st vehicle hit the insuredvehicle insuredvehicle at a stop"
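
The lookaround version works because the spaces are only checked, not consumed. The sketch below, which assumes stringr is installed and uses illustrative variable names, lists the match positions and also shows one limitation worth keeping in mind: a lookaround on a literal space requires a space to actually be there, so an iv at the very start or end of the string is left alone.

```r
library(stringr)

text <- 'the driver of the 1st vehicle hit the iv iv at a stop'

# Each match covers only the two characters of 'iv'; the shared space
# between the two tokens stays available for both lookarounds.
hits <- str_locate_all(text, '(?<= )iv(?= )')[[1]]
print(hits)

# Hypothetical edge case: 'iv' at the string boundaries has no space on
# one side, so neither occurrence is replaced.
edge <- str_replace_all('iv hit the iv', '(?<= )iv(?= )', 'insuredvehicle')
print(edge)
```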

Using gsub:

gsub("\\biv\\b", "insuredvehicle", temp_data$text)
[1] "the driver of the 1st vehicle hit the insuredvehicle insuredvehicle at a stop"

Using whitespace boundaries:

temp_data <- data.table(text = 'the driver of the 1st vehicle hit the iv iv at a stop')
temp_data[, new_text := stringr::str_replace_all(pattern = '(?<!\\S)iv(?!\\S)', replacement = 'insuredvehicle', string = text)]

See the regex proof.

Explanation

--------------------------------------------------------------------------------
  (?<!                     look behind to see if there is not:
--------------------------------------------------------------------------------
    \S                       non-whitespace (all but \n, \r, \t, \f,
                             and " ")
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  iv                       'iv'
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    \S                       non-whitespace (all but \n, \r, \t, \f,
                             and " ")
--------------------------------------------------------------------------------
  )                        end of look-ahead
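
As a rough comparison of the two boundary styles, the sketch below runs \b and the whitespace-boundary pattern over a few made-up strings (the inputs and the variable names are illustrative, not from the question). It assumes stringr is installed.

```r
library(stringr)

# Hypothetical inputs: 'iv' at the string edges, next to a hyphen,
# and wrapped in parentheses.
x <- c('iv hit the iv', 'an iv-drip', 'the (iv) stopped')

# \b treats punctuation as a boundary, so every standalone 'iv' matches.
by_word <- str_replace_all(x, '\\biv\\b', 'insuredvehicle')

# (?<!\S) and (?!\S) only accept whitespace or the string edge as neighbours,
# so 'iv' touching '-' or parentheses is left untouched.
by_space <- str_replace_all(x, '(?<!\\S)iv(?!\\S)', 'insuredvehicle')

print(data.frame(input = x, by_word = by_word, by_space = by_space))
```

Which behaviour is preferable depends on the data: the whitespace version is stricter, while \b also catches tokens that sit next to punctuation.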