R stringr::str_replace_all() with named vector problem

Question

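I'm having trouble standardizing addresses before a merge. USPS has a long list of standard abbreviations, which I converted to a named vector. When I run str_replace_all() as shown, it doesn't replace the 'AVENUE' variants the way I'd hoped. I believe it's grabbing the first match rather than the longest or exact match, but I can't think of an elegant way around this. Thanks for any suggestions.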

library(tidyverse)

addresses = c("10580 BAR AVE", "1234 WILL AVENUE")

standard_abbreviations <- c('AV' = 'AVE', 'AVE' = 'AVE', 'AVENU' = 'AVE', 'AVENUE' = 'AVE', 'AVN' = 'AVE')

addresses_standardized <- str_replace_all(addresses, standard_abbreviations)

The results are incorrect, with the avenue variants misspelled:

> addresses_standardized
[1] "10580 BAR AVEE"    "1234 WILL AVEENUE"

Answer 1

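Your replacements are being applied one after another, so 'AV' first matches inside 'AVE' and 'AVENUE' and turns them into 'AVEE' and 'AVEENUE'. Try: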

library(tidyverse)

addresses = c("10580 BAR AVE", "1234 WILL AVENUE")

standard_abbreviations <- c('AV\\b' = 'AVE', 'AVE' = 'AVE', 'AVENU\\b' = 'AVE', 'AVENUE' = 'AVE', 'AVN\\b' = 'AVE')

addresses_standardized <- str_replace_all(addresses, standard_abbreviations)
addresses_standardized
#> [1] "10580 BAR AVE" "1234 WILL AVE"

The \\b in the R string stands for \b, the regex for a word boundary (here, the end of the word).
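To illustrate the escaping and the boundary behaviour, here is a small check. It is only a sketch; the variable name boundary_hits is made up for this example and the test strings are adapted from the question.

```r
library(stringr)

# In R source, "\\b" is two characters (a backslash and a 'b'),
# which the regex engine reads as a word boundary.
# "\b" with a single backslash is one character: a literal backspace.
nchar("\\b")
nchar("\b")

# 'AV\\b' matches AV only when AV ends a word, so it no longer
# fires inside AVE or AVENUE.
boundary_hits <- str_detect(c("10580 BAR AV", "10580 BAR AVE"), "AV\\b")
boundary_hits
#> [1]  TRUE FALSE
```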

https://www.regular-expressions.info/wordboundaries.html
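With a long USPS list it may be easier to add the boundaries programmatically than to type them by hand. The following is a minimal sketch under that assumption, not part of the original answer: the variants vector and the third address ("12 AVERY AV") are hypothetical, and each pattern is anchored with \\b on both sides so that only whole words are replaced.

```r
library(stringr)

# The third address is a made-up case: AVERY must stay untouched.
addresses <- c("10580 BAR AVE", "1234 WILL AVENUE", "12 AVERY AV")

# Hypothetical list of spellings that should all become AVE.
variants <- c("AV", "AVE", "AVENU", "AVENUE", "AVN")

# Names are the regex patterns (\\bVARIANT\\b), values are the replacement.
standard_abbreviations <- setNames(
  rep("AVE", length(variants)),
  paste0("\\b", variants, "\\b")
)

addresses_standardized <- str_replace_all(addresses, standard_abbreviations)
addresses_standardized
#> [1] "10580 BAR AVE" "1234 WILL AVE" "12 AVERY AVE"
```

Because every pattern has to match a whole word, the order in which the patterns are applied no longer changes the result for these inputs.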

regex

r

str-replace

stringr

tidyverse