Regex extraction in R

Question

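I'm working on a project that uses multidimensional scaling (MDS) to try to group politicians based on their voting records. The goodness of fit is high; however, I would like to plot the MDS coordinates labeled with the politicians' names so that I can draw conclusions from the result. For this I am using the wordcloud library.

I am trying to use a regular expression in R, with the stringr package, to extract the politicians' names from my "names" vector, which contains some non-standard characters. My goal is to extract the last name and the characters in the square brackets. The names come in three different forms, shown below:

Sen. Mike Lee [R]
Sen. Chris Coons [D, 2010-2020]
Sen. Charles “Chuck” Grassley [R]

Using the stringr package, I am running this code: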

str_extract("\\w+\\s\\[.+\\]$", names)  # names is the vector of names

I get this error:

Error in UseMethod("type") : 
  no applicable method for 'type' applied to an object of class "NULL"

I am trying to diagnose this error, but I can't seem to find any help online.

Answer 1

Given

names <- c("Sen. Mike Lee [R]", "Sen. Chris Coons [D, 2010-2020]", "Sen. Charles “Chuck” Grassley [R]")
stringr::str_extract("\\w+\\s\\[.+\\]$", names)  # names is the vector of names
# [1] NA NA NA

and

t(sapply(regmatches(names, regexec(".*\\s(\\w+)\\s\\[(.+)\\]", names)), "[", -1))
#      [,1]       [,2]          
# [1,] "Lee"      "R"           
# [2,] "Coons"    "D, 2010-2020"
# [3,] "Grassley" "R"

I can't reproduce your error.

Answer 2

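You mixed up the argument order in str_extract: it must be str_extract(names, "\\w+\\s\\[.+\\]$"); that is, names should be the first argument and the regular expression the second. You will get

> str_extract(names, "\\w+\\s\\[.+\\]$")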
[1] "Lee [R]"              "Coons [D, 2010-2020]" "Grassley [R]"
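
One way to guard against this kind of mix-up is to name the arguments explicitly. The following is a minimal sketch, assuming stringr is installed and using the three sample names from the question; the variable res is only illustrative, and the curly quotes are written as Unicode escapes to keep the snippet ASCII-only.

```r
library(stringr)

# Sample input from the question; \u201c and \u201d are the curly quotes
names <- c("Sen. Mike Lee [R]",
           "Sen. Chris Coons [D, 2010-2020]",
           "Sen. Charles \u201cChuck\u201d Grassley [R]")

# With named arguments, the order in which they are written no longer matters
res <- str_extract(pattern = "\\w+\\s\\[.+\\]$", string = names)
print(res)
```

Because string and pattern are the documented names of the first two parameters of str_extract, this call should give the same result as str_extract(names, "\\w+\\s\\[.+\\]$").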

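Note that you can drop the escape before the closing ] because, outside a character class, it is not a special regex metacharacter, and you can replace .+ with the negated character class [^\]\[]+, which matches one or more characters other than ] and [: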

> str_extract(names, "\\w+\\s\\[[^\\]\\[]+]$")
[1] "Lee [R]"              "Coons [D, 2010-2020]" "Grassley [R]"
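
If the last name and the bracket contents are needed as separate values, as the question asks, capture groups can be added to the same pattern. This is a sketch rather than part of the original answer: it assumes stringr's str_match, and the names m, parts, last_name and info are made up for illustration.

```r
library(stringr)

# Sample input from the question; \u201c and \u201d are the curly quotes
names <- c("Sen. Mike Lee [R]",
           "Sen. Chris Coons [D, 2010-2020]",
           "Sen. Charles \u201cChuck\u201d Grassley [R]")

# Group 1 captures the last name, group 2 the text between the brackets
m <- str_match(names, "(\\w+)\\s\\[([^\\]\\[]+)\\]$")

# Column 1 of m is the full match; columns 2 and 3 are the capture groups
parts <- data.frame(last_name = m[, 2],
                    info = m[, 3],
                    stringsAsFactors = FALSE)
print(parts)
```

The two columns can then be pasted back together in whatever label format the plot needs, for example paste0(parts$last_name, " [", parts$info, "]").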
