Regex for pulling name out of pattern

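I'd like to extract the full name from the patterns below. Some names contain hyphens or multiple capital letters, as in the examples given: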

(All numbers in parentheses are 1 or 2 digits. All uppercase city abbreviations before the parentheses are 2 or 3 characters.)

Davante Adams LV (6)
Christian McCaffrey CAR (10)
J.K. Dobbins BAL (5)
Amon-Ra St. Brown DET (7)
AJ Brown PHI (11)
Michael Pittman Jr. IND (14)
JuJu Smith-Schuster PIT (9)

The result should be...

Davante Adams
Christian McCaffrey
J.K. Dobbins
Amon-Ra St. Brown
AJ Brown
Michael Pittman Jr.
JuJu Smith-Schuster

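We can use trimws with a regex passed as the whitespace argument: one or more spaces (\s+), followed by one or more uppercase letters ([A-Z]+), then optional spaces (\s*) and one or more digits inside parentheses (\(\d+\)). In an R string literal each backslash must be doubled.

trimws(str1, whitespace = "\\s+[A-Z]+\\s*\\(\\d+\\)")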

-output

[1] "Davante Adams"
[2] "Christian McCaffrey"
[3] "J.K. Dobbins"
[4] "Amon-Ra St. Brown"
[5] "AJ Brown"
[6] "Michael Pittman Jr."
[7] "JuJu Smith-Schuster"
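
The same trailing pattern can also be removed with base R's sub() and an explicit end-of-string anchor. The sketch below is only one possible variant: the stricter quantifiers ({2,3} and {1,2}) are an assumption taken from the constraints stated in the question, and the names `players` and `full_names` are hypothetical.

```r
# A few sample inputs copied from the question (the name `players` is illustrative).
players <- c("Davante Adams LV (6)",
             "Amon-Ra St. Brown DET (7)",
             "Michael Pittman Jr. IND (14)")

# Strip a trailing " ABC (12)" style suffix:
#   \\s+             one or more spaces
#   [A-Z]{2,3}       a 2- or 3-letter uppercase abbreviation (assumed from the question)
#   \\s*             optional spaces
#   \\(\\d{1,2}\\)   1 or 2 digits inside literal parentheses
#   $                end of the string
full_names <- sub("\\s+[A-Z]{2,3}\\s*\\(\\d{1,2}\\)$", "", players)
print(full_names)
```

Because the pattern is anchored with $, uppercase runs inside a name (such as "AJ") are left alone unless they are directly followed by the parenthesised number at the end of the string.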

data

str1 <- c("Davante Adams LV (6)", "Christian McCaffrey CAR (10)", "J.K. Dobbins BAL (5)", 
"Amon-Ra St. Brown DET (7)", "AJ Brown PHI (11)", "Michael Pittman Jr. IND (14)", 
"JuJu Smith-Schuster PIT (9)")

Using strsplit

# str is assumed to be a single string containing all of the entries
strsplit(str, " [A-Z]+ \\(\\d+\\) *")[[1]]
#> [1] "Davante Adams"       "Christian McCaffrey" "J.K. Dobbins"       
#> [4] "Amon-Ra St. Brown"   "AJ Brown"            "Michael Pittman Jr."
#> [7] "JuJu Smith-Schuster"
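
The [[1]] index in the strsplit call suggests that the input is one string holding every entry rather than a character vector. A minimal sketch under that assumption, building such a string with paste(); the names `entries`, `all_entries` and `name_parts` are hypothetical:

```r
# Sample entries copied from the question (variable names are illustrative).
entries <- c("Davante Adams LV (6)",
             "J.K. Dobbins BAL (5)",
             "JuJu Smith-Schuster PIT (9)")

# Assumption: all entries live in one space-separated string.
all_entries <- paste(entries, collapse = " ")

# Split on " ABBR (N)" plus any trailing spaces.
# strsplit() returns a list with one element per input string, so [[1]]
# extracts the pieces for our single input string.
name_parts <- strsplit(all_entries, " [A-Z]+ \\(\\d+\\) *")[[1]]
print(name_parts)
```

If the entries are already stored one per element in a vector, the trimws or sub approach is likely the simpler fit, since strsplit would then return one list element per entry.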