Regex to capture a substring containing punctuation in R
I have a list of strings in which each element contains an uppercase name, with or without punctuation, followed by a sentence.
names_list = list("MICKEY MOUSE is a Disney character",
"DAFFY DUCK is a Warner Bros. character",
"GARFIELD, ODI AND JOHN are characters from a USA cartoon comic strip.",
"BUGS-BUNNY AND FRIENDS Warner Bros. owns these characters.")
I want to extract only the uppercase names at the start of each string. This is as far as I got:
library('stringr')
str_extract(names_list, '([:upper:]+([:punct:]?[:upper:]?)[:space:])+')
[1] "MICKEY MOUSE " "DAFFY DUCK " "GARFIELD, ODI AND JOHN " "BUNNY AND FRIENDS "
I can't work out how to allow for the mid-word punctuation in "BUGS-BUNNY" so that the whole word is extracted. Any help is much appreciated!
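You can capture repeated occurrences of uppercase letters, punctuation, and whitespace, stopping just before a whitespace character that is followed by any letter, upper or lower case. The lookahead (?=...) checks for that whitespace and letter without including them in the match.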
library(stringr)
str_extract(names_list, '([[:upper:][:punct:][:space:]])+(?=\\s[A-Za-z])')
#[1] "MICKEY MOUSE" "DAFFY DUCK" "GARFIELD, ODI AND JOHN"
# "BUGS-BUNNY AND FRIENDS"
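For anyone who would rather not load stringr, the same lookahead idea can be sketched in base R with regexpr and regmatches. This is only a sketch under a few assumptions: names_vec is a hypothetical character vector standing in for names_list, the ^ anchor is an addition that pins the match to the start of the string, and perl = TRUE is needed because the default regex engine does not support lookahead.

```r
# Hypothetical input: a plain character vector rather than a list
names_vec <- c("MICKEY MOUSE is a Disney character",
               "BUGS-BUNNY AND FRIENDS Warner Bros. owns these characters.")

# Uppercase letters, punctuation and whitespace from the start of the string,
# stopping before a whitespace character that is followed by a letter
pattern <- "^[[:upper:][:punct:][:space:]]+(?=\\s[A-Za-z])"

# regexpr() locates the first match; regmatches() pulls out the matched text
m <- regexpr(pattern, names_vec, perl = TRUE)
extracted <- regmatches(names_vec, m)
print(extracted)
```

Note that regmatches silently drops elements that do not match, so the result can be shorter than the input.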
We can use sub from base R:
sub("^([A-Z, -]+)\\s+.*", "\\1", unlist(names_list))
#[1] "MICKEY MOUSE" "DAFFY DUCK" "GARFIELD, ODI AND JOHN" "BUGS-BUNNY AND FRIENDS"
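One thing to keep in mind with sub is that it returns the input unchanged when the pattern does not match, so a string without a leading uppercase name would come back whole. A minimal sketch of one way to guard against that, assuming NA is an acceptable marker for "no name found" (x and result are hypothetical names used only for illustration):

```r
# Hypothetical input: the second element has no leading uppercase name
x <- c("MICKEY MOUSE is a Disney character", "no caps here")

pattern <- "^([A-Z, -]+)\\s+.*"

# Substitute only where the pattern matches; otherwise return NA
result <- ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)
print(result)
```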