regex: get the text between a pattern and the nearest match of another pattern to its left
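I have a string txt which contains the pattern John and several countries. I also have vec_regex, a set of regexes that match countries (but not every country mentioned in the text).
What I want is the text between John and the nearest matched country to its left: France text John.
I think this calls for a negative lookahead, but I can't get it to work. (See here and here.) Many thanks!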
library(stringr)
txt <- "Germany Russia and Germany Russia text Germany text France text John text text France and Spain"
# Backslashes must be doubled inside R string literals
vec_regex <- c("German\\w*", "France|French", "Spain|Spanish", "Russia\\w*")
vec_regex_or <- paste(vec_regex, collapse="|")
vec_regex_or
#> [1] "German\\w*|France|French|Spain|Spanish|Russia\\w*"
# Attempt 1: country followed by a greedy .* up to John
pattern_left <- paste0("(",vec_regex_or, ")",".*John")
pattern_left
#> [1] "(German\\w*|France|French|Spain|Spanish|Russia\\w*).*John"
str_extract(txt, regex(pattern_left))
#> [1] "Germany Russia and Germany Russia text Germany text France text John"
# Attempt 2: a single negative lookahead right after the country
pattern_left <- paste0("(",vec_regex_or, ")","(?!(",vec_regex_or,"))",".*John") #neg. lookahead
pattern_left
#> [1] "(German\\w*|France|French|Spain|Spanish|Russia\\w*)(?!(German\\w*|France|French|Spain|Spanish|Russia\\w*)).*John"
str_extract(txt, regex(pattern_left))
#> [1] "Germany Russia and Germany Russia text Germany text France text John"
Created on 2021-12-30 by the reprex package (v2.0.1)
You need to use
pattern_left <- paste0("(",vec_regex_or, ")","(?:(?!",vec_regex_or,").)*","John")
pattern_left
# => [1] "(German\\w*|France|French|Spain|Spanish|Russia\\w*)(?:(?!German\\w*|France|French|Spain|Spanish|Russia\\w*).)*John"
str_extract(txt, regex(pattern_left))
# => [1] "France text John"
"(?:(?!",vec_regex_or,").)*"
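is the part that builds a tempered greedy token: it consumes one character at a time, zero or more times, but only while that character does not start another country match. The match therefore cannot run across a later country on its way to John.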
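To see what the tempered greedy token changes, here is a minimal sketch on a made-up two-country string. The names txt_small, stops and extract_first() are illustrative and do not come from the question. Base R's regexpr() with perl = TRUE is used only so the sketch runs without extra packages; str_extract() should behave the same way for these patterns.

```r
# Hypothetical helper (not from the question): first PCRE match, or NA if none
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)
  if (m[1] == -1) NA_character_ else regmatches(x, m)
}

txt_small <- "Spain a France b John c"  # made-up input
stops <- "Spain|France"                 # made-up stand-in for vec_regex_or

# Greedy version: starts at the leftmost country and runs on to John
greedy_pattern <- paste0("(", stops, ").*John")
# Tempered version: every consumed character is checked by the lookahead first
tempered_pattern <- paste0("(", stops, ")(?:(?!", stops, ").)*John")

res_greedy <- extract_first(txt_small, greedy_pattern)
res_tempered <- extract_first(txt_small, tempered_pattern)
res_greedy
#> [1] "Spain a France b John"
res_tempered
#> [1] "France b John"
```

Starting from Spain, the tempered pattern stops consuming at the F of France, finds no John there and fails, so the engine moves on and the first successful match starts at France.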
Also, if you intend to match these strings as whole words, consider adding word boundaries:
pattern_left <- paste0("\\b(",vec_regex_or, ")\\b","(?:(?!",vec_regex_or,").)*","John\\b")
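As a rough illustration of what the word boundaries buy you, the sketch below compares both patterns on a made-up string in which John also occurs inside Johnson. The names txt_words, no_bounds, with_bounds and extract_first() are assumptions for the example, and base R with perl = TRUE is again used only to avoid extra packages.

```r
# Hypothetical helper (not from the question): first PCRE match, or NA if none
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)
  if (m[1] == -1) NA_character_ else regmatches(x, m)
}

vec_regex <- c("German\\w*", "France|French", "Spain|Spanish", "Russia\\w*")
vec_regex_or <- paste(vec_regex, collapse = "|")

txt_words <- "France one Johnson and Spain two John"  # made-up input

no_bounds <- paste0("(", vec_regex_or, ")", "(?:(?!", vec_regex_or, ").)*", "John")
with_bounds <- paste0("\\b(", vec_regex_or, ")\\b", "(?:(?!", vec_regex_or, ").)*", "John\\b")

# Without boundaries, the John inside Johnson is accepted
res_no_bounds <- extract_first(txt_words, no_bounds)
res_no_bounds
#> [1] "France one John"

# With boundaries, only the standalone John qualifies
res_with_bounds <- extract_first(txt_words, with_bounds)
res_with_bounds
#> [1] "Spain two John"
```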