How to cut a substring on several occurrences of a pattern?
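After a thorough search on Google and SO, I couldn't find my specific problem among the many regex questions already asked.
I want to parse a string in order to replace some substrings. However, my case is a bit more complex than a simple str_replace, so I need a structured version of the string.
For example, take value="There is __obj1__ and also __obj2__ in the house." with the pattern __.*?__.
I would like to get something like c("There is ", "obj1", "and also", "obj2", "in the house") so that I can act on all the even indexes.
Here is where I am so far. I am struggling with regex greediness, which is either too much or not enough. The matrix return type is not really a problem, since I can unlist(x[[1]][-1]) it.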
library(tidyverse)
value="There is __obj1__ and also __obj2__ in the house."
str_match_all(value, "(.*?)__(.*?)__(.*?)") #too greedy at the very end
#> [[1]]
#>      [,1]                 [,2]         [,3]   [,4]
#> [1,] "There is __obj1__"  "There is "  "obj1" ""
#> [2,] " and also __obj2__" " and also " "obj2" ""
str_match_all(value, "(.*)__(.*?)__(.*?)") #not greedy enough
#> [[1]]
#>      [,1]                                  [,2]                          [,3]
#> [1,] "There is __obj1__ and also __obj2__" "There is __obj1__ and also " "obj2"
#>      [,4]
#> [1,] ""
str_match_all(value, "(.*?)__(.*)__(.*?)") #not greedy enough
#> [[1]]
#>      [,1]                                  [,2]        [,3]
#> [1,] "There is __obj1__ and also __obj2__" "There is " "obj1__ and also __obj2"
#>      [,4]
#> [1,] ""
str_match_all(value, "(.*?)__(.*?)__(.*)") #not greedy enough
#> [[1]]
#>      [,1]                                                [,2]        [,3]
#> [1,] "There is __obj1__ and also __obj2__ in the house." "There is " "obj1"
#>      [,4]
#> [1,] " and also __obj2__ in the house."
Created on 2021-01-19 by the reprex package (v0.3.0)
You can use
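library(stringr)
value <- "There is __obj1__ and also __obj2__ in the house."
# Backslashes are doubled because the pattern is written as an R string literal
result <- str_match_all(value, "\\s*(.*?)__(.*?)__(.*?)(?=\\s*(?:__|$))")
# Drop the first column (the full match) and keep only the capture groups
result <- lapply(result, function(x) x[, -1, drop = FALSE])
result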
Output:
[[1]]
     [,1]        [,2]   [,3]
[1,] "There is " "obj1" " and also"
[2,] ""          "obj2" " in the house."
The pattern is
\s*(.*?)__(.*?)__(.*?)(?=\s*(?:__|$))
See the regex demo. Note that you can even use a possessive quantifier with \s*, i.e. \s*+, to speed up matching.
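As a minimal sketch of the possessive variant, the snippet below runs both patterns on the sample string and compares the results. The variable names (pattern_plain, pattern_possessive) are illustrative only, and the comparison covers this one input, not every possible string. No timing is measured here.

```r
library(stringr)

value <- "There is __obj1__ and also __obj2__ in the house."

# Original pattern and the same pattern with a possessive \s*+ at the start.
# Names are hypothetical, chosen for this illustration.
pattern_plain      <- "\\s*(.*?)__(.*?)__(.*?)(?=\\s*(?:__|$))"
pattern_possessive <- "\\s*+(.*?)__(.*?)__(.*?)(?=\\s*(?:__|$))"

res_plain      <- str_match_all(value, pattern_plain)
res_possessive <- str_match_all(value, pattern_possessive)

# Check that both patterns extract the same pieces from the sample string.
same_result <- identical(res_plain, res_possessive)
same_result
```

The possessive form only tells the regex engine not to give back whitespace it has already consumed, so it should be treated as an optimization to benchmark on real data rather than a guaranteed speed-up.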
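Details:
\s* - zero or more whitespace characters
(.*?) - Group 1: any zero or more characters other than line break characters, as few as possible
__ - a literal __ substring
(.*?) - Group 2: any zero or more characters other than line break characters, as few as possible
__ - a literal __ substring
(.*?) - Group 3: any zero or more characters other than line break characters, as few as possible
(?=\s*(?:__|$)) - a positive lookahead that requires zero or more whitespace characters followed by __ or the end of the string immediately to the right of the current location.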
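To get from the matrix to the flat vector the question asks for, one possible sketch is to read the capture groups row by row and drop the empty pieces. This assumes, as in the sample string, that the only empty pieces are the empty Group 1 of follow-up matches; a string that starts with a placeholder would need different handling. The upper-casing step is a hypothetical stand-in for whatever action is wanted on the even positions. Note that the whitespace consumed by the leading \s* is not captured, so it is absent from the pieces.

```r
library(stringr)

value <- "There is __obj1__ and also __obj2__ in the house."
pattern <- "\\s*(.*?)__(.*?)__(.*?)(?=\\s*(?:__|$))"

# Drop the full-match column; keep the three capture groups of each match.
m <- str_match_all(value, pattern)[[1]][, -1, drop = FALSE]

# t() followed by as.vector() reads the matrix row by row, i.e. in string order.
parts <- as.vector(t(m))

# Assumption: empty pieces come only from the empty Group 1 of later matches,
# so removing them leaves plain text and placeholder names alternating.
parts <- parts[nzchar(parts)]

# Hypothetical action on the even positions: upper-case the placeholder names.
even <- seq_along(parts) %% 2 == 0
parts[even] <- toupper(parts[even])
parts
```

For the sample string this leaves five pieces, with the placeholder names at positions 2 and 4.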