RegEx for splitting a list of words with multiple capturing groups
I have the following string:
one two three four five six seven eight nine
I'm trying to build a regex that splits the string into three groups:
- Group 1: 'one two three'
- Group 2: 'four five six'
- Group 3: 'seven eight nine'
I've tried variations of (.*\b(one|two|three)?)(.*\b(four|five|six)?)(.*\b(seven|eight|nine)?)
but this pattern captures the complete string in a single group (the demo can be found here).
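Here is why that happens. The JavaScript sketch below runs the first pattern once against the sample string. JavaScript is used because the demo later in this thread uses it, and firstAttempt, sample and attempt are only illustrative names. The leading .* is greedy, \b still matches at the very end of the input, and every word alternative is optional, so the engine succeeds on its first try with everything in group 1.

```javascript
// The first pattern tried in the question, run once against the sample (no g flag).
const firstAttempt = /(.*\b(one|two|three)?)(.*\b(four|five|six)?)(.*\b(seven|eight|nine)?)/;
const sample = "one two three four five six seven eight nine";

const attempt = firstAttempt.exec(sample);

// Group 1: the greedy .* consumes the whole line, \b matches at the end of input,
// and the optional (one|two|three)? simply matches nothing.
console.log(attempt[1] === sample); // true

// Groups 3 and 5 are empty strings; groups 2, 4 and 6 never participated (undefined).
console.log(attempt.slice(1, 7));
```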
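Trying (.*\b(one|two|three))(.*\b(four|five|six))(.*\b(seven|eight|nine))
seems to get me closer to what I want, but the match information panel shows that the pattern finds two matches, each containing six capturing groups.
I'm using OR statements because the groups can be of any length. For example, applying the pattern to the string two three four
should identify two groups:
- Group 1: 'two'
- Group 2: 'three four'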
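I'm not exactly sure what your desired output is. However, this expression passes and creates several separate capturing groups for easy recall: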
((one|two|three)\s.*?)((four|five|six)\s.*?)((seven|eight|nine)\s.*)
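RegEx
If this expression wasn't desired, you can modify/change your expressions in regex101.com.
RegEx Circuit
You can also visualize your expressions in jex.im.
JavaScript Demo
This snippet shows what the various capturing groups might return: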
const regex = /((one|two|three)\s.*?)((four|five|six)\s.*?)((seven|eight|nine)\s.*)/gm;
const str = `one two three four five six seven eight nine
two three four six seven eight`;
let m;
while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
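If you only need the three outer groups, a slightly shorter JavaScript sketch can collect groups 1, 3 and 5 for each line. It assumes a runtime with String.prototype.matchAll (ES2020 or later), and pattern, input and rows are illustrative names. Groups 1 and 3 end with the whitespace consumed by \s.*?, so each part is trimmed.

```javascript
// Same pattern as the demo above; matchAll() requires the g flag.
const pattern = /((one|two|three)\s.*?)((four|five|six)\s.*?)((seven|eight|nine)\s.*)/gm;
const input = `one two three four five six seven eight nine
two three four six seven eight`;

// Keep only the outer groups (1, 3, 5) of every match and strip trailing spaces.
const rows = [...input.matchAll(pattern)].map((match) =>
  [match[1], match[3], match[5]].map((part) => part.trim())
);

console.log(rows);
```

For the second line this yields two three, four six and seven eight, since the lazy .*? in each group stops as soon as the next group can start.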
This answer assumes that you want to find the number words in groups of three:
x <- c("one two three four five six seven eight nine")
regexp <- gregexpr("\\S+(?:\\s+\\S+){2}", x, perl = TRUE)  # backslashes must be doubled inside R strings
regmatches(x, regexp)[[1]]
[1] "one two three" "four five six" "seven eight nine"
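For comparison, the same idea in JavaScript, the language of the demo above. This is a sketch, not part of the original R answer, and the variable names are illustrative. Because the pattern needs exactly three tokens, one or two leftover words at the end are silently dropped, which matters if the word count is not always a multiple of three.

```javascript
// One non-space token followed by exactly two more whitespace-separated tokens, applied globally.
const triplePattern = /\S+(?:\s+\S+){2}/g;
const words = "one two three four five six seven eight nine";

const triples = words.match(triplePattern);
// triples: ["one two three", "four five six", "seven eight nine"]
console.log(triples);

// A trailing word that cannot complete a triple is not returned.
const leftover = "one two three four".match(triplePattern);
// leftover: ["one two three"]
console.log(leftover);
```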
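If you want a more general solution that doesn't require knowing the length of the input in advance (i.e., how many groups of three there are), you may have to use an iterative approach: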
parts <- strsplit(x, " ")[[1]]
output <- character(0)
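for (i in seq(from=1, to=length(parts), by=3)) {
    # min() keeps the last chunk in bounds when the word count is not a multiple of 3
    output <- c(output, paste(parts[i:min(i + 2, length(parts))], collapse = " "))
}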
output
[1] "one two three" "four five six" "seven eight nine"
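Here is a JavaScript version of the iterative approach, as a sketch. chunkWords is a hypothetical helper name, and it assumes words are separated by any run of whitespace (the R code splits on a single space). slice() stops at the end of the array, so a short final chunk is kept rather than padded.

```javascript
// Split the text into words, then join every `size` consecutive words with a space.
function chunkWords(text, size = 3) {
  const parts = text.trim().split(/\s+/);
  const output = [];
  for (let i = 0; i < parts.length; i += size) {
    // slice() never reads past the end, so the last chunk may be shorter than `size`
    output.push(parts.slice(i, i + size).join(" "));
  }
  return output;
}

console.log(chunkWords("one two three four five six seven eight nine"));
console.log(chunkWords("one two three four")); // last chunk is just "four"
```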
Possibly a single large regex:
(?=.*\b(?:one|two|three|four|five|six|seven|eight|nine)\b)(\b(?:one|two|three)(?:\s+(?:one|two|three))*\b)?.+?(\b(?:four|five|six)(?:\s+(?:four|five|six))*\b)?.+?(\b(?:seven|eight|nine)(?:\s+(?:seven|eight|nine))*\b)?
https://regex101.com/r/rUtkyU/1
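The following JavaScript sketch shows what the three groups capture. It runs the pattern above on the two sample lines from the earlier demo, one line at a time, and the variable names are illustrative.

```javascript
// Pattern copied from the regex101 link above; no g flag, so exec() returns the first match.
const bigPattern = /(?=.*\b(?:one|two|three|four|five|six|seven|eight|nine)\b)(\b(?:one|two|three)(?:\s+(?:one|two|three))*\b)?.+?(\b(?:four|five|six)(?:\s+(?:four|five|six))*\b)?.+?(\b(?:seven|eight|nine)(?:\s+(?:seven|eight|nine))*\b)?/;

const lines = [
  "one two three four five six seven eight nine",
  "two three four six seven eight",
];

// Groups 1-3 hold the run of words taken from each set (undefined if a group did not match).
const results = lines.map((line) => bigPattern.exec(line).slice(1, 4));
console.log(results);

// Caveat: when the line ends inside the middle set, the second .+? still needs one
// character, so the engine backtracks by skipping the optional group 2.
const partial = bigPattern.exec("two three four").slice(1, 4);
console.log(partial);
```

As the last call shows, a line that ends inside the second word set (for example two three four) may come back with group 2 undefined, so it is worth testing the pattern against your real data before relying on it.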
Readable version
 (?=
      .* \b
      (?:
           one
        |  two
        |  three
        |  four
        |  five
        |  six
        |  seven
        |  eight
        |  nine
      )
      \b
 )
 (                             # (1 start)
      \b
      (?: one | two | three )
      (?:
           \s+
           (?: one | two | three )
      )*
      \b
 )?                            # (1 end)
 .+?
 (                             # (2 start)
      \b
      (?: four | five | six )
      (?:
           \s+
           (?: four | five | six )
      )*
      \b
 )?                            # (2 end)
 .+?
 (                             # (3 start)
      \b
      (?: seven | eight | nine )
      (?:
           \s+
           (?: seven | eight | nine )
      )*
      \b
 )?                            # (3 end)
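JavaScript has no free-spacing (x) flag, so the readable layout above cannot be pasted into a JS regex literal as-is. One workaround, sketched here under that assumption, is to assemble the same pattern from the three word lists. The names setOne, setTwo, setThree, anyOf and runOf are hypothetical. The assembled source should be identical to the one-line pattern, which makes adding or changing words less error-prone.

```javascript
// Hypothetical word lists, one per capturing group.
const setOne = ["one", "two", "three"];
const setTwo = ["four", "five", "six"];
const setThree = ["seven", "eight", "nine"];

// (?:w1|w2|...) : non-capturing alternation of the words in a set
const anyOf = (list) => `(?:${list.join("|")})`;
// Optional capturing group: a word from the set, then more whitespace-separated words from it
const runOf = (list) => `(\\b${anyOf(list)}(?:\\s+${anyOf(list)})*\\b)?`;

const builtSource =
  `(?=.*\\b${anyOf([...setOne, ...setTwo, ...setThree])}\\b)` +
  runOf(setOne) + ".+?" + runOf(setTwo) + ".+?" + runOf(setThree);

const builtPattern = new RegExp(builtSource);
console.log(builtPattern.source);
```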