RegEx for splitting a list of words with multiple capturing groups

Question

I have the following string:

one two three four five six seven eight nine

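I am trying to build a regular expression that splits the string into three groups:

Group 1: 'one two three'
Group 2: 'four five six'
Group 3: 'seven eight nine'

I am using OR statements because the groups can be of any length. For example, applying the pattern to the string two three four should identify two groups:

Group 1: 'two'
Group 2: 'three four'.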

Answer 1

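I'm not quite sure what your desired output is. However, this expression matches the sample string and creates several separate capturing groups that are easy to reference: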

((one|two|three)\s.*?)((four|five|six)\s.*?)((seven|eight|nine)\s.*)

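RegEx

If this expression isn't what you need, you can modify or change it on regex101.com.

RegEx Circuit

You can also visualize your expressions on jex.im.

JavaScript Demo

This snippet shows what the various capturing groups may return: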

const regex = /((one|two|three)\s.*?)((four|five|six)\s.*?)((seven|eight|nine)\s.*)/gm;
const str = `one two three four five six seven eight nine

two three four six seven eight`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
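
As a minimal sketch, the outer groups (1, 3 and 5) of the expression above could be pulled out like this. The helper name `splitIntoThree` is hypothetical, and the `trim()` call is an addition that removes the trailing space the lazy `.*?` leaves inside groups 1 and 3.

```javascript
// Same pattern as above, without the g/m flags: a single match is enough here.
const pattern = /((one|two|three)\s.*?)((four|five|six)\s.*?)((seven|eight|nine)\s.*)/;

// Hypothetical helper: returns the three outer groups, or null when there is no match.
function splitIntoThree(input) {
    const m = pattern.exec(input);
    if (m === null) {
        return null;
    }
    // Groups 1, 3 and 5 are the outer groups; 2, 4 and 6 hold only the first word of each.
    return [m[1], m[3], m[5]].map((part) => part.trim());
}

const parts = splitIntoThree("one two three four five six seven eight nine");
console.log(parts); // [ 'one two three', 'four five six', 'seven eight nine' ]
```

Note that the pattern requires a word from each of the three sets, so a shorter input such as `two three four` makes the helper return `null`.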

Answer 2

This answer assumes that you want to find groups of three number words at a time:

x <- c("one two three four five six seven eight nine")
# Backslashes must be doubled inside an R string literal
regexp <- gregexpr("\\S+(?:\\s+\\S+){2}", x)
regmatches(x, regexp)[[1]]

[1] "one two three"    "four five six"    "seven eight nine"
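
For comparison, the same pattern can be tried in JavaScript, the language of the demo in Answer 1. This is only a sketch with illustrative variable names: `String.prototype.match` with the `g` flag returns all non-overlapping matches, and trailing words that do not fill a complete group of three are dropped.

```javascript
const text = "one two three four five six seven eight nine";

// \S+ is one word; (?:\s+\S+){2} adds exactly two more words.
// match() returns null when nothing matches, so fall back to an empty array.
const triples = text.match(/\S+(?:\s+\S+){2}/g) || [];

console.log(triples); // [ 'one two three', 'four five six', 'seven eight nine' ]
```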

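If you want a more general solution that doesn't require knowing a priori the length of the input (i.e. how many groups of three exist), then you'll probably have to use an iterative approach: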

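parts <- strsplit(x, " ")[[1]]
output <- character(0)
# Walk the words in steps of three and join each chunk with spaces
for (i in seq(from=1, to=length(parts), by=3)) {
    # min() keeps the last chunk from picking up NA when the count is not a multiple of 3
    chunk <- parts[i:min(i + 2, length(parts))]
    output <- c(output, paste(chunk, collapse=" "))
}
output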

[1] "one two three"    "four five six"    "seven eight nine"
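
The same iterative idea could be sketched in JavaScript as follows. The function name `chunkWords` and its `size` parameter are assumptions for illustration; the last chunk is simply shorter when the word count is not a multiple of `size`.

```javascript
// Hypothetical helper: split on whitespace, then join every `size` words.
function chunkWords(input, size = 3) {
    // filter() drops the empty string that split() yields for blank input
    const words = input.trim().split(/\s+/).filter((w) => w.length > 0);
    const chunks = [];
    for (let i = 0; i < words.length; i += size) {
        chunks.push(words.slice(i, i + size).join(" "));
    }
    return chunks;
}

console.log(chunkWords("one two three four five six seven eight nine"));
// [ 'one two three', 'four five six', 'seven eight nine' ]
console.log(chunkWords("two three four six"));
// [ 'two three four', 'six' ]
```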

Answer 3

Could be one big regex

(?=.*\b(?:one|two|three|four|five|six|seven|eight|nine)\b)(\b(?:one|two|three)(?:\s+(?:one|two|three))*\b)?.+?(\b(?:four|five|six)(?:\s+(?:four|five|six))*\b)?.+?(\b(?:seven|eight|nine)(?:\s+(?:seven|eight|nine))*\b)?

https://regex101.com/r/rUtkyU/1

Readable version

 (?=
      .* \b 
      (?:
           one
        |  two
        |  three
        |  four
        |  five
        |  six
        |  seven
        |  eight
        |  nine
      )
      \b 
 )
 (                             # (1 start)
      \b   
      (?: one | two | three )

      (?:
           \s+ 
           (?: one | two | three )
      )*
      \b 
 )?                            # (1 end)

 .+? 
 (                             # (2 start)
      \b        
      (?: four | five | six )

      (?:
           \s+ 
           (?: four | five | six )
      )*
      \b     
 )?                            # (2 end)

 .+?   
 (                             # (3 start)
      \b          
      (?: seven | eight | nine )

      (?:
           \s+ 
           (?: seven | eight | nine )
      )*
      \b   
 )?                            # (3 end)
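
A hedged sketch of applying the single-line expression above in JavaScript and reading its three capture groups. The variable names are illustrative. Because every group is optional, a group that did not take part in the match comes back as `undefined`, so the sketch filters those out.

```javascript
const numberWords = /(?=.*\b(?:one|two|three|four|five|six|seven|eight|nine)\b)(\b(?:one|two|three)(?:\s+(?:one|two|three))*\b)?.+?(\b(?:four|five|six)(?:\s+(?:four|five|six))*\b)?.+?(\b(?:seven|eight|nine)(?:\s+(?:seven|eight|nine))*\b)?/;

const found = numberWords.exec("one two three four five six seven eight nine");

// Slots 1..3 are the capture groups; drop any that did not participate.
const groups = found === null ? [] : found.slice(1, 4).filter((g) => g !== undefined);

console.log(groups); // [ 'one two three', 'four five six', 'seven eight nine' ]
```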
