Parsing a numbered delimited string using a regular expression

Question

I am parsing a text file with a PowerShell script. Part of the content has the form:

(1) first thing (2) other thing (that,has,details) (3) third thing: stuff (some details), first thing
(1) first thing (2) other thing (that,has,details) (3) third thing: stuff (some details), first thing (4) potentially (5) more (6) things (7) too

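It is like a delimited string, except that the delimiter is an incrementing number in parentheses. I would like to parse it into an array of strings containing: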

arr[0]="(1) first thing"
arr[1]="(2) other thing (that,has,details)"
arr[2]="(3) third thing: stuff (some details), first thing"

or

arr[0]="first thing"
arr[1]="other thing (that,has,details)"
arr[2]="third thing: stuff (some,details), first thing"

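The solution should stay flexible enough to handle additional fields in the future. It would be even more awesome if I could keep the numbers in a separate array, or have both the numbers and the text in a two-dimensional array: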

arr[0,0]="(1)"
arr[0,1]="first thing"
arr[1,0]="(2)"
arr[1,1]="other thing (that,has,details)"
arr[2,0]="(3)"
arr[2,1]="third thing: stuff (some,details), first thing"

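I have been trying to do this with a regular expression but have run into some trouble. I would rather not hack something together, because a regex solution would be really nice.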

Thanks for your help.

Answer 1

\G(\(\d+\))\s+((?:[^\(]|\((?!\d+\)))*[^\(\s])(?:\s+|$)

https://regex101.com/r/fbvpic/1
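
For illustration, here is a minimal sketch of how the pattern above might be applied from PowerShell, which uses the .NET regex engine. `\G` anchors each match where the previous one ended, group 1 captures the `(n)` marker, and group 2 captures the text up to (but not including) the next `(digits)` marker, while still allowing ordinary parentheses such as `(that,has,details)`. The variable names `$line` and `$fields` and the property names `Number` and `Text` are assumptions made for this example, not part of the original answer.

```powershell
# Pattern from the answer. Single quotes stop PowerShell from expanding "$".
$pattern = '\G(\(\d+\))\s+((?:[^\(]|\((?!\d+\)))*[^\(\s])(?:\s+|$)'

# First sample line from the question.
$line = '(1) first thing (2) other thing (that,has,details) (3) third thing: stuff (some details), first thing'

# One match per numbered field. @( ) forces an array even for a single match.
$fields = @(
    foreach ($m in [regex]::Matches($line, $pattern)) {
        [pscustomobject]@{
            Number = $m.Groups[1].Value   # the "(n)" marker
            Text   = $m.Groups[2].Value   # the text that follows it
        }
    }
)

# Plain string arrays, if separate arrays are preferred over objects.
$numbers = @($fields | ForEach-Object { $_.Number })
$texts   = @($fields | ForEach-Object { $_.Text })

$fields | Format-Table -AutoSize
```

An array of objects is used here instead of a true two-dimensional array because it is usually easier to work with in PowerShell; `$fields[1].Number` and `$fields[1].Text` play the role of `arr[1,0]` and `arr[1,1]`. Because the pattern does not hard-code how many fields exist, the longer seven-field line from the question should be handled the same way.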

regex

csv

powershell

parsing

text-parsing