How to split a string by bracketed and non-bracketed tokens in Python?

I have a string of words separated by spaces, where some of the words are grouped together by parentheses. For example:

word1 word2 (word3 word4 word5) word6 (word7 word8) word9 word10 word11 (word12 word13 word14)

I want to split it on spaces, but treat the words inside parentheses as a single token, so for the example above the result would be:

[word1, word2, (word3 word4 word5), word6, (word7 word8), word9, word10, word11, (word12 word13 word14)]

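It doesn't matter whether the grouped words appear in the result list with or without their parentheses. What matters is that the words inside parentheses count as a single token. How can I do this?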

I think you can write a regular expression that finds either anything inside parentheses or a run of word characters. It could look like this:

import re

s = 'word1 word2 (word3 word4 word5) word6 (word7 word8) word9 word10 word11 (word12 word13 word14)'

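# First alternative: a parenthesized group, matched non-greedily up to the next ')'.
# Second alternative: a run of word characters.
re.findall(r'(?:\(.*?\))|(?:\w+)', s)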

Which gives you:

['word1',
 'word2',
 '(word3 word4 word5)',
 'word6',
 '(word7 word8)',
 'word9',
 'word10',
 'word11',
 '(word12 word13 word14)']
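
Since the question says the parentheses themselves are optional in the result, here is a rough sketch of a small wrapper that can also strip them. The function name `split_tokens` and the `keep_parens` flag are made up for illustration, and the pattern is a slight variation on the one above: it uses `[^\s()]+` instead of `\w+` so that tokens containing punctuation such as `a-b` stay in one piece. It assumes the parentheses are not nested.

```python
import re

# Hypothetical helper, not part of any library.
# Assumes parentheses are balanced and never nested.
TOKEN_RE = re.compile(r'\([^()]*\)|[^\s()]+')

def split_tokens(text, keep_parens=True):
    # Each match is either a whole "(...)" group or a run of
    # characters that are neither whitespace nor parentheses.
    tokens = TOKEN_RE.findall(text)
    if keep_parens:
        return tokens
    # Drop the surrounding parentheses from grouped tokens only.
    return [t[1:-1] if t.startswith('(') and t.endswith(')') else t
            for t in tokens]

s = 'word1 word2 (word3 word4 word5) word6 (word7 word8)'
print(split_tokens(s))
print(split_tokens(s, keep_parens=False))
```

If nested parentheses can occur, a single regular expression like this will not be enough, and a small hand-written scanner that tracks nesting depth would probably be a better fit.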