Extract string inside nested brackets

I need to extract strings from nested brackets like this:

[ this is [ hello [ who ] [what ] from the other side ] slim shady ]

Result (order doesn't matter):

This is slim shady
Hello from the other side
Who 
What

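Note that the string can contain any number N of square brackets. They will always be valid (balanced), but they may or may not be nested. Also, the string does not have to start with a bracket.

Solutions I found online for similar problems suggest using regular expressions, but I'm not sure that works in this case.

I was thinking of implementing this the same way we check whether a string contains only valid brackets:

Iterate over the string. If we see a [ we push its index onto a stack; if we see a ], we take the substring from that index up to the current position.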

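However, we would then need to remove that substring from the original string so that it doesn't also become part of another output. So instead of just pushing indices onto the stack, I was thinking of building a LinkedList: whenever we find a [ we insert that node into the LinkedList. That would make it easy to remove the substring from the LinkedList.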
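The index-stack idea can be sketched without a linked list: instead of physically removing a closed substring, mark its characters as consumed so that enclosing brackets skip them. This is a minimal sketch assuming balanced brackets; extract_with_index_stack is a hypothetical name, and because each closing bracket re-scans its span the cost is roughly O(n x depth) rather than linear.

```python
def extract_with_index_stack(s):
    """Hypothetical helper: group text by bracket pair using a stack of '[' indices.

    Assumes the brackets in s are balanced.
    """
    stack = []                    # indices of the '[' characters still open
    consumed = [False] * len(s)   # True once a character belongs to a closed pair
    groups = []
    for i, ch in enumerate(s):
        if ch == '[':
            stack.append(i)
        elif ch == ']':
            start = stack.pop()
            # Keep only characters not already claimed by a nested pair
            inner = ''.join(s[j] for j in range(start + 1, i) if not consumed[j])
            groups.append(inner)
            # "Remove" this whole pair, brackets included, from enclosing groups
            for j in range(start, i + 1):
                consumed[j] = True
    return groups


s = '[ this is [ hello [ who ] [what ] from the other side ] slim shady ]'
for group in extract_with_index_stack(s):
    print(' '.join(group.split()))
```

Text outside every bracket pair is never read, so a string that does not start with a bracket is handled as well.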

Is this a good approach, or is there a cleaner, well-known solution?

Edit:

'[ this is [ hello [ who ] [what ] from the other [side] ] slim shady ][oh my [g[a[w[d]]]]]'

should return (order doesn't matter):

this is slim shady
hello from the other
who 
what 
side
oh my
g
a
w
d

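Whitespace doesn't matter; removing it afterwards is trivial. What matters is being able to tell apart the contents of the different brackets, either by putting each one on its own line or by returning a list of strings.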

This can be solved easily with a regular expression:

import re

s= '[ this is [ hello [ who ] [what ] from the other [side] ] slim shady ][oh my [g[a[w[d]]]]]'

result= []
pattern= r'\[([^[\]]*)\]' #regex pattern to find non-nested square brackets
while '[' in s: #while brackets remain
    result.extend(re.findall(pattern, s)) #find them all and add them to the list
    s= re.sub(pattern, '', s) #then remove them
result= list(filter(None, (t.strip() for t in result))) #strip whitespace and drop empty strings (list() because filter is lazy in Python 3)

#result: ['who', 'what', 'side', 'd', 'hello   from the other', 'w', 'this is  slim shady', 'a', 'g', 'oh my']
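One caveat with while '[' in s: if a stray, unmatched [ ever slipped into the input, the loop would never terminate. A variant that stops as soon as re.subn reports no replacements avoids that, and also collapses the runs of inner whitespace. This is a sketch; extract_innermost_first is a hypothetical name.

```python
import re

# An innermost pair: '[' then any run of non-bracket characters, then ']'
INNERMOST = re.compile(r'\[([^\[\]]*)\]')


def extract_innermost_first(s):
    """Hypothetical helper: peel bracket pairs from the inside out."""
    result = []
    while True:
        found = INNERMOST.findall(s)   # contents of the current innermost pairs
        s, n = INNERMOST.subn('', s)   # delete those pairs from the string
        if n == 0:                     # nothing left to peel, even if a stray '[' remains
            break
        result.extend(found)
    # Collapse internal whitespace and drop groups that were only whitespace
    return [' '.join(t.split()) for t in result if t.strip()]


s = '[ this is [ hello [ who ] [what ] from the other [side] ] slim shady ][oh my [g[a[w[d]]]]]'
print(extract_innermost_first(s))
```

Each pass removes one layer of nesting, so the number of passes equals the maximum nesting depth.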

You can represent your matches as a tree structure.

class BracketMatch:
    def __init__(self, refstr, parent=None, start=-1, end=-1):
        self.parent = parent
        self.start = start
        self.end = end
        self.refstr = refstr
        self.nested_matches = []
    def __str__(self):
        cur_index = self.start+1
        result = ""
        if self.start == -1 or self.end == -1:
            return ""
        for child_match in self.nested_matches:
            if child_match.start != -1 and child_match.end != -1:
                result += self.refstr[cur_index:child_match.start]
                cur_index = child_match.end + 1
            else:
                continue
        result += self.refstr[cur_index:self.end]
        return result

# Main script
from collections import deque

haystack = '''[ this is [ hello [ who ] [what ] from the other side ] slim shady ]'''
root = BracketMatch(haystack)
cur_match = root
for i, ch in enumerate(haystack):
    if ch == '[':
        # Open a new match nested inside the current one and descend into it
        new_match = BracketMatch(haystack, cur_match, i)
        cur_match.nested_matches.append(new_match)
        cur_match = new_match
    elif ch == ']':
        # Close the current match and climb back up to its parent
        cur_match.end = i
        cur_match = cur_match.parent
# Here we have built the tree of matches; now we must print them.
# A BFS (using a deque as the queue) visits and prints each match once.
queue = deque(root.nested_matches)
while queue:
    node = queue.popleft()
    queue.extend(node.nested_matches)
    # Collapse the runs of whitespace left where nested matches were cut out
    print("Match: " + " ".join(str(node).split()))

The output of this program is:

Match: this is slim shady
Match: hello from the other side
Match: who
Match: what
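The same structure can also be walked without parent pointers by letting recursion play the role of the stack: each call handles one bracket pair, recurses into nested pairs, and records its own text when it reaches the matching ]. This is a sketch of that alternative, not the code above; collect_groups is a hypothetical name, and the recursion depth equals the nesting depth, so extremely deep nesting would hit Python's default recursion limit.

```python
def collect_groups(s):
    """Hypothetical helper: recursive-descent extraction of bracket groups."""
    groups = []

    def walk(i):
        # i points just past a '['; returns the index just past the matching ']'
        own = []
        while i < len(s):
            ch = s[i]
            if ch == '[':
                i = walk(i + 1)        # descend into the nested pair
            elif ch == ']':
                groups.append(''.join(own))
                return i + 1
            else:
                own.append(ch)
                i += 1
        raise ValueError('unbalanced brackets: missing ]')

    i = 0
    while i < len(s):
        if s[i] == '[':
            i = walk(i + 1)
        else:
            i += 1                     # text outside any bracket is ignored
    return groups


haystack = '[ this is [ hello [ who ] [what ] from the other side ] slim shady ]'
for g in collect_groups(haystack):
    print('Match: ' + ' '.join(g.split()))
```

Groups come out innermost first (post-order), unlike the BFS order above; since order doesn't matter here, either is fine.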

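a = '[ this is [ hello [ who ] [what ] from the other side ] slim shady ]'
lvl = -1     # current nesting depth (-1 = outside all brackets)
words = []
for i in a:
    if i == '[' :
        lvl += 1
        words.append('')
    elif i == ']' :
        lvl -= 1
    elif lvl >= 0:          # skip text outside any bracket
        words[lvl] += i     # text is collected per depth, not per bracket pair

for word in words:
    print(' '.join(word.split()))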

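This gives the output:

this is slim shady
hello from the other side
who what

Note that this collects text per nesting depth rather than per bracket pair, so sibling brackets at the same depth ([ who ] and [what ]) are merged onto one line, and a blank line is printed for the fourth, unused entry.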
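Collecting text per bracket pair (a stack of buffers) rather than per depth keeps sibling brackets such as [ who ] and [what ] apart, while the depth can still be read off the stack size when a pair closes. A minimal sketch; groups_with_depth is a hypothetical name, and depth 0 means a top-level pair.

```python
def groups_with_depth(text):
    """Hypothetical helper: return (depth, text) for every bracket pair."""
    stack = []   # one character buffer per currently open '['
    out = []     # (depth, text), in the order the pairs close
    for ch in text:
        if ch == '[':
            stack.append([])
        elif ch == ']':
            depth = len(stack) - 1     # 0 for a top-level pair
            words = ''.join(stack.pop()).split()
            out.append((depth, ' '.join(words)))
        elif stack:                    # skip text outside any bracket
            stack[-1].append(ch)
    return out


a = '[ this is [ hello [ who ] [what ] from the other side ] slim shady ]'
for depth, words in groups_with_depth(a):
    print('  ' * depth + words)        # indent by nesting depth
```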

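This code scans the text character by character, pushes an empty list onto the stack each time a [ opens, and pops the most recently pushed list off the stack each time a ] closes it.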

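text = '[ this is [ hello [ who ] [what ] from the other side ] slim shady ]'

def parse(text):
    stack = []
    for char in text:
        if char == '[':
            # stack push: start collecting a new group
            stack.append([])
        elif char == ']':
            # stack pop: this group is complete
            yield ''.join(stack.pop())
        elif stack:
            # stack peek: add the character to the innermost open group
            # (characters outside any bracket are skipped)
            stack[-1].append(char)

print(tuple(parse(text)))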

Output for text, followed by the output for the string from the question's edit:

(' who ', 'what ', ' hello   from the other side ', ' this is  slim shady ')
(' who ', 'what ', 'side', ' hello   from the other  ', ' this is  slim shady ', 'd', 'w', 'a', 'g', 'oh my ')