如何将嵌套大括号递归扩展为笛卡尔积
How to recursively expand nested braces as cartesian product
(我故意删除了某人添加的“Python”标签,以使该问题与语言无关。)
我想我设法通过为每个大括号展开堆栈来迭代解决这个问题,这是一个我很满意的解决方案(假设它有效)。但是我未能实现单个函数递归。有人可以帮忙吗?我尝试的结果太复杂了,我无法想象,根据逗号和大括号之间的关系,需要添加而不是扩展的不同场景。
有这样的优雅递归吗?对于我可以研究此类问题的其他类型的已建立 parsing/language 方法的任何建议,我也将不胜感激。
问题描述:给定一串字母、有效的大括号和逗号;展开大括号,以便结果包括紧接在大括号左侧的字母子串的所有可能性,这些字母子串与大括号内的每个逗号分隔的字母子串连接。大括号可以嵌套。
示例:
"abc{d,e}f{g,hi}" => ['abcdfg', 'abcdfhi', 'abcefg', 'abcefhi']
"abc{d{0,1},e}f{g{0{a,b},1,2},hi}" =>
['abcd0fg0a', 'abcd0fg0b', 'abcd0fg1', 'abcd0fg2', 'abcd0fhi',
'abcd1fg0a', 'abcd1fg0b', 'abcd1fg1', 'abcd1fg2', 'abcd1fhi',
'abcefg0a', 'abcefg0b', 'abcefg1', 'abcefg2', 'abcefhi']
(希望工作)迭代Python代码:
# Brace expansion
# Example:
# input "abc{d,e}f{g,hi}"
# output ["abcdfg", "abcdfhi", "abcefg", "abcefhi"]
def f(s):
stack = []
i = 0
while i < len(s):
j = i
while s[j] not in ['{', '}', ',']:
j += 1
if s[i:j]:
stack.append([s[i:j]])
i = j
if s[i] == '{':
stack.append('{')
i += 1
if s[i] == ',':
i += 1
# Unwind stack
if s[i] == '}':
# Get comma, separated elements
lst = []
while stack[-1] != '{':
lst.extend(reversed(stack.pop()))
lst = reversed(lst)
stack.pop() # Remove '{'
pfxs = reversed(stack.pop())
stack.append([pfx + sfx for pfx in pfxs for sfx in lst])
i += 1
# Cartesian product of stack
result = []
for i in range(1, len(stack)):
result = [pfx + sfx for pfx in stack[i-1] for sfx in stack[i]]
return result
输出:
s = "abc{d,e}f{g,hi}"
print(s)
print(f(s))
s = "abc{d{0,1},e}f{g{0{a,b},1,2},hi}"
print("")
print(s)
print(f(s))
"""
abc{d,e}f{g,hi}
['abcdfg', 'abcdfhi', 'abcefg', 'abcefhi']
abc{d{0,1},e}f{g{0{a,b},1,2},hi}
['abcd0fg0a', 'abcd0fg0b', 'abcd0fg1', 'abcd0fg2', 'abcd0fhi',
'abcd1fg0a', 'abcd1fg0b', 'abcd1fg1', 'abcd1fg2', 'abcd1fhi',
'abcefg0a', 'abcefg0b', 'abcefg1', 'abcefg2', 'abcefhi']
"""
深谋远虑
我读过“笛卡尔积”,所以我的第一反应当然是导入 the itertools module。
其次,由于大括号的嵌套性质,我将使用递归。
第三,有两点我们可以分割字符串:逗号和大括号。所以我要写两个相互递归的函数:一个扩展逗号,一个扩展大括号。
函数expand_commas
将尝试以逗号分隔字符串。函数 expand_braces
将假定该字符串在上层不包含逗号(也就是说,它可以在大括号内隐藏逗号,但在大括号外没有可见的逗号)。
第四,class str
有一个方法 .split
可以方便地拆分 ,
和 {
和 [= 上的字符串18=];但是 In 不能使用它,因为我希望拆分忽略嵌套字符。所以我将不得不编写自己的函数 get_commas
和 get_braces
来检测非嵌套的逗号和大括号。
Python代码
import itertools
# returns a pair of indices:
# the index of the first '{' in the string;
# the index of its matching '}'
def get_braces(s):
i = s.index('{')
j = i + 1
depth = 0
while j < len(s) and (depth > 0 or s[j] != '}'):
if s[j] == '{':
depth += 1
elif s[j] == '}':
depth -= 1
j += 1
return i,j
# returns a list of indices:
# the indices of every ',' in the string
# except ',' that are hidden inside braces
def get_commas(s):
depth = 0
commas = []
for i,c in enumerate(s):
if c == '{':
depth += 1
elif c == '}':
depth -= 1
elif depth == 0 and c == ',':
commas.append(i)
return commas
# splits on commas
# make recursive calls on substrings
# returns list of alternatives
def expand_commas(s):
commas = get_commas(s)
if commas:
return list(itertools.chain.from_iterable(expand_braces(s[i+1:j]) for i,j in zip([-1]+commas, commas+[len(s)])))
else:
return expand_braces(s)
# assumes no commas outside of braces
# splits on first '{' and its matching '}'
# make recursive calls, return cartesian product
def expand_braces(s):
if '{' not in s:
return [s]
else:
i,j = get_braces(s)
infixes = expand_commas(s[i+1:j])
suffixes = expand_braces(s[j+1:])
return list(''.join(p) for p in itertools.product([s[:i]], infixes, suffixes))
测试
# balanced string
print(expand_commas("abc{d{0,1},e}f{g{0{a,b},1,2},hi}"))
# ['abcd0fg0a', 'abcd0fg0b', 'abcd0fg1', 'abcd0fg2', 'abcd0fhi',
# 'abcd1fg0a', 'abcd1fg0b', 'abcd1fg1', 'abcd1fg2', 'abcd1fhi',
# 'abcefg0a', 'abcefg0b', 'abcefg1', 'abcefg2', 'abcefhi']
# only commas
print(expand_commas("a,b,c"))
# ['a', 'b', 'c']
# mismatched braces: missing closing brace
print(expand_commas("abc{d{0,1},e}f{g{0{a,b"))
# ['abcd0fg0a', 'abcd0fg0b', 'abcd1fg0a', 'abcd1fg0b', 'abcefg0a', 'abcefg0b']
print(expand_commas("abc{d{0,1},e}f{g{0{a,b}}}"))
# ['abcd0fg0a', 'abcd0fg0b', 'abcd1fg0a', 'abcd1fg0b', 'abcefg0a', 'abcefg0b']
# mismatched braces: extra closing brace
print(expand_commas("abc{d,e}}{f,g}}"))
# ['abcd}f', 'abce}f', 'g}}']
# empty braces
print(expand_commas("abc{}d"))
# ['abcd']
# empty commas
print(expand_commas("abc{,,,}d"))
# ['abcd', 'abcd', 'abcd', 'abcd']
为了区分递归是由逗号启动还是由大括号启动的情况,我建议将前缀作为参数传递,递归调用将处理的任何内容都应作为前缀。
我还会使用正则表达式标记输入。
这是它的样子:
import re
from itertools import product
def expand(s):
tokens = re.finditer(r"[^{},]+|.|$", s)
def recur(prefixes):
token = next(tokens).group(0) or "}"
if token == "}":
return prefixes
elif token == ",":
return prefixes + recur([""])
elif token == "{":
return recur([a + b for a, b in product(prefixes, recur([""]))])
else:
return recur([prefix + token for prefix in prefixes])
return recur([""])
调用示例:
s = "abc{d{0,1},e}f{g{0{a,b},1,2},hi}"
print(expand(s))
输出:
['abcd0fg0a', 'abcd0fg0b', 'abcd0fg1', 'abcd0fg2', 'abcd0fhi',
'abcd1fg0a', 'abcd1fg0b', 'abcd1fg1', 'abcd1fg2', 'abcd1fhi',
'abcefg0a', 'abcefg0b', 'abcefg1', 'abcefg2', 'abcefhi']
带输入验证的版本
如果解决方案在大括号不平衡时引发受控异常,则将函数扩展为:
def expand(s):
tokens = re.finditer(r"[^{},]+|.|$", s)
def recur(prefixes, until):
token = next(tokens).group(0)
if token == until:
return prefixes
elif token == ",":
return prefixes + recur([""], until)
elif token == "{":
return recur([a + b for a, b in product(prefixes, recur([""], "}"))], until)
elif token == "}":
raise ValueError("Unexpected closing brace")
elif token:
return recur([prefix + token for prefix in prefixes], until)
else:
raise ValueError("Missing closing brace")
return recur([""], "")
感谢 trincot 的 ,我能够想出我想要的单一递归函数。
Python代码:
# Returns expanded braces and the
# ending index for the processed section.
def f(s, prefixes=[''], i=0):
if i == len(s):
return prefixes, i
j = i
while j < len(s) and s[j] not in ['{', '}', ',']:
j += 1
new_prefixes = [pfx + s[i:j] for pfx in prefixes]
if j == len(s):
return new_prefixes, j
if s[j] == '}':
return new_prefixes, j + 1
if s[j] == '{':
suffixes, next_idx = f(s, [''], j + 1)
expanded = [pfx + sfx for pfx in new_prefixes for sfx in suffixes]
return f(s, expanded, next_idx)
if s[j] == ',':
words, next_idx = f(s, [''], j + 1)
return new_prefixes + words, next_idx
(我故意删除了某人添加的“Python”标签,以使该问题与语言无关。)
我想我设法通过为每个大括号展开堆栈来迭代解决这个问题,这是一个我很满意的解决方案(假设它有效)。但是我未能实现单个函数递归。有人可以帮忙吗?我尝试的结果太复杂了,我无法想象,根据逗号和大括号之间的关系,需要添加而不是扩展的不同场景。
有这样的优雅递归吗?对于我可以研究此类问题的其他类型的已建立 parsing/language 方法的任何建议,我也将不胜感激。
问题描述:给定一串字母、有效的大括号和逗号;展开大括号,以便结果包括紧接在大括号左侧的字母子串的所有可能性,这些字母子串与大括号内的每个逗号分隔的字母子串连接。大括号可以嵌套。
示例:
"abc{d,e}f{g,hi}" => ['abcdfg', 'abcdfhi', 'abcefg', 'abcefhi']
"abc{d{0,1},e}f{g{0{a,b},1,2},hi}" =>
['abcd0fg0a', 'abcd0fg0b', 'abcd0fg1', 'abcd0fg2', 'abcd0fhi',
'abcd1fg0a', 'abcd1fg0b', 'abcd1fg1', 'abcd1fg2', 'abcd1fhi',
'abcefg0a', 'abcefg0b', 'abcefg1', 'abcefg2', 'abcefhi']
(希望工作)迭代Python代码:
# Brace expansion
# Example:
# input "abc{d,e}f{g,hi}"
# output ["abcdfg", "abcdfhi", "abcefg", "abcefhi"]
def f(s):
stack = []
i = 0
while i < len(s):
j = i
while s[j] not in ['{', '}', ',']:
j += 1
if s[i:j]:
stack.append([s[i:j]])
i = j
if s[i] == '{':
stack.append('{')
i += 1
if s[i] == ',':
i += 1
# Unwind stack
if s[i] == '}':
# Get comma, separated elements
lst = []
while stack[-1] != '{':
lst.extend(reversed(stack.pop()))
lst = reversed(lst)
stack.pop() # Remove '{'
pfxs = reversed(stack.pop())
stack.append([pfx + sfx for pfx in pfxs for sfx in lst])
i += 1
# Cartesian product of stack
result = []
for i in range(1, len(stack)):
result = [pfx + sfx for pfx in stack[i-1] for sfx in stack[i]]
return result
输出:
s = "abc{d,e}f{g,hi}"
print(s)
print(f(s))
s = "abc{d{0,1},e}f{g{0{a,b},1,2},hi}"
print("")
print(s)
print(f(s))
"""
abc{d,e}f{g,hi}
['abcdfg', 'abcdfhi', 'abcefg', 'abcefhi']
abc{d{0,1},e}f{g{0{a,b},1,2},hi}
['abcd0fg0a', 'abcd0fg0b', 'abcd0fg1', 'abcd0fg2', 'abcd0fhi',
'abcd1fg0a', 'abcd1fg0b', 'abcd1fg1', 'abcd1fg2', 'abcd1fhi',
'abcefg0a', 'abcefg0b', 'abcefg1', 'abcefg2', 'abcefhi']
"""
深谋远虑
我读过“笛卡尔积”,所以我的第一反应当然是导入 the itertools module。
其次,由于大括号的嵌套性质,我将使用递归。
第三,有两点我们可以分割字符串:逗号和大括号。所以我要写两个相互递归的函数:一个扩展逗号,一个扩展大括号。
函数expand_commas
将尝试以逗号分隔字符串。函数 expand_braces
将假定该字符串在上层不包含逗号(也就是说,它可以在大括号内隐藏逗号,但在大括号外没有可见的逗号)。
第四,class str
有一个方法 .split
可以方便地拆分 ,
和 {
和 [= 上的字符串18=];但是 In 不能使用它,因为我希望拆分忽略嵌套字符。所以我将不得不编写自己的函数 get_commas
和 get_braces
来检测非嵌套的逗号和大括号。
Python代码
import itertools
# returns a pair of indices:
# the index of the first '{' in the string;
# the index of its matching '}'
def get_braces(s):
i = s.index('{')
j = i + 1
depth = 0
while j < len(s) and (depth > 0 or s[j] != '}'):
if s[j] == '{':
depth += 1
elif s[j] == '}':
depth -= 1
j += 1
return i,j
# returns a list of indices:
# the indices of every ',' in the string
# except ',' that are hidden inside braces
def get_commas(s):
depth = 0
commas = []
for i,c in enumerate(s):
if c == '{':
depth += 1
elif c == '}':
depth -= 1
elif depth == 0 and c == ',':
commas.append(i)
return commas
# splits on commas
# make recursive calls on substrings
# returns list of alternatives
def expand_commas(s):
commas = get_commas(s)
if commas:
return list(itertools.chain.from_iterable(expand_braces(s[i+1:j]) for i,j in zip([-1]+commas, commas+[len(s)])))
else:
return expand_braces(s)
# assumes no commas outside of braces
# splits on first '{' and its matching '}'
# make recursive calls, return cartesian product
def expand_braces(s):
if '{' not in s:
return [s]
else:
i,j = get_braces(s)
infixes = expand_commas(s[i+1:j])
suffixes = expand_braces(s[j+1:])
return list(''.join(p) for p in itertools.product([s[:i]], infixes, suffixes))
测试
# balanced string
print(expand_commas("abc{d{0,1},e}f{g{0{a,b},1,2},hi}"))
# ['abcd0fg0a', 'abcd0fg0b', 'abcd0fg1', 'abcd0fg2', 'abcd0fhi',
# 'abcd1fg0a', 'abcd1fg0b', 'abcd1fg1', 'abcd1fg2', 'abcd1fhi',
# 'abcefg0a', 'abcefg0b', 'abcefg1', 'abcefg2', 'abcefhi']
# only commas
print(expand_commas("a,b,c"))
# ['a', 'b', 'c']
# mismatched braces: missing closing brace
print(expand_commas("abc{d{0,1},e}f{g{0{a,b"))
# ['abcd0fg0a', 'abcd0fg0b', 'abcd1fg0a', 'abcd1fg0b', 'abcefg0a', 'abcefg0b']
print(expand_commas("abc{d{0,1},e}f{g{0{a,b}}}"))
# ['abcd0fg0a', 'abcd0fg0b', 'abcd1fg0a', 'abcd1fg0b', 'abcefg0a', 'abcefg0b']
# mismatched braces: extra closing brace
print(expand_commas("abc{d,e}}{f,g}}"))
# ['abcd}f', 'abce}f', 'g}}']
# empty braces
print(expand_commas("abc{}d"))
# ['abcd']
# empty commas
print(expand_commas("abc{,,,}d"))
# ['abcd', 'abcd', 'abcd', 'abcd']
为了区分递归是由逗号启动还是由大括号启动的情况,我建议将前缀作为参数传递,递归调用将处理的任何内容都应作为前缀。
我还会使用正则表达式标记输入。
这是它的样子:
import re
from itertools import product
def expand(s):
tokens = re.finditer(r"[^{},]+|.|$", s)
def recur(prefixes):
token = next(tokens).group(0) or "}"
if token == "}":
return prefixes
elif token == ",":
return prefixes + recur([""])
elif token == "{":
return recur([a + b for a, b in product(prefixes, recur([""]))])
else:
return recur([prefix + token for prefix in prefixes])
return recur([""])
调用示例:
s = "abc{d{0,1},e}f{g{0{a,b},1,2},hi}"
print(expand(s))
输出:
['abcd0fg0a', 'abcd0fg0b', 'abcd0fg1', 'abcd0fg2', 'abcd0fhi',
'abcd1fg0a', 'abcd1fg0b', 'abcd1fg1', 'abcd1fg2', 'abcd1fhi',
'abcefg0a', 'abcefg0b', 'abcefg1', 'abcefg2', 'abcefhi']
带输入验证的版本
如果解决方案在大括号不平衡时引发受控异常,则将函数扩展为:
def expand(s):
tokens = re.finditer(r"[^{},]+|.|$", s)
def recur(prefixes, until):
token = next(tokens).group(0)
if token == until:
return prefixes
elif token == ",":
return prefixes + recur([""], until)
elif token == "{":
return recur([a + b for a, b in product(prefixes, recur([""], "}"))], until)
elif token == "}":
raise ValueError("Unexpected closing brace")
elif token:
return recur([prefix + token for prefix in prefixes], until)
else:
raise ValueError("Missing closing brace")
return recur([""], "")
感谢 trincot 的
Python代码:
# Returns expanded braces and the
# ending index for the processed section.
def f(s, prefixes=[''], i=0):
if i == len(s):
return prefixes, i
j = i
while j < len(s) and s[j] not in ['{', '}', ',']:
j += 1
new_prefixes = [pfx + s[i:j] for pfx in prefixes]
if j == len(s):
return new_prefixes, j
if s[j] == '}':
return new_prefixes, j + 1
if s[j] == '{':
suffixes, next_idx = f(s, [''], j + 1)
expanded = [pfx + sfx for pfx in new_prefixes for sfx in suffixes]
return f(s, expanded, next_idx)
if s[j] == ',':
words, next_idx = f(s, [''], j + 1)
return new_prefixes + words, next_idx