如何将嵌套大括号递归扩展为笛卡尔积

How to recursively expand nested braces as cartesian product

(我故意删除了某人添加的“Python”标签,以使该问题与语言无关。)

我想我设法通过为每个大括号展开堆栈来迭代解决这个问题,这是一个我很满意的解决方案(假设它有效)。但是我未能实现单个函数递归。有人可以帮忙吗?我尝试的结果太复杂了,我无法想象,根据逗号和大括号之间的关系,需要添加而不是扩展的不同场景。

有这样的优雅递归吗?对于我可以研究此类问题的其他类型的已建立 parsing/language 方法的任何建议,我也将不胜感激。

问题描述:给定一串字母、有效的大括号和逗号;展开大括号,以便结果包括紧接在大括号左侧的字母子串的所有可能性,这些字母子串与大括号内的每个逗号分隔的字母子串连接。大括号可以嵌套。

示例:

"abc{d,e}f{g,hi}" => ['abcdfg', 'abcdfhi', 'abcefg', 'abcefhi']


"abc{d{0,1},e}f{g{0{a,b},1,2},hi}" =>

['abcd0fg0a', 'abcd0fg0b', 'abcd0fg1', 'abcd0fg2', 'abcd0fhi',
 'abcd1fg0a', 'abcd1fg0b', 'abcd1fg1', 'abcd1fg2', 'abcd1fhi',
 'abcefg0a', 'abcefg0b', 'abcefg1', 'abcefg2', 'abcefhi']

(希望工作)迭代Python代码:

# Brace expansion
# Example:
# input "abc{d,e}f{g,hi}"
# output ["abcdfg", "abcdfhi", "abcefg", "abcefhi"]
def f(s):
  stack = []

  i = 0

  while i < len(s):
    j = i

    while s[j] not in ['{', '}', ',']:
      j += 1

    if s[i:j]:
      stack.append([s[i:j]])

    i = j

    if s[i] == '{':
      stack.append('{')
      i += 1

    if s[i] == ',':
      i += 1

    # Unwind stack
    if s[i] == '}':
      # Get comma, separated elements
      lst = []
      while stack[-1] != '{':
        lst.extend(reversed(stack.pop()))
      lst = reversed(lst)
      stack.pop() # Remove '{'
      pfxs = reversed(stack.pop())
      stack.append([pfx + sfx for pfx in pfxs for sfx in lst])
      i += 1

  # Cartesian product of stack
  result = []

  for i in range(1, len(stack)):
    result = [pfx + sfx for pfx in stack[i-1] for sfx in stack[i]]
    
  return result

输出:

s = "abc{d,e}f{g,hi}"

print(s)
print(f(s))

s = "abc{d{0,1},e}f{g{0{a,b},1,2},hi}"

print("")
print(s)
print(f(s))

"""
abc{d,e}f{g,hi}
['abcdfg', 'abcdfhi', 'abcefg', 'abcefhi']

abc{d{0,1},e}f{g{0{a,b},1,2},hi}
['abcd0fg0a', 'abcd0fg0b', 'abcd0fg1', 'abcd0fg2', 'abcd0fhi',
 'abcd1fg0a', 'abcd1fg0b', 'abcd1fg1', 'abcd1fg2', 'abcd1fhi',
 'abcefg0a', 'abcefg0b', 'abcefg1', 'abcefg2', 'abcefhi']
"""

深谋远虑

我读过“笛卡尔积”,所以我的第一反应当然是导入 the itertools module

其次,由于大括号的嵌套性质,我将使用递归。

第三,有两点我们可以分割字符串:逗号和大括号。所以我要写两个相互递归的函数:一个扩展逗号,一个扩展大括号。

函数expand_commas 将尝试以逗号分隔字符串。函数 expand_braces 将假定该字符串在上层不包含逗号(也就是说,它可以在大括号内隐藏逗号,但在大括号外没有可见的逗号)。

第四,class str 有一个方法 .split 可以方便地拆分 ,{ 和 [= 上的字符串18=];但是 In 不能使用它,因为我希望拆分忽略嵌套字符。所以我将不得不编写自己的函数 get_commasget_braces 来检测非嵌套的逗号和大括号。

Python代码

import itertools

# returns a pair of indices:
#   the index of the first '{' in the string;
#   the index of its matching '}'
def get_braces(s):
  i = s.index('{')
  j = i + 1
  depth = 0
  while j < len(s) and (depth > 0 or s[j] != '}'):
    if s[j] == '{':
      depth += 1
    elif s[j] == '}':
      depth -= 1
    j += 1
  return i,j

# returns a list of indices:
#   the indices of every ',' in the string
#   except ',' that are hidden inside braces
def get_commas(s):
  depth = 0
  commas = []
  for i,c in enumerate(s):
    if c == '{':
      depth += 1
    elif c == '}':
      depth -= 1
    elif depth == 0 and c == ',':
      commas.append(i)
  return commas

# splits on commas
# make recursive calls on substrings
# returns list of alternatives
def expand_commas(s):
  commas = get_commas(s)
  if commas:
    return list(itertools.chain.from_iterable(expand_braces(s[i+1:j]) for i,j in zip([-1]+commas, commas+[len(s)])))
  else:
    return expand_braces(s)

# assumes no commas outside of braces
# splits on first '{' and its matching '}'
# make recursive calls, return cartesian product
def expand_braces(s):
  if '{' not in s:
    return [s]
  else:
    i,j = get_braces(s)
    infixes = expand_commas(s[i+1:j])
    suffixes = expand_braces(s[j+1:])
    return list(''.join(p) for p in itertools.product([s[:i]], infixes, suffixes))

测试

# balanced string
print(expand_commas("abc{d{0,1},e}f{g{0{a,b},1,2},hi}"))
# ['abcd0fg0a', 'abcd0fg0b', 'abcd0fg1', 'abcd0fg2', 'abcd0fhi',
#  'abcd1fg0a', 'abcd1fg0b', 'abcd1fg1', 'abcd1fg2', 'abcd1fhi',
#  'abcefg0a', 'abcefg0b', 'abcefg1', 'abcefg2', 'abcefhi']

# only commas
print(expand_commas("a,b,c"))
# ['a', 'b', 'c']

# mismatched braces: missing closing brace
print(expand_commas("abc{d{0,1},e}f{g{0{a,b"))
# ['abcd0fg0a', 'abcd0fg0b', 'abcd1fg0a', 'abcd1fg0b', 'abcefg0a', 'abcefg0b']

print(expand_commas("abc{d{0,1},e}f{g{0{a,b}}}"))
# ['abcd0fg0a', 'abcd0fg0b', 'abcd1fg0a', 'abcd1fg0b', 'abcefg0a', 'abcefg0b']

# mismatched braces: extra closing brace
print(expand_commas("abc{d,e}}{f,g}}"))
# ['abcd}f', 'abce}f', 'g}}']

# empty braces
print(expand_commas("abc{}d"))
# ['abcd']

# empty commas
print(expand_commas("abc{,,,}d"))
# ['abcd', 'abcd', 'abcd', 'abcd']

为了区分递归是由逗号启动还是由大括号启动的情况,我建议将前缀作为参数传递,递归调用将处理的任何内容都应作为前缀。

我还会使用正则表达式标记输入。

这是它的样子:

import re
from itertools import product 

def expand(s):
    tokens = re.finditer(r"[^{},]+|.|$", s)
    
    def recur(prefixes):
        token = next(tokens).group(0) or "}"
        if token == "}":
            return prefixes
        elif token == ",":
            return prefixes + recur([""])
        elif token == "{":
            return recur([a + b for a, b in product(prefixes, recur([""]))])
        else:
            return recur([prefix + token for prefix in prefixes])

    return recur([""])

调用示例:

s = "abc{d{0,1},e}f{g{0{a,b},1,2},hi}"

print(expand(s))

输出:

['abcd0fg0a', 'abcd0fg0b', 'abcd0fg1', 'abcd0fg2', 'abcd0fhi', 
 'abcd1fg0a', 'abcd1fg0b', 'abcd1fg1', 'abcd1fg2', 'abcd1fhi', 
 'abcefg0a',  'abcefg0b',  'abcefg1',  'abcefg2',  'abcefhi']

带输入验证的版本

如果解决方案在大括号不平衡时引发受控异常,则将函数扩展为:

def expand(s):
    tokens = re.finditer(r"[^{},]+|.|$", s)
    
    def recur(prefixes, until):
        token = next(tokens).group(0)
        if token == until:
            return prefixes
        elif token == ",":
            return prefixes + recur([""], until)
        elif token == "{":
            return recur([a + b for a, b in product(prefixes, recur([""], "}"))], until)
        elif token == "}":
            raise ValueError("Unexpected closing brace")
        elif token:
            return recur([prefix + token for prefix in prefixes], until)
        else:
            raise ValueError("Missing closing brace")
    
    return recur([""], "")

感谢 trincot 的 ,我能够想出我想要的单一递归函数。

Python代码:

# Returns expanded braces and the
# ending index for the processed section.
def f(s, prefixes=[''], i=0):
  if i == len(s):
    return prefixes, i

  j = i

  while j < len(s) and s[j] not in ['{', '}', ',']:
    j += 1

  new_prefixes = [pfx + s[i:j] for pfx in prefixes]

  if j == len(s):
    return new_prefixes, j
  
  if s[j] == '}':
    return new_prefixes, j + 1

  if s[j] == '{':
    suffixes, next_idx = f(s, [''], j + 1)
    expanded = [pfx + sfx for pfx in new_prefixes for sfx in suffixes]
    return f(s, expanded, next_idx)

  if s[j] == ',':
    words, next_idx = f(s, [''], j + 1)
    return new_prefixes + words, next_idx