Specialized Powerset Requirement

Suppose I have compiled five regex patterns and then created five Boolean variables:

a = re.search(first, mystr)
b = re.search(second, mystr)
c = re.search(third, mystr)
d = re.search(fourth, mystr)
e = re.search(fifth, mystr)

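I'd like to use the powerset of (a, b, c, d, e) in a function so that it tries the more specific matches first and only then falls through to failure. As you can see, the powerset (well, its list representation) should be sorted by number of elements in descending order.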

Desired behavior:

 if a and b and c and d and e:
     return 'abcde' 
 if a and b and c and d:
     return 'abcd'
 [... and all the other 4-matches ]
 [now the three-matches]
 [now the two-matches]
 [now the single matches]
 return 'No Match'  # did not match anything

Is there a way to get this function's behavior programmatically, and ideally concisely, by leveraging the powerset?
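For reference, a literal translation of that if-chain into a powerset loop might look like the sketch below. It assumes the patterns are passed in as a list and labelled a, b, c, ... in order; most_specific_match is a hypothetical name. It tries up to 2**5 = 32 subsets, largest first, and returns the first one whose patterns all matched.

```python
from itertools import chain, combinations
import re

def powerset(iterable):
    # Recipe from the itertools docs: every subset, smallest first.
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def most_specific_match(mystr, patterns, labels="abcde"):
    # Hypothetical helper: label -> did that pattern match anywhere in mystr?
    matched = {label: re.search(p, mystr) is not None
               for label, p in zip(labels, patterns)}
    # Largest subsets first; sorted() is stable, so ties keep combinations() order.
    for subset in sorted(powerset(matched), key=len, reverse=True):
        # Skip the empty subset; return the first subset whose patterns all matched.
        if subset and all(matched[label] for label in subset):
            return ''.join(subset)
    return 'No Match'

print(most_specific_match("abcdefghijklmnopqrstuvwxyz", ["a", "B", "c", "d", "e"]))  # acde
print(most_specific_match("xyz", ["a", "b", "c", "d", "e"]))                         # No Match
```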

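First, these aren't Booleans; each one is either a match object or None. Second, iterating over the powerset is a very inefficient way to do this: the first conjunction in your chain that succeeds is always the one naming exactly the patterns that matched. So just append each letter to the string if the corresponding regex matched: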

return ''.join(letter for letter, match in zip('abcde', [a, b, c, d, e]) if match)
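The one-liner above returns an empty string when nothing matches, whereas the question wants 'No Match'. A minimal wrapper, assuming the patterns arrive as a list (match_letters is a hypothetical name), could look like this:

```python
import re

def match_letters(mystr, patterns, labels="abcde"):
    # Hypothetical wrapper: one match object (or None) per pattern.
    matches = [re.search(p, mystr) for p in patterns]
    # Keep the letter of every pattern whose match object is not None.
    result = ''.join(letter for letter, match in zip(labels, matches) if match)
    # An empty string means no pattern matched, so fall back as the question does.
    return result or 'No Match'

print(match_letters("abcdefghijklmnopqrstuvwxyz", ["a", "B", "c", "d", "e"]))  # acde
print(match_letters("xyz", ["a", "b", "c", "d", "e"]))                         # No Match
```

The "if match" test is safe because match objects are always truthy, so it only drops the None entries. This runs each regex once and never builds a subset.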

Alternatively, you can use the powerset() generator-function recipe from the itertools documentation, like this:

from itertools import chain, combinations
from pprint import pprint
import re

def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))

mystr   = "abcdefghijklmnopqrstuvwxyz"
first   = "a"
second  = "B"  # won't match, should be omitted from result
third   = "c"
fourth  = "d"
fifth   = "e"

a = 'a' if re.search(first, mystr) else ''
b = 'b' if re.search(second, mystr) else ''
c = 'c' if re.search(third, mystr) else ''
d = 'd' if re.search(fourth, mystr) else ''
e = 'e' if re.search(fifth, mystr) else ''

# keep only the letters whose pattern matched
elements = [elem for elem in [a, b, c, d, e] if elem != '']
# join each non-empty subset into a string, largest subsets first
spec_ps = [''.join(group)
           for group in sorted(powerset(elements), key=len, reverse=True)
           if group]

pprint(spec_ps)

Output:

['acde',
 'acd',
 'ace',
 'ade',
 'cde',
 'ac',
 'ad',
 'ae',
 'cd',
 'ce',
 'de',
 'a',
 'c',
 'd',
 'e']
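Because elements contains only the letters whose patterns matched, the first entry of spec_ps is already what the question's if-chain would return, and an empty list corresponds to 'No Match'. A minimal self-contained sketch of that final step follows; pick is a hypothetical name and takes the already-filtered letters.

```python
from itertools import chain, combinations

def powerset(iterable):
    # Recipe from the itertools docs.
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def pick(matched_letters):
    # Non-empty subsets joined into strings, largest first.
    spec_ps = [''.join(group)
               for group in sorted(powerset(matched_letters), key=len, reverse=True)
               if group]
    # The full set of matched letters sorts first; no letters means no match.
    return spec_ps[0] if spec_ps else 'No Match'

print(pick(['a', 'c', 'd', 'e']))  # acde
print(pick([]))                    # No Match
```

Building the whole list only to read its first element is wasteful; ''.join(matched_letters) or 'No Match' gives the same result directly.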