Pythonic way of writing a belongs function for two sets

I have two sets that contain only strings, and I am trying to write a function like this:

def belongs(setA, setB):
    ...  # should return True or False

Definition: if a set, say setB, has an item that contains (in the string-contains sense) one of the items in setA, then I say that setB belongs to setA. Some examples:

setA = set(['apple', 'banana', 'strawberry'])

set1 = set(['abcc', 'xyz', 'klm'])                   # does not belong to setA
set2 = set(['app', 'banaba', 'baba'])                # does not belong to setA
set3 = set(['apples', 'xyz'])                        # belongs to setA
set4 = set(['bananaaa', 'hello', 'world', 'stack'])  # belongs to setA

My current code:

def belongs(set1, set2):
    for i in set1:
        for j in set2:
            if i in j:
                return True
    return False

Is there a better or more Pythonic way to do the same thing?

Write the function:

def belongs(set1, set2):
    return any(s1 in s2 for s1 in set1 for s2 in set2)

And test it:

assert not belongs(setA, set1)
assert not belongs(setA, set2)
assert belongs(setA, set3)
assert belongs(setA, set4)

The problem of checking whether any string in setA is a substring of any item in setB, i.e. whether setB "belongs to" setA, can be solved using grep -F:

grep -Flf setA set1 set2 set3 set4

This prints the sets that "belong to" setA, i.e. set3 and set4 in this example (assuming each set is stored in a file of the same name, one string per line).

The Aho–Corasick string matching algorithm formed the basis of the original Unix command fgrep. It can be much more efficient for large inputs than a naive solution with nested loops, e.g. from 20 hours for a brute-force approach down to a couple of minutes using fgrep.
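To make the Aho–Corasick idea concrete, here is a minimal pure-Python sketch of it. It is only an illustration and not the code fgrep uses; the names build_automaton, contains_any and belongs_ac are made up for this example.

```python
from collections import deque


def build_automaton(patterns):
    """Build a trie with failure links (hypothetical helper for this sketch)."""
    # goto[node] maps a character to a child node, fail[node] is the failure
    # link, hit[node] is True if some pattern ends at node or at a suffix of it.
    goto, fail, hit = [{}], [0], [False]
    for pattern in patterns:
        node = 0
        for ch in pattern:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                hit.append(False)
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        hit[node] = True
    # Breadth-first pass: depth-1 nodes keep fail == 0, deeper nodes are derived
    # from their parent's failure link.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            hit[child] = hit[child] or hit[fail[child]]
            queue.append(child)
    return goto, fail, hit


def contains_any(text, automaton):
    """Return True if text contains at least one of the patterns."""
    goto, fail, hit = automaton
    node = 0
    if hit[node]:  # an empty pattern is contained in every string
        return True
    for ch in text:
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        if hit[node]:
            return True
    return False


def belongs_ac(set_a, set_b):
    """True if some item of set_b contains some item of set_a."""
    automaton = build_automaton(set_a)
    return any(contains_any(item, automaton) for item in set_b)
```

Each item of set_b is scanned once, character by character, regardless of how many strings set_a holds, which is where the gain over nested loops comes from on large inputs.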

If installing a third-party Aho–Corasick library is not an option, you can try the re module to improve performance when needed:

import re

substrings = sorted(setA, key=len, reverse=True)  # longest first
# re.escape makes each string match literally; search returns a match or None
found = re.compile("|".join(map(re.escape, substrings))).search
# built-in map is lazy on Python 3, so itertools.imap is not needed
print([any(map(found, S)) for S in [set1, set2, set3, set4]])
# -> [False, False, True, True]
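One edge case with the regex approach above: if setA is empty, "|".join(...) yields an empty pattern, which matches every string, whereas the nested-loop version returns False. A possible way to wrap the regex idea into a reusable predicate that guards against this is sketched below; make_belongs is a made-up name for this example.

```python
import re


def make_belongs(substrings):
    """Return a function that tells whether a set 'belongs to' substrings."""
    if not substrings:
        # An empty alternation would match everything; return False instead,
        # in agreement with the nested-loop definition.
        return lambda items: False
    ordered = sorted(substrings, key=len, reverse=True)  # longest first
    search = re.compile("|".join(map(re.escape, ordered))).search
    return lambda items: any(search(item) for item in items)


belongs_to_a = make_belongs({'apple', 'banana', 'strawberry'})
print(belongs_to_a({'apples', 'xyz'}))          # contains 'apple'
print(belongs_to_a({'app', 'banaba', 'baba'}))  # no item contains a full string
```

Compiling the pattern once and reusing the returned predicate avoids rebuilding the regex for every set that is tested against setA.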