查找集列表之间的交集
Find intersection sets between list of sets
下面的问题是python3.6。假设我有集合列表,例如
L1 = [{2,7},{2,7,8},{2,3,6,7},{1,2,4,5,7}]
L2 = [{3,6},{1,3,4,6,7},{2,3,5,6,8}]
L3 = [{2,5,7,8},{1,2,3,5,7,8}, {2,4,5,6,7,8}]
我需要找到 L1、L2 和 L3 的每个元素之间的所有交集。例如:
{2,7}.intersection({3,6}).intersection({2,5,7,8})= empty
{2,7}.intersection({3,6}).intersection({1,2,3,5,7,8})= empty
{2,7}.intersection({3,6}).intersection({2,4,5,6,7,8})= empty
{2,7}.intersection({1,3,4,6,7}).intersection({2,5,7,8})= {7}
{2,7}.intersection({1,3,4,6,7}).intersection({1,2,3,5,7,8})= {7}
{2,7}.intersection({1,3,4,6,7}).intersection({2,4,5,6,7,8})= {7}
.....................
如果我们继续这样做,我们最终会得到以下集合:
{{empty},{2},{3},{6},{7},{2,3},{2,5},{2,6},{2,8}, {3,7},{4,7},{6,7}}
假设:
- 我有很多列表 L1、L2、L3、...Ln。而且我不知道我有多少个列表。
- 每个列表 L1、L2、L3..Ln 都很大,所以我无法将它们全部加载到内存中。
我的问题是:有什么方法可以顺序计算那个集合,例如,在L1和L2之间计算,然后用结果计算L3,等等。 .
你可以先计算出L1和L2之间所有可能的交集,然后计算那个集合和L3之间的交集等等。
list_generator = iter([ # some generator that produces your lists
[{2,7}, {2,7,8}, {2,3,6,7}, {1,2,4,5,7}],
[{3,6}, {1,3,4,6,7}, {2,3,5,6,8}],
[{2,5,7,8}, {1,2,3,5,7,8}, {2,4,5,6,7,8}],
])
# for example, you can read from a file:
# (adapt the format to your needs)
def list_generator_from_file(filename):
with open(filename) as f:
for line in f:
yield list(map(lambda x: set(x.split(',')), line.strip().split('|')))
# list_generator would be then list_generator_from_file('myfile.dat')
intersections = next(list_generator) # get first list
new_intersections = set()
for list_ in list_generator:
for old in intersections:
for new in list_:
new_intersections.add(frozenset(old.intersection(new)))
# at this point we don't need the current list any more
intersections, new_intersections = new_intersections, set()
print(intersections)
输出看起来像 {frozenset({7}), frozenset({3, 7}), frozenset({3}), frozenset({6}), frozenset({2, 6}), frozenset({6, 7}), frozenset(), frozenset({8, 2}), frozenset({2, 3}), frozenset({1, 7}), frozenset({4, 7}), frozenset({2, 5}), frozenset({2})}
,除了您错过的 {1,7}
集外,它与您现有的相匹配。
这可能是您要找的:
res = {frozenset(frozenset(x) for x in (i, j, k)): i & j & k \
for i in L1 for j in L2 for k in L3}
说明
frozenset
是必需的,因为 set
不可散列。字典键必须是可哈希的。
- 循环遍历 L1、L2、L3 中每个长度为 3 的项目组合。
- 通过
&
运算计算交集,相当于set.intersection
.
您可以使用 functools.reduce(set.intersection, sets)
to handle variable inputs. And itertools.product(nested_list_of_sets)
从多个序列中的每个序列生成一个元素的组合。
通过使用生成器函数 (yield
) 和 itertools.product 等惰性迭代器,您可以将内存使用量减少几个数量级。
import itertools
import functools
nested_list_of_sets = [
[{2,7}, {2,7,8}, {2,3,6,7}, {1,2,4,5,7}],
[{3,6}, {1,3,4,6,7}, {2,3,5,6,8}],
[{2,5,7,8}, {1,2,3,5,7,8}, {2,4,5,6,7,8}],
]
def find_intersections(sets):
"""Take a nested sequence of sets and generate intersections"""
for combo in itertools.product(*sets):
yield (combo, functools.reduce(set.intersection, combo))
for input_sets, output_set in find_intersections(nested_list_of_sets):
print('{:<55} -> {}'.format(repr(input_sets), output_set))
输出为
({2, 7}, {3, 6}, {8, 2, 5, 7}) -> set()
({2, 7}, {3, 6}, {1, 2, 3, 5, 7, 8}) -> set()
({2, 7}, {3, 6}, {2, 4, 5, 6, 7, 8}) -> set()
({2, 7}, {1, 3, 4, 6, 7}, {8, 2, 5, 7}) -> {7}
({2, 7}, {1, 3, 4, 6, 7}, {1, 2, 3, 5, 7, 8}) -> {7}
({2, 7}, {1, 3, 4, 6, 7}, {2, 4, 5, 6, 7, 8}) -> {7}
({2, 7}, {2, 3, 5, 6, 8}, {8, 2, 5, 7}) -> {2}
({2, 7}, {2, 3, 5, 6, 8}, {1, 2, 3, 5, 7, 8}) -> {2}
# ... etc
下面的问题是python3.6。假设我有集合列表,例如
L1 = [{2,7},{2,7,8},{2,3,6,7},{1,2,4,5,7}]
L2 = [{3,6},{1,3,4,6,7},{2,3,5,6,8}]
L3 = [{2,5,7,8},{1,2,3,5,7,8}, {2,4,5,6,7,8}]
我需要找到 L1、L2 和 L3 的每个元素之间的所有交集。例如:
{2,7}.intersection({3,6}).intersection({2,5,7,8})= empty
{2,7}.intersection({3,6}).intersection({1,2,3,5,7,8})= empty
{2,7}.intersection({3,6}).intersection({2,4,5,6,7,8})= empty
{2,7}.intersection({1,3,4,6,7}).intersection({2,5,7,8})= {7}
{2,7}.intersection({1,3,4,6,7}).intersection({1,2,3,5,7,8})= {7}
{2,7}.intersection({1,3,4,6,7}).intersection({2,4,5,6,7,8})= {7}
.....................
如果我们继续这样做,我们最终会得到以下集合:
{{empty},{2},{3},{6},{7},{2,3},{2,5},{2,6},{2,8}, {3,7},{4,7},{6,7}}
假设:
- 我有很多列表 L1、L2、L3、...Ln。而且我不知道我有多少个列表。
- 每个列表 L1、L2、L3..Ln 都很大,所以我无法将它们全部加载到内存中。
我的问题是:有什么方法可以顺序计算那个集合,例如,在L1和L2之间计算,然后用结果计算L3,等等。 .
你可以先计算出L1和L2之间所有可能的交集,然后计算那个集合和L3之间的交集等等。
list_generator = iter([ # some generator that produces your lists
[{2,7}, {2,7,8}, {2,3,6,7}, {1,2,4,5,7}],
[{3,6}, {1,3,4,6,7}, {2,3,5,6,8}],
[{2,5,7,8}, {1,2,3,5,7,8}, {2,4,5,6,7,8}],
])
# for example, you can read from a file:
# (adapt the format to your needs)
def list_generator_from_file(filename):
with open(filename) as f:
for line in f:
yield list(map(lambda x: set(x.split(',')), line.strip().split('|')))
# list_generator would be then list_generator_from_file('myfile.dat')
intersections = next(list_generator) # get first list
new_intersections = set()
for list_ in list_generator:
for old in intersections:
for new in list_:
new_intersections.add(frozenset(old.intersection(new)))
# at this point we don't need the current list any more
intersections, new_intersections = new_intersections, set()
print(intersections)
输出看起来像 {frozenset({7}), frozenset({3, 7}), frozenset({3}), frozenset({6}), frozenset({2, 6}), frozenset({6, 7}), frozenset(), frozenset({8, 2}), frozenset({2, 3}), frozenset({1, 7}), frozenset({4, 7}), frozenset({2, 5}), frozenset({2})}
,除了您错过的 {1,7}
集外,它与您现有的相匹配。
这可能是您要找的:
res = {frozenset(frozenset(x) for x in (i, j, k)): i & j & k \
for i in L1 for j in L2 for k in L3}
说明
frozenset
是必需的,因为set
不可散列。字典键必须是可哈希的。- 循环遍历 L1、L2、L3 中每个长度为 3 的项目组合。
- 通过
&
运算计算交集,相当于set.intersection
.
您可以使用 functools.reduce(set.intersection, sets)
to handle variable inputs. And itertools.product(nested_list_of_sets)
从多个序列中的每个序列生成一个元素的组合。
通过使用生成器函数 (yield
) 和 itertools.product 等惰性迭代器,您可以将内存使用量减少几个数量级。
import itertools
import functools
nested_list_of_sets = [
[{2,7}, {2,7,8}, {2,3,6,7}, {1,2,4,5,7}],
[{3,6}, {1,3,4,6,7}, {2,3,5,6,8}],
[{2,5,7,8}, {1,2,3,5,7,8}, {2,4,5,6,7,8}],
]
def find_intersections(sets):
"""Take a nested sequence of sets and generate intersections"""
for combo in itertools.product(*sets):
yield (combo, functools.reduce(set.intersection, combo))
for input_sets, output_set in find_intersections(nested_list_of_sets):
print('{:<55} -> {}'.format(repr(input_sets), output_set))
输出为
({2, 7}, {3, 6}, {8, 2, 5, 7}) -> set()
({2, 7}, {3, 6}, {1, 2, 3, 5, 7, 8}) -> set()
({2, 7}, {3, 6}, {2, 4, 5, 6, 7, 8}) -> set()
({2, 7}, {1, 3, 4, 6, 7}, {8, 2, 5, 7}) -> {7}
({2, 7}, {1, 3, 4, 6, 7}, {1, 2, 3, 5, 7, 8}) -> {7}
({2, 7}, {1, 3, 4, 6, 7}, {2, 4, 5, 6, 7, 8}) -> {7}
({2, 7}, {2, 3, 5, 6, 8}, {8, 2, 5, 7}) -> {2}
({2, 7}, {2, 3, 5, 6, 8}, {1, 2, 3, 5, 7, 8}) -> {2}
# ... etc