How to speed up set intersection and union operations inside loops in Python
    judge = [[0,3,5], [1,2,4], [1,5,6], [],..., []]
    a = [[1,2], [2,3,4,5,7,9], [1,4,5], [],..., []]
    # len(judge) == len(a)
    res_intersect = []
    for i in range(len(a)):
        for j in range(i+1, len(a)):
            if len(set(judge[i]) & set(judge[j])) != 0:
                res_intersect.append(set(a[i]) & set(a[j]))
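a and judge have the same length, and both are much longer than 10000. I need to run this operation hundreds of times with different a and judge, and I found that numba does not support set(). How can I speed it up?
Thanks in advance!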
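- Convert the input lists to sets once, up front. The original loop rebuilds the same sets for every pair, so this saves a lot of time.
- Use isdisjoint to test for overlap without building an unnecessary temporary set.
- Use itertools.combinations to simplify the nested loops.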
With all of the changes applied:
    import itertools

    judge = [[0,3,5], [1,2,4], [1,5,6], [],..., []]
    a = [[1,2], [2,3,4,5,7,9], [1,4,5], [],..., []]
    # len(judge) == len(a)
    res_intersect = []
    # Each list is converted to a set once; combinations yields every pair i < j
    for (j1, a1), (j2, a2) in itertools.combinations(zip(map(set, judge), map(set, a)), 2):
        if not j1.isdisjoint(j2):  # the judge sets share at least one element
            res_intersect.append(a1 & a2)
This probably won't benefit from numba, but it should significantly reduce overhead by avoiding an enormous number of temporary sets.
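As a rough, runnable sketch of the approach above: the function name pairwise_intersections is hypothetical, and the sample data is the question's first three entries plus a single empty list standing in for the elided ones, so treat it as an illustration rather than a benchmark.

```python
from itertools import combinations


def pairwise_intersections(judge, a):
    """For every pair i < j whose judge entries overlap, collect set(a[i]) & set(a[j])."""
    # Build each set exactly once instead of once per pair.
    judge_sets = [set(x) for x in judge]
    a_sets = [set(x) for x in a]

    result = []
    # combinations(..., 2) visits pairs in the same order as the nested i < j loops.
    for (j1, a1), (j2, a2) in combinations(zip(judge_sets, a_sets), 2):
        # isdisjoint stops at the first shared element and allocates no new set.
        if not j1.isdisjoint(j2):
            result.append(a1 & a2)
    return result


# Hypothetical small input; the real lists are far longer.
judge = [[0, 3, 5], [1, 2, 4], [1, 5, 6], []]
a = [[1, 2], [2, 3, 4, 5, 7, 9], [1, 4, 5], []]

result = pairwise_intersections(judge, a)
print(result)
```

Note that the number of pairs still grows quadratically with len(a); this only lowers the cost of each pair.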