查找同一集合的两个分区之间所有不同交集的 Pythonic 和有效方法
Pythonic and efficient way to find all the different intersections between two partitions of the same set
我需要找到同一组的两个分区之间的所有不同交集。例如,如果我们有以下两个相同集合的分区
x = [[1, 2], [3, 4, 5], [6, 7, 8, 9, 10]]
y = [[1, 3, 6, 7], [2, 4, 5, 8, 9, 10]]
要求的结果是
[[1], [2], [3], [4, 5], [6, 7], [8, 9, 10]].
详细地说,我们计算 x 和 y 的每个子集之间的笛卡尔积,并且对于这些乘积中的每一个,如果它们属于其相关子集的交集,我们将相应地对新子集中的元素进行分类。
什么是最佳的/更 pythonic 的方式来做到这一点?提前致谢!
当前答案的性能比较:
import numpy as np
def partitioning(alist, indices):
return [alist[i:j] for i, j in zip([0]+indices, indices+[None])]
total = 1000
sample1 = np.sort(np.random.choice(total, int(total/10), replace=False))
sample2 = np.sort(np.random.choice(total, int(total/2), replace=False))
a = partitioning(np.arange(total), list(sample1))
b = partitioning(np.arange(total), list(sample2))
def partition_decomposition_product_1(x, y):
out = []
for sublist1 in x:
d = {}
for val in sublist1:
for i, sublist2 in enumerate(y):
if val in sublist2:
d.setdefault(i, []).append(val)
out.extend(d.values())
return out
def partition_decomposition_product_2(x, y):
all_s = []
for sx in x:
for sy in y:
ss = list(filter(lambda x:x in sx, sy))
if ss:
all_s.append(ss)
return all_s
def partition_decomposition_product_3(x, y):
return [np.intersect1d(i,j) for i in x for j in y]
并使用 %timeit
测量执行时间
%timeit partition_decomposition_product_1(a, b)
%timeit partition_decomposition_product_2(a, b)
%timeit partition_decomposition_product_3(a, b)
我们发现
2.16 s ± 246 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
620 ms ± 84.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.03 s ± 111 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
因此第二种解决方案是最快的。
我不确定我是否理解正确,但这个脚本产生了你在问题中得到的结果:
x = [[1, 2], [3, 4, 5], [6, 7, 8, 9, 10]]
y = [[1, 3, 6, 7], [2, 4, 5, 8, 9, 10]]
out = []
for sublist1 in x:
d = {}
for val in sublist1:
for i, sublist2 in enumerate(y):
if val in sublist2:
d.setdefault(i, []).append(val)
out.extend(d.values())
print(out)
打印:
[[1], [2], [3], [4, 5], [6, 7], [8, 9, 10]]
这两个列表是同一个集合的分区与算法选择无关。这归结为遍历两个列表列表并获取每个组合之间的交集(您可以在函数开头添加该断言以确保它们是同一集合的分区,使用 this answer to flatten the lists efficiently). With this in mind, this function accomplishes the task, using this answer 计算列表交集:
def func2(x, y):
# check that they partition the same set
checkx = sorted([item for sublist in x for item in sublist])
checky = sorted([item for sublist in y for item in sublist])
assert checkx == checky
# get all intersections
all_s = []
for sx in x:
for sy in y:
ss = list(filter(lambda x:x in sx, sy))
if ss:
all_s.append(ss)
return all_s
然后使用 this time comparison method,我们可以看到这个新函数比您原来的实现快约 100 倍。
我可能会遗漏一些细节,但它似乎有点太简单了:
[np.intersect1d(a,b) for a in x for b in y]
输出:
[array([1]),
array([2]),
array([3]),
array([4, 5]),
array([6, 7]),
array([ 8, 9, 10])]
以上包括重复项,例如 x=[[1,2,3],[1,4,5]]
和 y=[[1,6,7]]
会给出 [[1],[1]]
。
如果你想找到独特的路口:
[list(i) for i in {tuple(np.intersect1d(a,b)) for a in x for b in y}]
输出:
[[8, 9, 10], [6, 7], [1], [4, 5], [2], [3]]
我需要找到同一组的两个分区之间的所有不同交集。例如,如果我们有以下两个相同集合的分区
x = [[1, 2], [3, 4, 5], [6, 7, 8, 9, 10]]
y = [[1, 3, 6, 7], [2, 4, 5, 8, 9, 10]]
要求的结果是
[[1], [2], [3], [4, 5], [6, 7], [8, 9, 10]].
详细地说,我们计算 x 和 y 的每个子集之间的笛卡尔积,并且对于这些乘积中的每一个,如果它们属于其相关子集的交集,我们将相应地对新子集中的元素进行分类。
什么是最佳的/更 pythonic 的方式来做到这一点?提前致谢!
当前答案的性能比较:
import numpy as np
def partitioning(alist, indices):
return [alist[i:j] for i, j in zip([0]+indices, indices+[None])]
total = 1000
sample1 = np.sort(np.random.choice(total, int(total/10), replace=False))
sample2 = np.sort(np.random.choice(total, int(total/2), replace=False))
a = partitioning(np.arange(total), list(sample1))
b = partitioning(np.arange(total), list(sample2))
def partition_decomposition_product_1(x, y):
out = []
for sublist1 in x:
d = {}
for val in sublist1:
for i, sublist2 in enumerate(y):
if val in sublist2:
d.setdefault(i, []).append(val)
out.extend(d.values())
return out
def partition_decomposition_product_2(x, y):
all_s = []
for sx in x:
for sy in y:
ss = list(filter(lambda x:x in sx, sy))
if ss:
all_s.append(ss)
return all_s
def partition_decomposition_product_3(x, y):
return [np.intersect1d(i,j) for i in x for j in y]
并使用 %timeit
测量执行时间%timeit partition_decomposition_product_1(a, b)
%timeit partition_decomposition_product_2(a, b)
%timeit partition_decomposition_product_3(a, b)
我们发现
2.16 s ± 246 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
620 ms ± 84.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.03 s ± 111 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
因此第二种解决方案是最快的。
我不确定我是否理解正确,但这个脚本产生了你在问题中得到的结果:
x = [[1, 2], [3, 4, 5], [6, 7, 8, 9, 10]]
y = [[1, 3, 6, 7], [2, 4, 5, 8, 9, 10]]
out = []
for sublist1 in x:
d = {}
for val in sublist1:
for i, sublist2 in enumerate(y):
if val in sublist2:
d.setdefault(i, []).append(val)
out.extend(d.values())
print(out)
打印:
[[1], [2], [3], [4, 5], [6, 7], [8, 9, 10]]
这两个列表是同一个集合的分区与算法选择无关。这归结为遍历两个列表列表并获取每个组合之间的交集(您可以在函数开头添加该断言以确保它们是同一集合的分区,使用 this answer to flatten the lists efficiently). With this in mind, this function accomplishes the task, using this answer 计算列表交集:
def func2(x, y):
# check that they partition the same set
checkx = sorted([item for sublist in x for item in sublist])
checky = sorted([item for sublist in y for item in sublist])
assert checkx == checky
# get all intersections
all_s = []
for sx in x:
for sy in y:
ss = list(filter(lambda x:x in sx, sy))
if ss:
all_s.append(ss)
return all_s
然后使用 this time comparison method,我们可以看到这个新函数比您原来的实现快约 100 倍。
我可能会遗漏一些细节,但它似乎有点太简单了:
[np.intersect1d(a,b) for a in x for b in y]
输出:
[array([1]),
array([2]),
array([3]),
array([4, 5]),
array([6, 7]),
array([ 8, 9, 10])]
以上包括重复项,例如 x=[[1,2,3],[1,4,5]]
和 y=[[1,6,7]]
会给出 [[1],[1]]
。
如果你想找到独特的路口:
[list(i) for i in {tuple(np.intersect1d(a,b)) for a in x for b in y}]
输出:
[[8, 9, 10], [6, 7], [1], [4, 5], [2], [3]]