Combinations of items in a list using specific criteria

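I'm trying to find specific combinations of items in a list. The list is made up of x groups, each repeated y times. In this example x and y are both 3, but in practice they could be much larger. I want every combination that pairs each group with one of its y values, without repeating a group (the x value) within a given combination. I think it is easier to just show an example of what I want.

Here is an example.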

A = ['ST1_0.245', 'ST1_0.29', 'ST1_0.335', 'ST2_0.245', 'ST2_0.29', 'ST2_0.335', 'ST3_0.245', 'ST3_0.29', 'ST3_0.335']

So there are three groups, ST1, ST2 and ST3, and each has 3 iterations: 0.245, 0.290 and 0.335.

I want to find the following combinations.

('ST1_0.245', 'ST2_0.245', 'ST3_0.245')
('ST1_0.245', 'ST2_0.245', 'ST3_0.29')
('ST1_0.245', 'ST2_0.245', 'ST3_0.335')
('ST1_0.245', 'ST2_0.29', 'ST3_0.245')
('ST1_0.245', 'ST2_0.29', 'ST3_0.29')
('ST1_0.245', 'ST2_0.29', 'ST3_0.335')
('ST1_0.245', 'ST2_0.335', 'ST3_0.245')
('ST1_0.245', 'ST2_0.335', 'ST3_0.29')
('ST1_0.245', 'ST2_0.335', 'ST3_0.335')
('ST1_0.29', 'ST2_0.245', 'ST3_0.245')
('ST1_0.29', 'ST2_0.245', 'ST3_0.29')
('ST1_0.29', 'ST2_0.245', 'ST3_0.335')
('ST1_0.29', 'ST2_0.29', 'ST3_0.245')
('ST1_0.29', 'ST2_0.29', 'ST3_0.29')
('ST1_0.29', 'ST2_0.29', 'ST3_0.335')
('ST1_0.29', 'ST2_0.335', 'ST3_0.245')
('ST1_0.29', 'ST2_0.335', 'ST3_0.29')
('ST1_0.29', 'ST2_0.335', 'ST3_0.335')
('ST1_0.335', 'ST2_0.245', 'ST3_0.245')
('ST1_0.335', 'ST2_0.245', 'ST3_0.29')
('ST1_0.335', 'ST2_0.245', 'ST3_0.335')
('ST1_0.335', 'ST2_0.29', 'ST3_0.245')
('ST1_0.335', 'ST2_0.29', 'ST3_0.29')
('ST1_0.335', 'ST2_0.29', 'ST3_0.335')
('ST1_0.335', 'ST2_0.335', 'ST3_0.245')
('ST1_0.335', 'ST2_0.335', 'ST3_0.29')
('ST1_0.335', 'ST2_0.335', 'ST3_0.335')

Note that ST1, ST2 and ST3 each appear exactly once in every combination.

Here is the code I have, which works at least for small cases.

import itertools
import numpy as np

comb = []
gr_list=['ST1','ST2','ST3']
for itr in itertools.combinations(A, len(gr_list)):
    # pdb.set_trace()
    for n in np.arange(len(gr_list)):
        if sum(itr[n].split('_')[0] in s for s in itr) > 1:
            break
    
    if n == len(gr_list)-1:
        comb.append(itr)

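This works for the few small examples I tested, but when I try larger values I get more results than I expected, although that may be a mistake in how I calculated the expected number. Either way, it takes far too long. Is there a faster way?

I do have the two sets of values (groups and iterations) available separately. As I write this, I feel there is a better approach that uses them, but I'm not sure how to go about it.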

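You can use itertools.product for this. It returns an iterator rather than a list, which is usually more efficient if you are iterating over the results rather than materializing the whole set. The number of elements in the iterator is the product of the lengths of the different categories.

Build the groups as needed, then call itertools.product on the groups: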
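To make the count concrete, here is a minimal sketch. The `groups` literal is hand-written for illustration and is not taken from the original post, and `math.prod` assumes Python 3.8 or later. It computes the expected number of combinations without building them, then lazily takes only the first two.

```python
import itertools
import math

# Hand-written groups for illustration: one inner list per prefix.
groups = [
    ['ST1_0.245', 'ST1_0.29', 'ST1_0.335'],
    ['ST2_0.245', 'ST2_0.29', 'ST2_0.335'],
    ['ST3_0.245', 'ST3_0.29', 'ST3_0.335'],
]

# Number of combinations = product of the group sizes (3 * 3 * 3 here).
expected_count = math.prod(len(g) for g in groups)

# itertools.product is lazy: only the tuples actually consumed are built.
first_two = list(itertools.islice(itertools.product(*groups), 2))

print(expected_count)
print(first_two)
```

For comparison, itertools.combinations(A, 3) over the 9 items produces 84 candidates that then have to be filtered, while the product yields exactly the 27 valid tuples. The gap widens quickly as x and y grow.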

import itertools

A = ['ST1_0.245', 'ST1_0.29', 'ST1_0.335',
     'ST2_0.245', 'ST2_0.29', 'ST2_0.335', 
     'ST3_0.245', 'ST3_0.29', 'ST3_0.335']

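# Unique group prefixes. A set is unordered, so the group order can vary between runs.
prefixes = set(s.split("_")[0] for s in A)
# One list per prefix, holding every item that belongs to that group.
groups = [[a for a in A if a.split("_")[0]==p] for p in prefixes]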

>>> list(itertools.product(*groups))

[('ST2_0.245', 'ST3_0.245', 'ST1_0.245'),
 ('ST2_0.245', 'ST3_0.245', 'ST1_0.29'),
 ('ST2_0.245', 'ST3_0.245', 'ST1_0.335'),
 ('ST2_0.245', 'ST3_0.29', 'ST1_0.245'),
 ('ST2_0.245', 'ST3_0.29', 'ST1_0.29'),
 ('ST2_0.245', 'ST3_0.29', 'ST1_0.335'),
 ('ST2_0.245', 'ST3_0.335', 'ST1_0.245'),
 ('ST2_0.245', 'ST3_0.335', 'ST1_0.29'),
 ('ST2_0.245', 'ST3_0.335', 'ST1_0.335'),
 ('ST2_0.29', 'ST3_0.245', 'ST1_0.245'),
 ('ST2_0.29', 'ST3_0.245', 'ST1_0.29'),
 ('ST2_0.29', 'ST3_0.245', 'ST1_0.335'),
 ('ST2_0.29', 'ST3_0.29', 'ST1_0.245'),
 ('ST2_0.29', 'ST3_0.29', 'ST1_0.29'),
 ('ST2_0.29', 'ST3_0.29', 'ST1_0.335'),
 ('ST2_0.29', 'ST3_0.335', 'ST1_0.245'),
 ('ST2_0.29', 'ST3_0.335', 'ST1_0.29'),
 ('ST2_0.29', 'ST3_0.335', 'ST1_0.335'),
 ('ST2_0.335', 'ST3_0.245', 'ST1_0.245'),
 ('ST2_0.335', 'ST3_0.245', 'ST1_0.29'),
 ('ST2_0.335', 'ST3_0.245', 'ST1_0.335'),
 ('ST2_0.335', 'ST3_0.29', 'ST1_0.245'),
 ('ST2_0.335', 'ST3_0.29', 'ST1_0.29'),
 ('ST2_0.335', 'ST3_0.29', 'ST1_0.335'),
 ('ST2_0.335', 'ST3_0.335', 'ST1_0.245'),
 ('ST2_0.335', 'ST3_0.335', 'ST1_0.29'),
 ('ST2_0.335', 'ST3_0.335', 'ST1_0.335')]
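
The output above starts with ST2 because a set does not preserve order, and the list comprehension rescans A once per prefix. If the ST1, ST2, ST3 order from the question matters, one possible variation is to group in a single pass with a dict, which keeps first-seen order in Python 3.7+. The helper name `group_by_prefix` is hypothetical. The second part is a sketch for the case the question mentions, where the group names and values are held separately; it assumes every group shares the same values and that the values are already strings formatted as in A.

```python
import itertools

def group_by_prefix(items, sep="_"):
    """Group items by the text before `sep`, keeping first-seen order."""
    groups = {}
    for item in items:
        prefix = item.split(sep, 1)[0]
        groups.setdefault(prefix, []).append(item)
    return list(groups.values())

A = ['ST1_0.245', 'ST1_0.29', 'ST1_0.335',
     'ST2_0.245', 'ST2_0.29', 'ST2_0.335',
     'ST3_0.245', 'ST3_0.29', 'ST3_0.335']

# One item from each group, with groups in the order they appear in A.
combos = list(itertools.product(*group_by_prefix(A)))

# Alternative when names and values are kept separately (assumed inputs):
names = ['ST1', 'ST2', 'ST3']
values = ['0.245', '0.29', '0.335']
combos_from_parts = [
    tuple(f"{name}_{value}" for name, value in zip(names, picks))
    for picks in itertools.product(values, repeat=len(names))
]

print(len(combos))
print(combos[0])
```

Both approaches give the same 27 tuples in the same order as the list in the question.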