Combinations of items in a list using specific criteria
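I am trying to find specific combinations of items in a list. The list consists of x groups, each repeated y times. In this example x = y = 3, but in reality the sizes could be much larger. I want every combination that takes one of the y entries from each group, so that no x value (group) is duplicated within a given combination. I think it will be easier to just show an example of what I want.
Here is an example.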
A = ['ST1_0.245', 'ST1_0.29', 'ST1_0.335', 'ST2_0.245', 'ST2_0.29', 'ST2_0.335', 'ST3_0.245', 'ST3_0.29', 'ST3_0.335']
So there are three groups, ST1, ST2 and ST3, and each group has 3 iterations: 0.245, 0.290 and 0.335.
I want to find the following combinations:
('ST1_0.245', 'ST2_0.245', 'ST3_0.245')
('ST1_0.245', 'ST2_0.245', 'ST3_0.29')
('ST1_0.245', 'ST2_0.245', 'ST3_0.335')
('ST1_0.245', 'ST2_0.29', 'ST3_0.245')
('ST1_0.245', 'ST2_0.29', 'ST3_0.29')
('ST1_0.245', 'ST2_0.29', 'ST3_0.335')
('ST1_0.245', 'ST2_0.335', 'ST3_0.245')
('ST1_0.245', 'ST2_0.335', 'ST3_0.29')
('ST1_0.245', 'ST2_0.335', 'ST3_0.335')
('ST1_0.29', 'ST2_0.245', 'ST3_0.245')
('ST1_0.29', 'ST2_0.245', 'ST3_0.29')
('ST1_0.29', 'ST2_0.245', 'ST3_0.335')
('ST1_0.29', 'ST2_0.29', 'ST3_0.245')
('ST1_0.29', 'ST2_0.29', 'ST3_0.29')
('ST1_0.29', 'ST2_0.29', 'ST3_0.335')
('ST1_0.29', 'ST2_0.335', 'ST3_0.245')
('ST1_0.29', 'ST2_0.335', 'ST3_0.29')
('ST1_0.29', 'ST2_0.335', 'ST3_0.335')
('ST1_0.335', 'ST2_0.245', 'ST3_0.245')
('ST1_0.335', 'ST2_0.245', 'ST3_0.29')
('ST1_0.335', 'ST2_0.245', 'ST3_0.335')
('ST1_0.335', 'ST2_0.29', 'ST3_0.245')
('ST1_0.335', 'ST2_0.29', 'ST3_0.29')
('ST1_0.335', 'ST2_0.29', 'ST3_0.335')
('ST1_0.335', 'ST2_0.335', 'ST3_0.245')
('ST1_0.335', 'ST2_0.335', 'ST3_0.29')
('ST1_0.335', 'ST2_0.335', 'ST3_0.335')
Note that ST1, ST2 and ST3 each appear exactly once in every combination.
Here is the code I have, which at least works for small cases:
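import itertools

comb = []
gr_list = ['ST1', 'ST2', 'ST3']
# Check every len(gr_list)-sized combination of A and keep only those
# in which every group prefix is distinct.
for itr in itertools.combinations(A, len(gr_list)):
    for n in range(len(gr_list)):
        prefix = itr[n].split('_')[0]
        # Compare prefixes exactly: a substring test (prefix in s) would
        # wrongly treat 'ST1' as matching 'ST10_...'.
        if sum(s.split('_')[0] == prefix for s in itr) > 1:
            break
        if n == len(gr_list) - 1:
            comb.append(itr)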
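This works for the few small examples I tested, but when I try larger values I get more results than I expected, although that may be a mistake in how I calculated the expected number. Either way, it takes far too long. Is there a faster way?
I do have the two values (the groups and the iterations) separately. As I was writing this, I felt that might be a better approach, but I'm not sure how to do that either.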
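A quick count helps with both the expected number and the slowness. Assuming every group has the same number of entries y, the number of wanted combinations is y**x (one choice per group), while the loop above examines every x-item combination of all x*y items, i.e. C(x*y, x) candidates, and then discards most of them. This sketch only compares the two counts; `x` and `y` are taken from the example.

```python
import math  # math.comb needs Python 3.8+

x, y = 3, 3  # number of groups, entries per group (values from the example)

# Candidates the combinations()-based loop has to inspect.
candidates = math.comb(x * y, x)
# Combinations actually wanted: one of y entries for each of the x groups.
wanted = y ** x

print(candidates, wanted)  # 84 27

# The gap widens quickly, e.g. 10 groups of 10 entries each:
print(math.comb(100, 10), 10 ** 10)  # roughly 1.7e13 candidates vs 1e10 wanted
```

So for the example, 84 candidates are generated to keep 27, and the filtering cost grows much faster than the useful output.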
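If the group names and the values are available separately, one possible sketch is to choose one value per group with `itertools.product(values, repeat=len(groups))` and rebuild the labels. The names `groups` and `values` below are hypothetical, and this assumes every group shares the same set of values; values are kept as strings so labels such as `0.29` stay exactly as written.

```python
import itertools

# Hypothetical inputs: group names and shared values, kept separately.
groups = ['ST1', 'ST2', 'ST3']
values = ['0.245', '0.29', '0.335']

# Each tuple from product() assigns one value to each group position,
# so every group appears exactly once per combination.
comb = [
    tuple(f"{g}_{v}" for g, v in zip(groups, vals))
    for vals in itertools.product(values, repeat=len(groups))
]

print(len(comb))  # 27
print(comb[0])    # ('ST1_0.245', 'ST2_0.245', 'ST3_0.245')
print(comb[1])    # ('ST1_0.245', 'ST2_0.245', 'ST3_0.29')
```

Because `product` varies the rightmost position fastest, this yields the combinations in the same order as the list in the question.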
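You can use itertools.product for this. It returns an iterator rather than a list, which is usually more efficient if you are iterating over the results rather than building the whole collection. The number of elements the iterator yields is the product of the lengths of the individual groups.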
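The two claims in the paragraph above (the size is the product of the group lengths, and results are produced lazily) can be checked with a short sketch; the `groups` list here is written out by hand for illustration.

```python
import itertools
import math  # math.prod needs Python 3.8+

# Groups already split by prefix (written out by hand for this sketch).
groups = [['ST1_0.245', 'ST1_0.29', 'ST1_0.335'],
          ['ST2_0.245', 'ST2_0.29', 'ST2_0.335'],
          ['ST3_0.245', 'ST3_0.29', 'ST3_0.335']]

# Number of tuples product() will yield: product of the group lengths.
n_total = math.prod(len(g) for g in groups)

# product() is lazy: islice pulls only the first two tuples
# without generating the remaining ones.
first_two = list(itertools.islice(itertools.product(*groups), 2))

# Counting by iteration also never stores all tuples at once.
n_counted = sum(1 for _ in itertools.product(*groups))

print(n_total, n_counted)  # 27 27
print(first_two)
```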
Create the groups as needed, then call itertools.product on them:
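import itertools

A = ['ST1_0.245', 'ST1_0.29', 'ST1_0.335',
     'ST2_0.245', 'ST2_0.29', 'ST2_0.335',
     'ST3_0.245', 'ST3_0.29', 'ST3_0.335']
# A set has no guaranteed order, so the groups (and tuple positions below)
# may come out in any order; use sorted(...) for a deterministic order.
prefixes = set(s.split("_")[0] for s in A)
groups = [[a for a in A if a.split("_")[0] == p] for p in prefixes]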
>>> list(itertools.product(*groups))
[('ST2_0.245', 'ST3_0.245', 'ST1_0.245'),
('ST2_0.245', 'ST3_0.245', 'ST1_0.29'),
('ST2_0.245', 'ST3_0.245', 'ST1_0.335'),
('ST2_0.245', 'ST3_0.29', 'ST1_0.245'),
('ST2_0.245', 'ST3_0.29', 'ST1_0.29'),
('ST2_0.245', 'ST3_0.29', 'ST1_0.335'),
('ST2_0.245', 'ST3_0.335', 'ST1_0.245'),
('ST2_0.245', 'ST3_0.335', 'ST1_0.29'),
('ST2_0.245', 'ST3_0.335', 'ST1_0.335'),
('ST2_0.29', 'ST3_0.245', 'ST1_0.245'),
('ST2_0.29', 'ST3_0.245', 'ST1_0.29'),
('ST2_0.29', 'ST3_0.245', 'ST1_0.335'),
('ST2_0.29', 'ST3_0.29', 'ST1_0.245'),
('ST2_0.29', 'ST3_0.29', 'ST1_0.29'),
('ST2_0.29', 'ST3_0.29', 'ST1_0.335'),
('ST2_0.29', 'ST3_0.335', 'ST1_0.245'),
('ST2_0.29', 'ST3_0.335', 'ST1_0.29'),
('ST2_0.29', 'ST3_0.335', 'ST1_0.335'),
('ST2_0.335', 'ST3_0.245', 'ST1_0.245'),
('ST2_0.335', 'ST3_0.245', 'ST1_0.29'),
('ST2_0.335', 'ST3_0.245', 'ST1_0.335'),
('ST2_0.335', 'ST3_0.29', 'ST1_0.245'),
('ST2_0.335', 'ST3_0.29', 'ST1_0.29'),
('ST2_0.335', 'ST3_0.29', 'ST1_0.335'),
('ST2_0.335', 'ST3_0.335', 'ST1_0.245'),
('ST2_0.335', 'ST3_0.335', 'ST1_0.29'),
('ST2_0.335', 'ST3_0.335', 'ST1_0.335')]
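One possible variation, if the tuple positions should follow the order in which the groups appear in `A` (as in the question), is to group the items in a single pass with a dict instead of a set. This relies on dicts preserving insertion order (guaranteed since Python 3.7); `by_prefix` is a name chosen for this sketch.

```python
import itertools

A = ['ST1_0.245', 'ST1_0.29', 'ST1_0.335',
     'ST2_0.245', 'ST2_0.29', 'ST2_0.335',
     'ST3_0.245', 'ST3_0.29', 'ST3_0.335']

# One pass over A: each item is appended to the list for its prefix.
# Dict keys keep first-appearance order, so ST1, ST2, ST3 stay in order.
by_prefix = {}
for item in A:
    prefix = item.split("_")[0]
    by_prefix.setdefault(prefix, []).append(item)

result = list(itertools.product(*by_prefix.values()))

print(len(result))  # 27
print(result[0])    # ('ST1_0.245', 'ST2_0.245', 'ST3_0.245')
```

Grouping this way also avoids rescanning `A` once per prefix, which the nested list comprehension does.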