Add a new element to the next sublist depending on whether it has already been added (also involves a dictionary problem) python

Question

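Stack Overflow community:

I am trying to build a list of sublists with a loop that randomly samples values from another list, with the restriction that no sublist may contain duplicates or values already added to a previous sublist.

Say (as an example) I have a main list: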

[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]

#I get:
[[1,13],[4,1],[8,13]]

#I WANT:
[[1,13],[4,9],[8,14]]          #(no duplicates when checking previous sublists)

The real code that I thought would work (as a draft) is the following:

matrixvals=list(matrix.index.values)  #list where values are obtained
lists=[[]for e in range(0,3)]         #list of sublists that I want to feed
vls=[]                                #stores the values that have been added to prevent adding them again
for e in lists:                       #initiate main loop
    for i in range(0,5):              #each sublist will contain 5 different random samples
        x=random.sample(matrixvals,1) #it doesn't matter if the samples are 1 or 2
        if any(x) not in vls:         #if the sample isn't in the evaluation list
            vls.extend(x)
            e.append(x)
        else:             #if it IS, then do a sample but without those already added values (line below)
            x=random.sample([matrixvals[:].remove(x) for x in vls],1)
            vls.extend(x)
            e.append(x)

        
print(lists)
print(vls)

It didn't work, because I get the following:

[[[25], [16], [15], [31], [17]], [[4], [2], [13], [42], [13]], [[11], [7], [13], [17], [25]]]
[25, 16, 15, 31, 17, 4, 2, 13, 42, 13, 11, 7, 13, 17, 25]

As you can see, the number 13 is repeated 3 times, and I don't understand why.

I want:

[[[25], [16], [15], [31], [17]], [[4], [2], [13], [42], [70]], [[11], [7], [100], [18], [27]]]
[25, 16, 15, 31, 17, 4, 2, 13, 42, 70, 11, 7, 100, 18, 27]   #no dups

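Also, is there a way to turn the random.sample result into a value instead of a list? (to obtain):

[[25,16,15,31,17], [4, 2, 13, 42,70], [11, 7, 100, 18, 27]]

Also, in reality the final result is not a list of sublists but a dictionary (the code above is a draft attempt at solving the dict problem). Is there a way to achieve this in a dict? With my current code, I get the following result: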

{'1stkey': {'1stsubkey': {'list1': [41,
    40,
    22,
    28,
    26,
    14,
    41,
    15,
    40,
    33],
   'list2': [41, 40, 22, 28, 26, 14, 41, 15, 40, 33],
   'list3': [41, 40, 22, 28, 26, 14, 41, 15, 40, 33]},
  '2ndsubkey': {'list1': [21,
    7,
    31,
    12,
    8,
    22,
    27,...}

Instead of that result, I want:

 {'1stkey': {'1stsubkey': {'list1': [41,40,22],
       'list2': [28, 26, 14],
       'list3': [41, 15, 40, 33]},
      '2ndsubkey': {'list1': [21,7,31],
       'list2':[12,8,22],
       'list3':[27...,...}#and so on

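Is there a way to solve both the list and the dict problem? Any help would be much appreciated; even solving just the list problem would let me make some progress.

Thanks, everyone.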

Answer 1

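I realize you may be more interested in why your particular approach isn't working. However, if I've understood your desired behavior, I may be able to offer an alternative solution. After posting my answer, I'll take a look at your attempt.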
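
As for the attempt itself, here is a minimal sketch of what appears to go wrong, assuming the values are positive integers as in the posted output. any(x) collapses the one-element list x to the boolean True, so the test becomes "True not in vls". Since True == 1 in Python, that test only fails when 1 is already in vls, and the sampled value itself is never compared. The else branch would not help either, because list.remove returns None. The names draw_unique and seen below are illustrative and do not come from the question.

```python
import random

x = [13]
vls = [25, 16, 13]

# any(x) is just True, so this asks "is True (== 1) missing from vls?"
print(any(x) not in vls)   # True, even though 13 is already in vls
print(x[0] not in vls)     # False: this is the check that was intended


def draw_unique(population, count, seen):
    """Draw `count` values from `population` that are not in `seen`.

    Hypothetical helper: `seen` is a set that is updated in place.
    """
    candidates = [v for v in population if v not in seen]
    picked = random.sample(candidates, count)
    seen.update(picked)
    return picked


seen = set()
lists = [draw_unique(range(1, 51), 5, seen) for _ in range(3)]
print(lists)  # three flat sublists with no value repeated anywhere
```

Because each sublist is the list returned by random.sample rather than the result of append(x), the sublists hold plain values. For a single pick, random.choice(candidates) returns the item itself instead of a one-element list.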

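random.sample lets you draw k items from a population (a list, tuple, or other sequence; as of Python 3.11 a set must be converted to a sequence first). If the population contains no duplicate elements, you are guaranteed to have no duplicates in your random sample: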

from random import sample

pool = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]

num_samples = 4

print(sample(pool, k=num_samples))

Possible output:

[9, 11, 8, 7]

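No matter how many times you run this snippet, you will never get duplicate elements in your random sample. That is because random.sample doesn't generate random objects; it just randomly picks items that already exist in the collection. It is the same approach you take when drawing cards from a deck or drawing lottery numbers, for example.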
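
To make the contrast concrete, here is a small sketch comparing random.sample (drawing without replacement) with random.choices (drawing with replacement). The pool and the number of trials are arbitrary choices made for illustration.

```python
import random

pool = list(range(1, 16))

# sample() draws without replacement: a unique pool gives unique picks.
for _ in range(1000):
    picks = random.sample(pool, k=4)
    assert len(set(picks)) == len(picks)

# choices() draws with replacement, so repeats are possible.
# Asking for more items than the pool holds makes a repeat certain.
with_replacement = random.choices(pool, k=len(pool) + 1)
has_repeat = len(set(with_replacement)) < len(with_replacement)
print(has_repeat)  # True
```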

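In your case, pool is the pool of unique numbers from which you can choose your samples. Your desired output seems to be a list of three lists, where each sublist has two samples in it. Rather than calling random.sample once for each sublist, we should call it a single time with k=num_sublists * samples_per_sublist: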

from random import sample

pool = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]

num_sublists = 3
samples_per_sublist = 2

num_samples = num_sublists * samples_per_sublist

assert num_samples <= len(pool)

print(sample(pool, k=num_samples))

Possible output:

[14, 10, 1, 8, 6, 3]

Okay, so now we have six samples rather than four. No sublists yet. Now you can simply chop this list of six samples into three sublists of two samples each:

from random import sample

pool = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]

num_sublists = 3
samples_per_sublist = 2

num_samples = num_sublists * samples_per_sublist

assert num_samples <= len(pool)

def pairwise(iterable):
    yield from zip(*[iter(iterable)]*samples_per_sublist)

print(list(pairwise(sample(pool, num_samples))))

Possible output:

[(4, 11), (12, 13), (8, 15)]

Or if you really want sublists rather than tuples:

def pairwise(iterable):
    yield from map(list, zip(*[iter(iterable)]*samples_per_sublist))
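
The zip(*[iter(iterable)] * n) idiom works because the same iterator object is repeated n times, so each tuple that zip builds consumes n consecutive items. If that reads as too clever, a slicing version is a possible alternative. The name chunked is made up here, and unlike the zip version it keeps a shorter final chunk instead of dropping leftover items.

```python
from random import sample


def chunked(items, size):
    """Split a list into consecutive sublists of length `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]


pool = list(range(1, 16))
picks = sample(pool, k=6)
sublists = chunked(picks, 2)
print(sublists)  # three sublists of two values each
```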

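Edit - just realized that you don't actually want a list of lists, but a dictionary. Something more like this? Sorry, I'm obsessed with generators, and this isn't very easy to read: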

keys = ["1stkey"]
subkeys = ["1stsubkey", "2ndsubkey"]
num_lists_per_subkey = 3
num_samples_per_list = 5
num_samples = num_lists_per_subkey * num_samples_per_list

min_sample = 1
max_sample = 50

pool = list(range(min_sample, max_sample + 1))

def generate_items():

    def generate_sub_items():
        from random import sample

        samples = sample(pool, k=num_samples)

        def generate_sub_sub_items():

            def chunkwise(iterable, n=num_samples_per_list):
                yield from map(list, zip(*[iter(iterable)]*n))
        
            for list_num, chunk in enumerate(chunkwise(samples), start=1):
                key = f"list{list_num}"
                yield key, chunk

        for subkey in subkeys:
            yield subkey, dict(generate_sub_sub_items())
    
    for key in keys:
        yield key, dict(generate_sub_items())

print(dict(generate_items()))

Possible output:

{'1stkey': {'1stsubkey': {'list1': [43, 20, 4, 27, 2], 'list2': [49, 44, 18, 8, 37], 'list3': [19, 40, 9, 17, 6]}, '2ndsubkey': {'list1': [43, 20, 4, 27, 2], 'list2': [49, 44, 18, 8, 37], 'list3': [19, 40, 9, 17, 6]}}}
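
Note that in the output above both subkeys received the same lists, because samples is drawn once and reused for every subkey. If every value should be unique across the whole dictionary, one possible variant is to draw all the samples in a single call and hand out consecutive chunks. This is only a sketch under that assumption, reusing the same key names and sizes; build_nested is a hypothetical helper name.

```python
from random import sample


def build_nested(pool, keys, subkeys, lists_per_subkey, samples_per_list):
    """Build {key: {subkey: {"listN": [...]}}} with no value used twice."""
    total = len(keys) * len(subkeys) * lists_per_subkey * samples_per_list
    if total > len(pool):
        raise ValueError("pool is too small for that many unique samples")

    picks = iter(sample(pool, k=total))  # one draw, so no duplicates
    return {
        key: {
            subkey: {
                f"list{n}": [next(picks) for _ in range(samples_per_list)]
                for n in range(1, lists_per_subkey + 1)
            }
            for subkey in subkeys
        }
        for key in keys
    }


result = build_nested(
    pool=list(range(1, 51)),
    keys=["1stkey"],
    subkeys=["1stsubkey", "2ndsubkey"],
    lists_per_subkey=3,
    samples_per_list=5,
)
print(result)
```

With these sizes the function draws 30 values from a pool of 50, so each of the six lists gets five values and none of them appears twice anywhere in the dictionary.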

Tags: python, dictionary, for-loop, nested-lists