How to group a list of dictionaries to get a list of their corresponding indices?

I have a list of dictionaries like this:

l = [{'a':25}, {'a':25}, {'b':30}, {'c':200}, {'b':30}]  

I want to find the distinct elements and their corresponding indices, like this:

[
({'a':25}, [0,1]),
({'b':30}, [2,4]),
({'c':200}, [3]),
]

I tried itertools.groupby but could not get it to work; maybe I am missing something. Any other direction would be welcome too.

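Convert the dictionaries to tuples so that you can use them as keys in a dictionary. Then iterate over the list, appending each index to the entry for its key.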

locations_dict = {}
for i, d in enumerate(l):
    dtuple = tuple(d.items())
    locations_dict.setdefault(dtuple, []).append(i)

locations = [(dict(key), value) for key, value in locations_dict.items()]
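
One caveat with the snippet above: tuple(d.items()) depends on insertion order, so {'a': 1, 'b': 2} and {'b': 2, 'a': 1} would end up in different groups even though they compare equal. Below is a minimal sketch of an order-insensitive variant. It assumes every value is hashable, and the function name group_indices is made up for illustration.

```python
def group_indices(dicts):
    """Group equal dictionaries and collect the indices where each occurs."""
    locations = {}
    for i, d in enumerate(dicts):
        # frozenset ignores insertion order, so equal dicts share one key.
        # Assumption: every value in d is hashable.
        key = frozenset(d.items())
        locations.setdefault(key, []).append(i)
    # Turn each frozenset of pairs back into a dictionary.
    return [(dict(key), idxs) for key, idxs in locations.items()]


l = [{'a': 25}, {'a': 25}, {'b': 30}, {'c': 200}, {'b': 30}]
print(group_indices(l))
# [({'a': 25}, [0, 1]), ({'b': 30}, [2, 4]), ({'c': 200}, [3])]
```

If the values can be unhashable (lists, nested dicts), this will raise TypeError and a different key, such as a serialized string, would be needed.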
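
Alternatively, using collections.defaultdict:

from collections import defaultdict

indices = defaultdict(list)
for idx, val in enumerate(l):
    # tuple(val.items()) also works for dictionaries with more than one key
    indices[tuple(val.items())].append(idx)

print(indices)

# output
defaultdict(<class 'list'>, {(('a', 25),): [0, 1], (('b', 30),): [2, 4], (('c', 200),): [3]})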

Consider this list of dictionaries:

>>> dicts
[{'a': 3},
 {'d': 4, 'a': 3, 'c': 1},
 {'d': 8, 'c': 0, 'b': 9},
 {'c': 3, 'a': 9},
 {'a': 5, 'd': 8},
 {'d': 5, 'b': 5, 'a': 0},
 {'b': 7, 'c': 7},
 {'d': 6, 'b': 7, 'a': 6},
 {'a': 4, 'c': 1, 'd': 5, 'b': 2},
 {'d': 7}]

Suppose you want all the indices at which each key-value pair occurs:

idxs = {}
for i, d in enumerate(dicts):
    for pair in d.items():
        idxs.setdefault(pair, []).append(i)

This produces what I think is a more useful output, because it lets you look up the indices of any particular key-value pair:

{('a', 3): [0, 1],
 ('d', 4): [1],
 ('c', 1): [1, 8],
 ('d', 8): [2, 4],
 ('c', 0): [2],
 ('b', 9): [2],
 ('c', 3): [3],
 ('a', 9): [3],
 ('a', 5): [4],
 ('d', 5): [5, 8],
 ('b', 5): [5],
 ('a', 0): [5],
 ('b', 7): [6, 7],
 ('c', 7): [6],
 ('d', 6): [7],
 ('a', 6): [7],
 ('a', 4): [8],
 ('b', 2): [8],
 ('d', 7): [9]}

However, if you must convert to List[Tuple[Dict[str, int], List[int]]], you can easily generate it from the previous output:

>>> [(dict((p,)), l) for p, l in idxs.items()]
[({'a': 3}, [0, 1]),
 ({'d': 4}, [1]),
 ({'c': 1}, [1, 8]),
 ({'d': 8}, [2, 4]),
 ({'c': 0}, [2]),
 ({'b': 9}, [2]),
 ({'c': 3}, [3]),
 ({'a': 9}, [3]),
 ({'a': 5}, [4]),
 ({'d': 5}, [5, 8]),
 ({'b': 5}, [5]),
 ({'a': 0}, [5]),
 ({'b': 7}, [6, 7]),
 ({'c': 7}, [6]),
 ({'d': 6}, [7]),
 ({'a': 6}, [7]),
 ({'a': 4}, [8]),
 ({'b': 2}, [8]),
 ({'d': 7}, [9])]
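
The lookup benefit mentioned above can be sketched as follows. This is only an illustration: the helper name find is hypothetical, and the data is the same dicts list shown earlier.

```python
dicts = [
    {'a': 3},
    {'d': 4, 'a': 3, 'c': 1},
    {'d': 8, 'c': 0, 'b': 9},
    {'c': 3, 'a': 9},
    {'a': 5, 'd': 8},
    {'d': 5, 'b': 5, 'a': 0},
    {'b': 7, 'c': 7},
    {'d': 6, 'b': 7, 'a': 6},
    {'a': 4, 'c': 1, 'd': 5, 'b': 2},
    {'d': 7},
]

# Map each (key, value) pair to the indices of the dictionaries containing it.
idxs = {}
for i, d in enumerate(dicts):
    for pair in d.items():
        idxs.setdefault(pair, []).append(i)


def find(key, value):
    """Return the indices of all dictionaries where dicts[i][key] == value."""
    # Hypothetical helper; returns an empty list when the pair never occurs.
    return idxs.get((key, value), [])


print(find('d', 8))  # [2, 4]
print(find('c', 1))  # [1, 8]
print(find('z', 0))  # []
```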

Nice ideas with dicts/defaultdicts. This also seems to work:

import itertools

l = [{'a':25}, {'a':25}, {'b':30}, {'c':200}, {'b':30}, {'a': 25}]
sorted_values = sorted(enumerate(l), key=lambda x: str(x[1]))
grouped = itertools.groupby(sorted_values, lambda x: x[1])
grouped_indices = [(k, [x[0] for x in g]) for k, g in grouped]
print(grouped_indices)

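The idea is that once the list is sorted (keeping the original indices alongside each element), itertools.groupby behaves much like a SQL or pandas groupby. On its own it only merges adjacent equal items, like the Unix uniq command, which is why the sort has to come first.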
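
To see why the sort matters, here is a small comparison of itertools.groupby with and without it. It is a sketch on the question's sample list; the helper name group_consecutive is invented for this example.

```python
import itertools

l = [{'a': 25}, {'a': 25}, {'b': 30}, {'c': 200}, {'b': 30}]


def group_consecutive(pairs):
    """Group (index, dict) pairs whose dicts are equal AND adjacent."""
    grouped = itertools.groupby(pairs, key=lambda x: x[1])
    return [(k, [i for i, _ in g]) for k, g in grouped]


# Without sorting, the two {'b': 30} entries are not adjacent,
# so groupby reports them as two separate groups.
unsorted_groups = group_consecutive(enumerate(l))
print(unsorted_groups)
# [({'a': 25}, [0, 1]), ({'b': 30}, [2]), ({'c': 200}, [3]), ({'b': 30}, [4])]

# Sorting by the string form brings equal dicts next to each other first.
sorted_pairs = sorted(enumerate(l), key=lambda x: str(x[1]))
sorted_groups = group_consecutive(sorted_pairs)
print(sorted_groups)
# [({'a': 25}, [0, 1]), ({'b': 30}, [2, 4]), ({'c': 200}, [3])]
```

Note that sorting by str(...) treats dictionaries with the same items in a different insertion order as different strings, so this relies on equal dictionaries being built the same way.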

Another approach:

import ast
l = [{'a':25}, {'a':25}, {'b':30}, {'c':200}, {'b':30}]
n_dict = {}
for a, b in enumerate(l):
    n_dict[str(b)] = n_dict.get(str(b), []) + [a]

print(list(zip( [ast.literal_eval(i) for i in n_dict.keys()], n_dict.values() )))