How to group a list of dictionaries to get a list of their corresponding indices?
I have a list of dictionaries like this:
l = [{'a':25}, {'a':25}, {'b':30}, {'c':200}, {'b':30}]
I want to find the distinct elements and their corresponding indices, like this:
[
({'a':25}, [0,1]),
({'b':30}, [2,4]),
({'c':200}, [3]),
]
I tried itertools.groupby but couldn't get it to work; maybe I'm missing something. Any other approach is welcome too.
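Convert the dictionaries to tuples so that you can use them as dictionary keys. Then iterate over the list, appending each index to this dictionary.
locations_dict = {}
for i, d in enumerate(l):
    # a tuple of (key, value) pairs is hashable, unlike the dict itself
    dtuple = tuple(d.items())
    locations_dict.setdefault(dtuple, []).append(i)
# turn each tuple key back into a dict
locations = [(dict(key), value) for key, value in locations_dict.items()]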
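One caveat with tuple(d.items()): two dictionaries that compare equal but were built with their keys in a different order produce different tuples, so they end up in separate groups. A minimal sketch of one way around that, assuming every value is hashable, is to key on frozenset(d.items()) instead. The function name group_indices is only illustrative and not part of the answer above.

```python
def group_indices(dicts):
    """Group equal dicts and collect the indices where each one occurs."""
    locations = {}
    for i, d in enumerate(dicts):
        # frozenset ignores insertion order, so {'a': 1, 'b': 2} and
        # {'b': 2, 'a': 1} share one key (all values must be hashable)
        key = frozenset(d.items())
        locations.setdefault(key, []).append(i)
    # rebuild a dict from each frozenset of (key, value) pairs
    return [(dict(key), idxs) for key, idxs in locations.items()]


l = [{'a': 25}, {'a': 25}, {'b': 30}, {'c': 200}, {'b': 30}]
print(group_indices(l))
# [({'a': 25}, [0, 1]), ({'b': 30}, [2, 4]), ({'c': 200}, [3])]
```

For single-key dictionaries like the ones in the question this gives the same result as the tuple version; the difference only shows up with multi-key dictionaries.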
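from collections import defaultdict
indices = defaultdict(list)
for idx, val in enumerate(l):
    # tuple(*val.items()) unpacks the single (key, value) pair, so this
    # only works when every dictionary has exactly one item
    indices[tuple(*val.items())].append(idx)
print(indices)
# output
defaultdict(<class 'list'>, {('a', 25): [0, 1], ('b', 30): [2, 4], ('c', 200): [3]})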
Consider this list of dictionaries:
>>> dicts
[{'a': 3},
{'d': 4, 'a': 3, 'c': 1},
{'d': 8, 'c': 0, 'b': 9},
{'c': 3, 'a': 9},
{'a': 5, 'd': 8},
{'d': 5, 'b': 5, 'a': 0},
{'b': 7, 'c': 7},
{'d': 6, 'b': 7, 'a': 6},
{'a': 4, 'c': 1, 'd': 5, 'b': 2},
{'d': 7}]
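Suppose you want all the indices of every occurrence of each key-value pair:
idxs = {}
for i, d in enumerate(dicts):
    for pair in d.items():
        # each (key, value) pair maps to the indices of the dicts containing it
        idxs.setdefault(pair, []).append(i)
This produces what I think is a more useful output, because it lets you look up the indices for any specific key-value pair: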
{('a', 3): [0, 1],
('d', 4): [1],
('c', 1): [1, 8],
('d', 8): [2, 4],
('c', 0): [2],
('b', 9): [2],
('c', 3): [3],
('a', 9): [3],
('a', 5): [4],
('d', 5): [5, 8],
('b', 5): [5],
('a', 0): [5],
('b', 7): [6, 7],
('c', 7): [6],
('d', 6): [7],
('a', 6): [7],
('a', 4): [8],
('b', 2): [8],
('d', 7): [9]}
However, if you must convert to List[Tuple[Dict[str, int], List[int]]], you can easily generate it from the previous output:
>>> [(dict((p,)), l) for p, l in idxs.items()]
[({'a': 3}, [0, 1]),
({'d': 4}, [1]),
({'c': 1}, [1, 8]),
({'d': 8}, [2, 4]),
({'c': 0}, [2]),
({'b': 9}, [2]),
({'c': 3}, [3]),
({'a': 9}, [3]),
({'a': 5}, [4]),
({'d': 5}, [5, 8]),
({'b': 5}, [5]),
({'a': 0}, [5]),
({'b': 7}, [6, 7]),
({'c': 7}, [6]),
({'d': 6}, [7]),
({'a': 6}, [7]),
({'a': 4}, [8]),
({'b': 2}, [8]),
({'d': 7}, [9])]
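For lookups, the idxs mapping is usually handier than the list form. The sketch below rebuilds it for just the first three dictionaries of the example and queries it; the helper name find is hypothetical and added only for illustration.

```python
# first three dictionaries from the example above
dicts = [{'a': 3},
         {'d': 4, 'a': 3, 'c': 1},
         {'d': 8, 'c': 0, 'b': 9}]

idxs = {}
for i, d in enumerate(dicts):
    for pair in d.items():
        idxs.setdefault(pair, []).append(i)


def find(key, value):
    """Return the indices of all dicts containing key: value (empty if none)."""
    return idxs.get((key, value), [])


print(find('a', 3))  # [0, 1]
print(find('d', 8))  # [2]
print(find('z', 0))  # []
```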
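Good ideas with the dicts/defaultdicts; this also seems to work:
import itertools
l = [{'a':25}, {'a':25}, {'b':30}, {'c':200}, {'b':30}, {'a': 25}]
# sort by string form so equal dicts become adjacent; enumerate keeps the original indices
sorted_values = sorted(enumerate(l), key=lambda x: str(x[1]))
grouped = itertools.groupby(sorted_values, lambda x: x[1])
grouped_indices = [(k, [x[0] for x in g]) for k, g in grouped]
print(grouped_indices)
The idea is that once the list is sorted (carrying the original indices along as extra detail), itertools/Linux-style groupby, which only groups adjacent items, behaves much like a SQL/pandas groupby.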
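This is likely why itertools.groupby did not work in the question: it only merges runs of adjacent equal keys. A small sketch comparing the unsorted and sorted cases, using a shortened list of my own and a hypothetical helper named group:

```python
import itertools

l = [{'a': 25}, {'b': 30}, {'a': 25}]


def group(pairs):
    """Group (index, dict) pairs by the dict; only adjacent equal dicts merge."""
    return [(k, [i for i, _ in g])
            for k, g in itertools.groupby(pairs, key=lambda x: x[1])]


unsorted_result = group(enumerate(l))
sorted_result = group(sorted(enumerate(l), key=lambda x: str(x[1])))

print(unsorted_result)  # [({'a': 25}, [0]), ({'b': 30}, [1]), ({'a': 25}, [2])]
print(sorted_result)    # [({'a': 25}, [0, 2]), ({'b': 30}, [1])]
```

Without the sort, {'a': 25} shows up as two separate groups because its two occurrences are not next to each other.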
Another approach:
import ast
l = [{'a':25}, {'a':25}, {'b':30}, {'c':200}, {'b':30}]
n_dict = {}
for a, b in enumerate(l):
    # use the dict's string form as a hashable key
    n_dict[str(b)] = n_dict.get(str(b), []) + [a]
# turn the string keys back into dicts
print(list(zip( [ast.literal_eval(i) for i in n_dict.keys()], n_dict.values() )))
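Note that str(b) depends on key insertion order, so equal multi-key dictionaries can produce different strings. A possible variant, assuming the dictionaries are JSON-serializable with string keys, is to use json.dumps with sort_keys=True as the key. This is a sketch with made-up sample data, not a drop-in replacement for every case:

```python
import json

# the first two dicts are equal but list their keys in a different order
l = [{'a': 25, 'x': 1}, {'x': 1, 'a': 25}, {'b': 30}]

n_dict = {}
for i, d in enumerate(l):
    # sort_keys=True makes the string independent of insertion order
    key = json.dumps(d, sort_keys=True)
    n_dict.setdefault(key, []).append(i)

# decode each JSON key back into a dict
result = [(json.loads(k), v) for k, v in n_dict.items()]
print(result)  # [({'a': 25, 'x': 1}, [0, 1]), ({'b': 30}, [2])]
```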