Dictionary from associations found in list of lists (performance issue)
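Objective: generate a dictionary that shows which unique values in base[2:] (captured in the list uniques) are associated with which base[1] values (i.e. 5001, 5002, etc.).
The code below works, but it is too slow for the volume of data I need to process, so I'm looking for a faster way to achieve this.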
base = [['a', 5001, 1, 4, 8],
        ['b', 5002, 2, 5],
        ['c', 5002, 2, 5],
        ['d', 5003, 2, 6, 7],
        ['e', 5004, 3, 6, 9]]
uniques = [1,2,3,4,5,6,7,8,9]
uniques_dict = {}
for item in uniques:
    uniques_dict[item] = list(set([records[1] for records in base if item in records[2:]]))
print(uniques_dict)
Output:
{ 1: [5001], 2: [5002, 5003], 3: [5004],
4: [5001], 5: [5002], 6: [5003, 5004],
7: [5003], 8: [5001], 9: [5004] }
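Rather than loop over all the records over and over again, invert the loop. Make uniques a set for fast membership testing, and loop over the records just once.
Better still, that set can be handled by the dictionary keys: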
uniques_dict = {u: [] for u in uniques}
for record in base:
    key, values = record[1], record[2:]
    for unique in uniques_dict.keys() & values:  # the intersection
        uniques_dict[unique].append(key)
In Python 3, dict.keys() returns a dictionary view object, which acts like a set. You can create an intersection with that set using the & operator. If you are using Python 2, replace uniques_dict.keys() with uniques_dict.viewkeys().
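As a small illustration of that set-like behaviour, here is a minimal sketch. The names lookup and values are hypothetical sample data and are not part of the question.

```python
# Hypothetical sample data, used only to illustrate the keys-view intersection.
lookup = {1: [], 2: [], 3: []}
values = [2, 3, 3, 7]

# The right-hand operand can be any iterable (here a list);
# the result is an ordinary set of the keys that also occur in values.
common = lookup.keys() & values
print(sorted(common))  # [2, 3]
```

Duplicates in values collapse and values that are not keys (7 here) are dropped, which is why the loop in the answer only ever touches keys that already exist in uniques_dict.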
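Set intersections are fast and efficient; you still need to match every element of record[2:] against the set of keys, but that is an O(N) loop rather than an O(NK) loop, because each key test is an O(1) operation independent of K = len(unique_keys).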
Demo:
>>> base = [['a', 5001, 1, 4, 8],
...         ['b', 5002, 2, 5],
...         ['c', 5002, 2, 5],
...         ['d', 5003, 2, 6, 7],
...         ['e', 5004, 3, 6, 9]]
>>> uniques = [1,2,3,4,5,6,7,8,9]
>>> uniques_dict = {u: [] for u in uniques}
>>> for record in base:
...     key, values = record[1], record[2:]
...     for unique in uniques_dict.keys() & values:  # the intersection
...         uniques_dict[unique].append(key)
...
>>> uniques_dict
{1: [5001], 2: [5002, 5002, 5003], 3: [5004], 4: [5001], 5: [5002, 5002], 6: [5003, 5004], 7: [5003], 8: [5001], 9: [5004]}
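The O(N) versus O(NK) point can be checked on your own machine with a rough timing sketch along these lines. The synthetic data, the sizes, and the function names nested_scan and single_pass are assumptions made for illustration; absolute timings will vary.

```python
import random
import timeit

# Synthetic data (an assumption): 500 records, 200 candidate unique values.
random.seed(0)
uniques = list(range(200))
base = [[str(i), 5000 + i % 50] + random.sample(uniques, 3) for i in range(500)]

def nested_scan():
    # The question's approach: rescan every record for every unique value.
    return {u: sorted({r[1] for r in base if u in r[2:]}) for u in uniques}

def single_pass():
    # The inverted loop: visit each record once and intersect with the keys.
    result = {u: [] for u in uniques}
    for record in base:
        key, values = record[1], record[2:]
        for unique in result.keys() & values:
            result[unique].append(key)
    # Deduplicate and sort so both functions return comparable values.
    return {u: sorted(set(keys)) for u, keys in result.items()}

# Both approaches should agree before their timings are compared.
assert nested_scan() == single_pass()
print("nested scan:", timeit.timeit(nested_scan, number=5))
print("single pass:", timeit.timeit(single_pass, number=5))
```

The gap between the two numbers should widen as the number of unique values grows, since only the nested scan repeats the pass over base for each of them.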
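If uniques is a strict superset of all possible values in base[*][2:], then you don't even have to calculate those values up front. Just create the dictionary keys as you go, and use set() on each record[2:] list so that only unique values are processed. The uniques_dict values should then also be sets, to eliminate the duplicate keys that would otherwise be added: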
uniques_dict = {}
for record in base:
    key, values = record[1], record[2:]
    for unique in set(values):
        uniques_dict.setdefault(unique, set()).add(key)
Now list(uniques_dict) is your uniques list, built as you process base:
>>> uniques_dict = {}
>>> for record in base:
...     key, values = record[1], record[2:]
...     for unique in set(values):
...         uniques_dict.setdefault(unique, set()).add(key)
...
>>> uniques_dict
{1: {5001}, 2: {5002, 5003}, 3: {5004}, 4: {5001}, 5: {5002}, 6: {5003, 5004}, 7: {5003}, 8: {5001}, 9: {5004}}
>>> list(uniques_dict)
[1, 2, 3, 4, 5, 6, 7, 8, 9]
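If the list-valued output shown in the question is still wanted, one possible follow-up is to convert the sets at the end. This is only a sketch and not part of the answer above: the helper name build_index is hypothetical, and collections.defaultdict is swapped in for setdefault as a matter of taste.

```python
from collections import defaultdict

base = [['a', 5001, 1, 4, 8],
        ['b', 5002, 2, 5],
        ['c', 5002, 2, 5],
        ['d', 5003, 2, 6, 7],
        ['e', 5004, 3, 6, 9]]

def build_index(records):
    """Map each value found in record[2:] to a sorted list of record[1] keys."""
    index = defaultdict(set)
    for record in records:
        key, values = record[1], record[2:]
        for value in set(values):
            index[value].add(key)
    # Convert back to a plain dict with sorted list values, ordered by key.
    return {value: sorted(keys) for value, keys in sorted(index.items())}

uniques_dict = build_index(base)
print(uniques_dict)
# {1: [5001], 2: [5002, 5003], 3: [5004], 4: [5001], 5: [5002],
#  6: [5003, 5004], 7: [5003], 8: [5001], 9: [5004]}
```

Sorting is optional; it just makes the result deterministic and easy to compare against the output in the question.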