How do you efficiently search across a group of lists in Python?
In Python, I have a set of lists that track some user information:
user_id = [1,2,3,4,5]
user_name = ['bob', 'alice', 'jerry', 'lisa', 'tom']
user_email = ['bob@email.com', 'alice@email.com', 'jerry@email.com', 'lisa@email.com', 'tom@email.com']
...
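The i-th element of each list corresponds to the same user.
I want to retrieve user attribute "x" given attribute "y". Normally I would use a dictionary for this to get constant-time lookups, but I don't want to build and maintain dozens of dictionaries.
If I maintained a dictionary for every pair of the lists shown above, I would have: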
name:email
email:name
name:id
id:name
email:id
id:email
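That is already becoming unmanageable, and it grows quickly as the number of attributes increases (one dictionary per ordered pair of attributes).
I could map everything to user_id and then have only 2n dictionaries, but I would be happy to learn about a data structure better suited to this use case.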
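To illustrate how the code is currently implemented:
def get_email_by_user_id(uid_to_find):
    # Linear scan over the parallel lists: O(n) per lookup.
    return [email for email, uid in zip(user_email, user_id) if uid == uid_to_find][0]
As you can imagine, this is very slow :P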
# Dict for holding your data
data = dict()
# Put all your stuff into data
for uid, name, email in zip(user_id, user_name, user_email):
    data[uid] = {"id": uid, "username": name, "email": email}
# Function for looking up a record by key and value
def lookup_info(key_name, lookup_value, data):
    '''
    Takes a key name, a lookup value and a dictionary of data.
    Returns the first matching dictionary item, or None if nothing matches.
    '''
    for k, v in data.items():
        if v[key_name] == lookup_value:
            return data[k]
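As a quick usage sketch of the answer above, using the sample data from the question: lookups by id go straight through the dictionary, while lookups by any other field still scan every record, so they remain O(n). The explicit `return None` and the dictionary comprehension are small stylistic choices made here, not part of the original answer.

```python
user_id = [1, 2, 3, 4, 5]
user_name = ['bob', 'alice', 'jerry', 'lisa', 'tom']
user_email = ['bob@email.com', 'alice@email.com', 'jerry@email.com',
              'lisa@email.com', 'tom@email.com']

# One record per user, keyed by id.
data = {
    uid: {"id": uid, "username": name, "email": email}
    for uid, name, email in zip(user_id, user_name, user_email)
}

def lookup_info(key_name, lookup_value, data):
    """Return the first record whose key_name field equals lookup_value."""
    for record in data.values():
        if record[key_name] == lookup_value:
            return record
    return None  # nothing matched

print(lookup_info("username", "alice", data)["email"])  # alice@email.com
print(lookup_info("email", "nobody@email.com", data))   # None
print(data[3]["username"])                              # jerry (direct lookup by id)
```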
Since the data is related, you can organize it into a list of tuples, one tuple per user, holding the related columns.
DATA = [
(1, 'bob', 'bob@email.com'),
(2, 'alice', 'alice@email.com'),
(3, 'jerry', 'jerry@email.com'),
(4, 'lisa', 'lisa@email.com'),
(5, 'tom', 'tom@email.com'),
]
Then you can write a generic function that considers only the columns you are interested in.
def find_user(user_id=None, user_name=None, user_email=None):
    """Find first user matching given criteria.
    A None value means "don't care".
    Returns tuple of (id, name, email) if found, otherwise None.
    """
    # Collect desired criteria into mapping of record index to desired index value.
    criteria_cols = {i: c for (i, c) in enumerate((user_id, user_name, user_email)) if c is not None}
    for rec in DATA:
        if all(rec[idx] == criteria for (idx, criteria) in criteria_cols.items()):
            return rec  # return early if found.
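This function considers every argument that is not None and returns the first matching record. If no record matches, it falls through and returns the default value None.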
print(find_user(user_id=1))
print(find_user(user_id=2))
print(find_user(user_name="alice"))
print(find_user(user_email="jerry@email.com"))
print(find_user(user_id=3, user_email="jerry@email.com"))
print(find_user(user_id=2, user_email="jerry@email.com"))
print(find_user(user_id=3, user_name="jerry"))
Result:
(1, 'bob', 'bob@email.com')
(2, 'alice', 'alice@email.com')
(2, 'alice', 'alice@email.com')
(3, 'jerry', 'jerry@email.com')
(3, 'jerry', 'jerry@email.com')
None
(3, 'jerry', 'jerry@email.com')
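If the linear scan in find_user becomes a bottleneck, one possible variation (not part of the answer above) is to build one dictionary per column that maps a value directly to its full record. The sketch below assumes every value is unique within its column; the names `User`, `INDEX`, and `find_by` are hypothetical and introduced only for illustration.

```python
from collections import namedtuple

# Named fields make the records self-describing.
User = namedtuple("User", ["user_id", "user_name", "user_email"])

DATA = [
    User(1, 'bob', 'bob@email.com'),
    User(2, 'alice', 'alice@email.com'),
    User(3, 'jerry', 'jerry@email.com'),
    User(4, 'lisa', 'lisa@email.com'),
    User(5, 'tom', 'tom@email.com'),
]

# One dictionary per field, each mapping a field value to its whole record.
# Built once in O(n) per field; assumes values are unique within a field.
INDEX = {
    field: {getattr(rec, field): rec for rec in DATA}
    for field in User._fields
}

def find_by(field, value):
    """Return the record whose `field` equals `value`, or None if absent."""
    return INDEX[field].get(value)

print(find_by("user_name", "alice").user_email)          # alice@email.com
print(find_by("user_email", "jerry@email.com").user_id)  # 3
print(find_by("user_id", 42))                            # None
```

This needs n dictionaries for n attributes, and any attribute can be reached from any other through the returned record.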
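In the end, I went with the only option that gave me the performance I needed.
I decided that the contents of user_id are the canonical identifier.
I then created the following dictionaries: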
def make_dictionaries(user_id, other_lists=[('user_name', user_name), ('user_email', user_email)]):
    to_id_dictionary = {}
    from_id_dictionary = {}
    for list_name, list_content in other_lists:
        from_id_dictionary[list_name] = {uid: cont for uid, cont in zip(user_id, list_content)}
        to_id_dictionary[list_name] = {cont: uid for uid, cont in zip(user_id, list_content)}
    return to_id_dictionary, from_id_dictionary
Then I can do:
to_id_dictionary, from_id_dictionary = make_dictionaries(user_id)
def get_email_by_user_name(user_name):
    uid = to_id_dictionary['user_name'][user_name]  # Get UID from name
    return from_id_dictionary['user_email'][uid]  # Get email from UID
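The two-hop pattern above can be generalized so that no per-pair function is needed. The following is only a sketch: the helper `lookup` and its `src`/`dst` parameters are hypothetical names, values are assumed to be unique within each list, and a missing value raises KeyError.

```python
user_id = [1, 2, 3, 4, 5]
user_name = ['bob', 'alice', 'jerry', 'lisa', 'tom']
user_email = ['bob@email.com', 'alice@email.com', 'jerry@email.com',
              'lisa@email.com', 'tom@email.com']

def make_dictionaries(ids, other_lists):
    """Build value->id and id->value dictionaries for each named list."""
    to_id, from_id = {}, {}
    for list_name, list_content in other_lists:
        from_id[list_name] = dict(zip(ids, list_content))
        to_id[list_name] = dict(zip(list_content, ids))
    return to_id, from_id

to_id, from_id = make_dictionaries(
    user_id, [('user_name', user_name), ('user_email', user_email)]
)

def lookup(src, dst, value):
    """Translate `value` of attribute `src` into attribute `dst` via user_id."""
    # First hop: resolve the canonical id (skipped if we already have it).
    uid = value if src == 'user_id' else to_id[src][value]
    # Second hop: map the id to the requested attribute.
    return uid if dst == 'user_id' else from_id[dst][uid]

print(lookup('user_name', 'user_email', 'alice'))      # alice@email.com
print(lookup('user_email', 'user_id', 'tom@email.com'))  # 5
print(lookup('user_id', 'user_name', 3))               # jerry
```

Each lookup costs at most two dictionary accesses, and adding an attribute adds two dictionaries rather than one per pair.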