How to effectively deal with a Python dictionary with dynamic keys?
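I work with open data from the Netherlands. There is one dictionary per area per year, and the dictionary keys differ from year to year. How can I write efficient code to deal with this?
I have two working constructions, shown in the examples below, but both take effort for every key, and the open data has 108 keys, so I really hope Python offers a better solution that I am not yet aware of!
Information about the open data:
Each year is a list of 16194 dictionaries, one dictionary per neighbourhood in the Netherlands. Each dictionary has 108 items (key, value pairs):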
>>> import cbsodata
>>> table = '83487NED'
>>> data = cbsodata.get_data(table, dir=None, typed=False)
Retrieving data from table '83487NED'
Done!
>>> len(data)
16194
>>> data[0]
{'Gehuwd_14': 1565, 'MateVanStedelijkheid_105': 5, 'Bevolkingsdichtheid_33': 1350, 'Gemeentenaam_1': 'Aa en Hunze ', ... etc
>>> len(data[0])
108
A key might be 'Code_3' one year and 'Code_4' the next...
Sample data used for the example solutions:
data2016 = [{'Code_3': 'BU01931000', 'ZipCode_106': '2251MT', 'City_12': 'Amsterdam', 'Number_of_people_5': '24000'},
{'Code_3': 'BU02221000', 'ZipCode_106': '2851MT', 'City_12': 'London', 'Number_of_people_5': '88000'},
{'Code_3': 'BU04444000', 'ZipCode_106': '2351MT', 'City_12': 'Paris', 'Number_of_people_5': '133000'}]
data2015 = [{'Code_4': 'BU01931000', 'ZipCode_106': '2251MT', 'City_12': 'Amsterdam', 'Number_of_people_6': '22000'},
{'Code_4': 'BU02221000', 'ZipCode_106': '2851MT', 'City_12': 'London', 'Number_of_people_6': '86000'},
{'Code_4': 'BU04444000', 'ZipCode_106': '2351MT', 'City_12': 'Paris', 'Number_of_people_6': '131000'}]
data2014 = [{'Code_8': 'BU01931000', 'ZipCode_109': '2251MT', 'City_12': 'Amsterdam', 'Number_of_people_14': '18000'},
{'Code_8': 'BU02221000', 'ZipCode_109': '2851MT', 'City_12': 'London', 'Number_of_people_14': '76000'},
{'Code_8': 'BU04444000', 'ZipCode_109': '2351MT', 'City_12': 'Paris', 'Number_of_people_14': '129000'}]
data2013 = [{'Code_8': 'BU01931000', 'ZipCode_109': '2251MT', 'City_12': 'Amsterdam', 'Number_of_people_14': '14000'},
{'Code_8': 'BU02221000', 'ZipCode_109': '2851MT', 'City_12': 'London', 'Number_of_people_14': '74000'}] # data for Paris 'BU04444000' missing in 2013
tables = {2013: data2013, 2014: data2014, 2015: data2015, 2016: data2016}
years = [2013, 2014, 2015, 2016]
current_year = 2016
Example solution 1, a mapping of the keys:
def CBSkey(key, year):
    if key == 'key_code':
        if year == 2013:
            return 'Code_8'
        elif year == 2014:
            return 'Code_8'
        elif year == 2015:
            return 'Code_4'
        elif year == 2016:
            return 'Code_3'
    elif key == 'key_people':
        if year == 2013:
            return 'Number_of_people_14'
        elif year == 2014:
            return 'Number_of_people_14'
        elif year == 2015:
            return 'Number_of_people_6'
        elif year == 2016:
            return 'Number_of_people_5'

for record_now in tables[current_year]:
    code = record_now['Code_3']
    city = record_now['City_12']
    people = {}
    for year in years:
        code_year = CBSkey('key_code', year)
        people_year = CBSkey('key_people', year)
        for record in tables[year]:
            if record[code_year] == code:
                people[year] = record[people_year]
    print(people)
Output of the example solutions:
{2016: '24000', 2013: '14000', 2014: '18000', 2015: '22000'}
{2016: '88000', 2013: '74000', 2014: '76000', 2015: '86000'}
{2016: '133000', 2014: '129000', 2015: '131000'}
Example 2, select the right dictionary based on an item, then loop over all its other keys to find the additional data:
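for record_now in tables[current_year]:
    city = record_now['City_12']
    code = record_now['Code_3']
    print('Code: ', code)
    people = {}
    for year in years:
        for record in tables[year]:
            for v in record.values():
                if v == code:
                    for k in record.keys():
                        # Note: CBSkey is called with a single argument here and is
                        # expected to return a key type such as 'People_type';
                        # that variant of the function is not shown in this post.
                        key_type = CBSkey(k)
                        if key_type == 'People_type':
                            people[year] = record[k]
    print(people)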
Hoping for some smart 'Pythonic' ideas, many thanks!
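If I understand this dataset correctly, the data for each year is a list of many dicts; all the dicts for a given year use the same keys; the keys differ from year to year, but the general data available is the same. So you need a way to efficiently retrieve the same data from multiple years.
First, I would put all the years into one big dictionary, rather than using the indirect mapping scheme you have: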
data = {}
data[2016] = [{'Code_3': 'BU01931000'}] # etc.
data[2015] = [{'Code_4': 'BU01931000'}] # etc.
So tables and all the individual datayyyy variables go away, tables[year] becomes data[year], and years becomes data.keys().
Then I would compute the mapping from years to keys.
"""ytok structure
ytok maps years to dicts of keys. ytok[2016] would be:
{'code': 'Code_3', 'zip': 'ZipCode_106', 'city': 'City_12',
'people': 'Number_of_people_5'}
"""
Here is one way to construct ytok, showing the intermediate results to make the process clear:
ytok = {}
for year in data.keys():
    sample = data[year][0]
    outputs = list(sorted(sample.keys()))
    # Will be in this order: city, code, people, zip
    inputs = 'city code people zip'.split()
    pairs = list(zip(inputs, outputs))
    print(pairs)
    yeardict = dict(pairs)
    print(yeardict)
    ytok[year] = yeardict
print(ytok)
Here is a more compact way:
ytok = {}
inputs = 'city code people zip'.split()
for year in data.keys():
    outputs = sorted(data[year][0].keys())
    ytok[year] = dict(zip(inputs, outputs))
print(ytok)
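Pairing a hand-written list of names with the sorted keys works for four keys, but it gets fragile with 108 of them. As a rough sketch of an alternative, and assuming that the only thing that changes between years is the trailing '_<number>' suffix (true for the sample data, not verified against the real table), the mapping could be derived from the keys themselves. The names build_key_map and ytok_by_name are made up for this illustration:

```python
import re

# Sample records in the shape used in the question (one record per year).
data = {
    2016: [{'Code_3': 'BU01931000', 'ZipCode_106': '2251MT',
            'City_12': 'Amsterdam', 'Number_of_people_5': '24000'}],
    2014: [{'Code_8': 'BU01931000', 'ZipCode_109': '2251MT',
            'City_12': 'Amsterdam', 'Number_of_people_14': '18000'}],
}

def build_key_map(sample):
    """Map suffix-free names to the actual keys of one sample record."""
    # Assumption: only the trailing '_<digits>' part differs between years.
    return {re.sub(r'_\d+$', '', key): key for key in sample}

# One key map per year, built from the first record of that year.
ytok_by_name = {year: build_key_map(records[0])
                for year, records in data.items()}

print(ytok_by_name[2016]['Number_of_people'])  # Number_of_people_5
print(ytok_by_name[2014]['Code'])              # Code_8
```

If two keys in the same year ever collapse to the same suffix-free name, the later one silently wins, so it is worth checking that len(build_key_map(sample)) == len(sample) before relying on this.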
Then use ytok like this:
wanted_code = 'BU02221000'
people = {}
for year in data.keys():
    codekey = ytok[year]['code']
    peoplekey = ytok[year]['people']
    for record in data[year]:
        if record[codekey] == wanted_code:
            people[year] = record[peoplekey]
            break
print(people)
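Note the use of break once the right record is found. There is no point in continuing to search a year once we have found what we want, so we break out of the inner for record loop.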
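If you need this for every code rather than a single one, the linear scan gets repeated for each of the 16194 records in each year. One possible refinement, sketched here on a cut-down copy of the sample data, is to build a per-year index keyed by the code once, so that each lookup is a plain dict access. The names index and people_by_year are invented for this sketch, and ytok is written out by hand to keep it self-contained:

```python
# Cut-down sample data; Paris ('BU04444000') is missing in 2013.
data = {
    2013: [{'Code_8': 'BU01931000', 'ZipCode_109': '2251MT',
            'City_12': 'Amsterdam', 'Number_of_people_14': '14000'}],
    2016: [{'Code_3': 'BU01931000', 'ZipCode_106': '2251MT',
            'City_12': 'Amsterdam', 'Number_of_people_5': '24000'},
           {'Code_3': 'BU04444000', 'ZipCode_106': '2351MT',
            'City_12': 'Paris', 'Number_of_people_5': '133000'}],
}

# Year-to-key mapping, in the same shape as ytok above.
ytok = {
    2013: {'code': 'Code_8', 'people': 'Number_of_people_14'},
    2016: {'code': 'Code_3', 'people': 'Number_of_people_5'},
}

# index[year][code] is the record for that code in that year.
index = {
    year: {record[ytok[year]['code']]: record for record in records}
    for year, records in data.items()
}

def people_by_year(code):
    """Collect the people count per year, skipping years without the code."""
    result = {}
    for year in sorted(index):
        record = index[year].get(code)
        if record is not None:
            result[year] = record[ytok[year]['people']]
    return result

print(people_by_year('BU01931000'))  # {2013: '14000', 2016: '24000'}
print(people_by_year('BU04444000'))  # {2016: '133000'}
```

Building the index costs one pass over each year's list; after that, the missing-year case (Paris in 2013) is handled by .get() returning None instead of by a loop that finds nothing.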