How to extract top item from a defaultdict(list)?
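I've just started using defaultdicts.
I have a matching script that uses a unique identifier as the key and then uses defaultdict(list) to store a list of potential matches for that identifier in the dictionary. Each match consists of a company name, an address, and match scores (from the matching algorithms). Sometimes it is a 1-1 match, meaning the key has a single match associated with it, but sometimes the algorithm catches close matches, so a key has several. For those, I'd like to select the match with the highest scores.
Goal: Extract the data for each unique identifier from the defaultdict(list). If a unique identifier has more than one value, take the entry with the highest Lev score, Fuzzy score, and Jaro score.
Here is a preview of the data: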
#imports
from collections import defaultdict
test_dic_stack = defaultdict(list)
#testing data (unique1 has a 1-1 match & unique2 has a 1-5 match)
test_dic_stack['unique1'].append({'Account Name': 'company1', 'Matching Account': 'company1', 'Account_Address': '123 Road', 'Address_match': '123 Road', 'Lev_score': 98.0, 'Fuzzy_score': 100, 'Jaro_Score': 99.0})
test_dic_stack['unique2'].append({'Account Name': 'company1', 'Matching Account': 'company1', 'Account_Address': '1 awesome street', 'Address_match': '1 awesome street', 'Lev_score': 91.0, 'Fuzzy_score': 89, 'Jaro_Score': 99.0})
test_dic_stack['unique2'].append({'Account Name': 'company2', 'Matching Account': 'company2', 'Account_Address': '1 awesome street', 'Address_match': '1 awesome st', 'Lev_score': 71.0, 'Fuzzy_score': 82, 'Jaro_Score': 84.0})
test_dic_stack['unique2'].append({'Account Name': 'company3', 'Matching Account': 'company3', 'Account_Address': '1 awesome street', 'Address_match': '1 awesome street suite 1', 'Lev_score': 88.0, 'Fuzzy_score': 89, 'Jaro_Score': 90.0})
test_dic_stack['unique2'].append({'Account Name': 'company4', 'Matching Account': 'company4', 'Account_Address': '1 awesome street', 'Address_match': '1 awe street', 'Lev_score': 81.0, 'Fuzzy_score': 90, 'Jaro_Score': 86.0})
test_dic_stack['unique2'].append({'Account Name': 'company5', 'Matching Account': 'company5', 'Account_Address': '1 awesome street', 'Address_match': '1 awe st', 'Lev_score': 70.0, 'Fuzzy_score': 86, 'Jaro_Score': 89.0})
#defaultdict preview
defaultdict(list,
{'unique1': [{'Account Name': 'company1',
'Matching Account': 'company1',
'Account_Address': '123 Road',
'Address_match': '123 Road',
'Lev_score': 98.0,
'Fuzzy_score': 100,
'Jaro_Score': 99.0}],
'unique2': [{'Account Name': 'company1',
'Matching Account': 'company1',
'Account_Address': '1 awesome street',
'Address_match': '1 awesome street',
'Lev_score': 91.0,
'Fuzzy_score': 89,
'Jaro_Score': 99.0},
{'Account Name': 'company2',
'Matching Account': 'company2',
'Account_Address': '1 awesome street',
'Address_match': '1 awesome st',
'Lev_score': 71.0,
'Fuzzy_score': 82,
'Jaro_Score': 84.0},
{'Account Name': 'company3',
'Matching Account': 'company3',
'Account_Address': '1 awesome street',
'Address_match': '1 awesome street suite 1',
'Lev_score': 88.0,
'Fuzzy_score': 89,
'Jaro_Score': 90.0},
{'Account Name': 'company4',
'Matching Account': 'company4',
'Account_Address': '1 awesome street',
'Address_match': '1 awe street',
'Lev_score': 81.0,
'Fuzzy_score': 90,
'Jaro_Score': 86.0},
{'Account Name': 'company5',
'Matching Account': 'company5',
'Account_Address': '1 awesome street',
'Address_match': '1 awe st',
'Lev_score': 70.0,
'Fuzzy_score': 86,
'Jaro_Score': 89.0}]})
Here is the result I'm after:
Extract the unique1 data and the "best match" data for unique2. Note that the best match is not always the first one.
results = [{'unique1': {'Account Name': 'company1',
'Matching Account': 'company1',
'Account_Address': '123 Road',
'Address_match': '123 Road',
'Lev_score': 98.0,
'Fuzzy_score': 100,
'Jaro_Score': 99.0},
'unique2': {'Account Name': 'company1',
'Matching Account': 'company1',
'Account_Address': '1 awesome street',
'Address_match': '1 awesome street',
'Lev_score': 91.0,
'Fuzzy_score': 89,
'Jaro_Score': 99.0}}]
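You can use a dictionary comprehension with max, using the sum of the three scores as the key.
Assuming d is the input dictionary (test_dic_stack in the question):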
out = {k:max(v, key=lambda x: sum((x['Fuzzy_score'], x['Lev_score'], x['Jaro_Score'])))
for k,v in d.items()}
Output:
{'unique1': {'Account Name': 'company1',
'Matching Account': 'company1',
'Account_Address': '123 Road',
'Address_match': '123 Road',
'Lev_score': 98.0,
'Fuzzy_score': 100,
'Jaro_Score': 99.0},
'unique2': {'Account Name': 'company1',
'Matching Account': 'company1',
'Account_Address': '1 awesome street',
'Address_match': '1 awesome street',
'Lev_score': 91.0,
'Fuzzy_score': 89,
'Jaro_Score': 99.0}}
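For reference, the same idea can be written as a small self-contained script. This is only a sketch under a few assumptions: the names SCORE_KEYS, total_score, and best_matches are made up for illustration, the sample records are trimmed to the fields that matter, and "best" is taken to mean the largest sum of the three scores, as in the comprehension above. It also skips keys whose list is empty, since max raises ValueError on an empty sequence and merely reading a missing key from a defaultdict(list) creates an empty list.

```python
from collections import defaultdict

# Hypothetical helper names; the score field names come from the question's data.
SCORE_KEYS = ('Lev_score', 'Fuzzy_score', 'Jaro_Score')


def total_score(match):
    """Sum the three scores of one candidate match."""
    return sum(match[k] for k in SCORE_KEYS)


def best_matches(candidates):
    """Return {identifier: highest-scoring match}, skipping empty lists."""
    return {
        key: max(matches, key=total_score)
        for key, matches in candidates.items()
        if matches  # max() would raise ValueError on an empty list
    }


d = defaultdict(list)
d['unique1'].append({'Account Name': 'company1', 'Lev_score': 98.0,
                     'Fuzzy_score': 100, 'Jaro_Score': 99.0})
d['unique2'].append({'Account Name': 'company4', 'Lev_score': 81.0,
                     'Fuzzy_score': 90, 'Jaro_Score': 86.0})
d['unique2'].append({'Account Name': 'company1', 'Lev_score': 91.0,
                     'Fuzzy_score': 89, 'Jaro_Score': 99.0})
d['unique3']  # merely reading a missing key creates an empty list

out = best_matches(d)
print(out['unique2']['Account Name'])
```

If two candidates have the same total, max returns the first one it encounters in the list. If the scores should be ranked by priority rather than added together, a tuple key such as (x['Lev_score'], x['Fuzzy_score'], x['Jaro_Score']) would compare them in that order instead.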