Try, Except / If Statement Combination - Missing results
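I'm comparing a list of universities against 12 other lists, looking for fuzzy string matches and writing all of the results to a csv. I'm not running the fuzzy matching against one big list because I need to know which list each match came from.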
Example of the lists:
data = [["1-00000", "MIT"], ["1-00001", "Stanford"], ...]
Data1 = [['MASSACHUSETTS INSTITUTE OF TECHNOLOGY (MIT)'], ['STANFORD UNIVERSITY'], ...]
With help from Stack Overflow, I've got this:
for uni in data:
    hit = process.extractOne(str(uni[1]), data10, scorer = fuzz.token_set_ratio, score_cutoff = 90)
    try:
        if float(hit[1]) >= 94:
            with open(filename, mode='a', newline="") as csv_file:
                fieldnames = ['bwbnr', 'uni_name', 'match', 'points']
                writer = csv.DictWriter(csv_file, fieldnames=fieldnames, delimiter=';')
                writer.writerow({'bwbnr': str(uni[0]), 'uni_name': str(uni[1]), 'match': str(hit), 'points': 10})
    except:
        hit1 = process.extractOne(str(uni[1]), data11, scorer = fuzz.token_set_ratio, score_cutoff = 90)
        try:
            if float(hit1[1]) >= 94:
                with open(filename, mode='a', newline="") as csv_file:
                    fieldnames = ['bwbnr', 'uni_name', 'match', 'points']
                    writer = csv.DictWriter(csv_file, fieldnames=fieldnames, delimiter=';')
                    writer.writerow({'bwbnr': str(uni[0]), 'uni_name': str(uni[1]), 'match': str(hit), 'points': 5})
The same pattern continues down through all 12 lists until the last except, where I include the ones scoring below 94 and finish with "not found":
except:
    hit12 = process.extractOne(str(uni[1]), data9, scorer = fuzz.token_set_ratio)
    try:
        if float(hit12[1]) < 94:
            with open(filename, mode='a', newline="") as csv_file:
                fieldnames = ['bwbnr', 'uni_name', 'match', 'points']
                writer = csv.DictWriter(csv_file, fieldnames=fieldnames, delimiter=';')
                writer.writerow({'bwbnr': str(uni[0]), 'uni_name': str(uni[1]), 'match': str(hit), 'points': 3})
    except:
        with open(filename, mode='a', newline="") as csv_file:
            fieldnames = ['bwbnr', 'uni_name', 'match', 'points']
            writer = csv.DictWriter(csv_file, fieldnames=fieldnames, delimiter=';')
            writer.writerow({'bwbnr': str(uni[0]), 'uni_name': str(uni[1]), 'match': str(hit), 'points': 3})
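The sketch below isolates how one level of this try/except/if chain handles each kind of value extractOne can return. It is a hypothetical stand-in, not the code above: `route` and the sample hits are made up, and literal values replace the extractOne calls. It assumes, as the code does, that extractOne returns a (match, score) tuple, or None when no choice reaches score_cutoff.

```python
# Hypothetical stand-in for one level of the chain: "write" when the score
# is >= 94, and move on to the next list only if an exception is raised.
def route(hit):
    try:
        if float(hit[1]) >= 94:
            return "written"
    except TypeError:
        # hit is None: None[1] raises TypeError, so the next list is tried
        # (the original uses a bare except, which also catches this)
        return "next list"
    # No exception was raised and the if was False: neither branch ran
    return "dropped"


hits = {
    "no match above cutoff": None,
    "score 96": ("STANFORD UNIVERSITY", 96),
    "score 92": ("STANFORD UNIVERSITY", 92),
}
for label, hit in hits.items():
    print(label, "->", route(hit))
```

In this sketch only the None case reaches the except block. A hit that clears the cutoff of 90 but scores below 94 raises nothing and fails the if, so neither branch runs for it.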
However, I only get 2854 results back instead of the 3175 in the original list (all of which need to be checked and written to the new csv).
When I put all the lists together into one and run extractOne, I do get 3175 results:
scored_testdata = []
for uni in data:
    hit = process.extractOne(str(uni[1]), big_list, scorer = fuzz.token_set_ratio, score_cutoff = 90)
    scored_testdata.append(hit)
print(len(scored_testdata))
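Here is a reduced version of this loop that shows how the 3175 count is formed: every return value, including None, is appended. `extract_one` is a hypothetical stand-in for process.extractOne built on the standard library's difflib, so its scores are not token_set_ratio scores.

```python
from difflib import SequenceMatcher


def extract_one(query, choices, score_cutoff=0):
    # Hypothetical stand-in for process.extractOne, scored 0-100 with difflib
    # rather than fuzz.token_set_ratio
    scored = [
        (choice, round(100 * SequenceMatcher(None, query.lower(), choice.lower()).ratio()))
        for choice in choices
    ]
    best = max(scored, key=lambda pair: pair[1])
    # Like extractOne with a score_cutoff: None when no choice qualifies
    return best if best[1] >= score_cutoff else None


data = [["1-00000", "MIT"], ["1-00001", "Stanford"]]
big_list = ["MASSACHUSETTS INSTITUTE OF TECHNOLOGY (MIT)", "STANFORD UNIVERSITY"]

scored_testdata = []
for uni in data:
    hit = extract_one(str(uni[1]), big_list, score_cutoff=90)
    scored_testdata.append(hit)  # None is appended too

print(len(scored_testdata))  # always len(data), matched or not
print(sum(hit is not None for hit in scored_testdata))  # actual matches only
```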
What am I missing here? I have a feeling that the results for which process.extractOne returns "None" are being dropped for some reason.
Any help would be much appreciated.
The full code can be found here.
The final try-except should instead check all of the lists at once and do an extractBest without a score_cutoff:
except:
    hit12 = process.extractOne(str(uni[1]), big_list, scorer = fuzz.token_set_ratio)
    with open(filename, mode='a', newline="") as csv_file:
        fieldnames = ['bwbnr', 'uni_name', 'match', 'confidence', 'points']
        writer = csv.DictWriter(csv_file, fieldnames=fieldnames, delimiter=';')
        writer.writerow({'bwbnr': str(uni[0]), 'uni_name': str(uni[1]), 'match': "CHECK AGAIN " + str(hit12[0]), 'confidence': str(hit12[1]), 'points': 3})
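One possible restructuring is sketched below. It rests on several assumptions:

- A hypothetical `named_lists` dict pairs each list with a name and a points value. The entries mirror data10 = 10 and data11 = 5 above.
- `difflib_score` stands in for fuzz.token_set_ratio and will score abbreviations such as "MIT" much lower.
- Every list is non-empty.

The sketch tries the lists in order and stops at the first score of 94 or more. If no list reaches 94, it falls back to the best candidate seen across all lists. As a result, each university yields exactly one row, and the row records which list the match came from.

```python
import csv
import io
from difflib import SequenceMatcher

FIELDNAMES = ["bwbnr", "uni_name", "match", "source", "confidence", "points"]


def difflib_score(a, b):
    # Stand-in scorer (0-100); fuzz.token_set_ratio could be passed instead
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def best_match(query, choices, scorer):
    # Highest-scoring (choice, score) pair in one list, with no cutoff
    return max(((c, scorer(query, c)) for c in choices), key=lambda pair: pair[1])


def match_all(data, named_lists, scorer, threshold=94):
    """Yield exactly one row per university, recording which list matched."""
    for bwbnr, uni_name in data:
        row = {"bwbnr": bwbnr, "uni_name": uni_name}
        overall = None  # best (match, score, source) seen so far
        for source, (choices, points) in named_lists.items():
            match, score = best_match(uni_name, choices, scorer)
            if score >= threshold:
                row.update(match=match, source=source, confidence=score, points=points)
                break
            if overall is None or score > overall[1]:
                overall = (match, score, source)
        else:
            # No list reached the threshold: keep the best candidate for review
            match, score, source = overall
            row.update(match="CHECK AGAIN " + match, source=source, confidence=score, points=3)
        yield row


data = [["1-00000", "MIT"], ["1-00001", "Stanford University"]]
named_lists = {
    "data10": (["MASSACHUSETTS INSTITUTE OF TECHNOLOGY (MIT)"], 10),
    "data11": (["STANFORD UNIVERSITY"], 5),
}

buffer = io.StringIO()  # use open(filename, "w", newline="") for a real file
writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, delimiter=";")
writer.writeheader()
rows = list(match_all(data, named_lists, difflib_score))
writer.writerows(rows)
print(buffer.getvalue())
```

The sketch opens the output once and writes every row with writerows, so it does not reopen the file in append mode for each university.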