Why do I get KeyError: 'this'?
CORPUS = [
    'this is the first document',
    'this is the second document',
    'and this is the third document',
    'is this the first document ?'
]
doc = CORPUS
dic = {}
for sentence in doc:
    k = list(sentence.split())
    for term in k:
        count_term = k.count(term)
        if not dic[term]:
            dic[term] = count_term
        else:
            dic[term] += count_term
print(dic)
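I want to count how often each term occurs in the sentences of the corpus list, so I tried building a dictionary and storing the counts in it, but I get KeyError: 'this'.
Can you explain why this error occurs?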
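You should change the condition
    if not dic[term]
to
    if term not in dic
Indexing a dictionary with a key it does not contain raises KeyError; it does not return None. You can check whether the key is in the dictionary first: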
for term in k:
    count_term = k.count(term)
    if term not in dic:
        dic[term] = count_term
    else:
        dic[term] += count_term
Or use get() with a default value, which returns 0 when the key is not in the dictionary:
for term in k:
    count_term = k.count(term)
    dic[term] = dic.get(term, 0) + count_term
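To make the difference between these lookups concrete, here is a minimal sketch. The `counts` dictionary and its single entry are made up for illustration and are not from the question. It shows that indexing a missing key raises `KeyError`, while the `in` test and `get()` do not.

```python
counts = {"this": 1}  # hypothetical dictionary, for illustration only

# Indexing a missing key raises KeyError; it does not return None.
try:
    counts["document"]
    raised = False
except KeyError:
    raised = True

# These three forms never raise for a missing key.
is_member = "document" in counts          # membership test
default_none = counts.get("document")     # get() without a default
default_zero = counts.get("document", 0)  # get() with a default of 0

print(raised, is_member, default_none, default_zero)
```

This is why `if not dic[term]` fails on the very first word: the lookup itself raises before `not` is ever evaluated.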
If I understand correctly, your code can be simplified to:
from collections import Counter
print(Counter(" ".join(CORPUS).split()))
which produces
Counter({'this': 4,
'is': 4,
'the': 4,
'first': 2,
'document': 4,
'second': 1,
'and': 1,
'third': 1,
'?': 1})
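So the idea is to first join everything into one long string, which avoids the explicit loops, and then let Counter from the standard library count the occurrences of each word.
The reason you got the error is well explained in the other two answers (I upvoted both of them) :)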
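As a rough check, the sketch below compares the loop-based fix with the Counter version. The helper names `loop_count` and `counter_count` and the `repeated` example sentence are assumptions introduced here, not part of the question. On the corpus from the question both give the same counts, but they can differ once a word repeats inside a single sentence, because `k.count(term)` is added once per occurrence of that word.

```python
from collections import Counter

CORPUS = [
    'this is the first document',
    'this is the second document',
    'and this is the third document',
    'is this the first document ?'
]

def loop_count(corpus):
    """Hypothetical helper: the loop from the question, fixed with get()."""
    dic = {}
    for sentence in corpus:
        k = sentence.split()
        for term in k:
            # k.count(term) is added every time term is visited in k
            dic[term] = dic.get(term, 0) + k.count(term)
    return dic

def counter_count(corpus):
    """Hypothetical helper: the Counter one-liner from the answer."""
    return Counter(" ".join(corpus).split())

# No word repeats within a sentence here, so both approaches agree.
same_on_corpus = loop_count(CORPUS) == dict(counter_count(CORPUS))

# Made-up sentence in which 'the' appears twice.
repeated = ['the cat and the dog']
loop_the = loop_count(repeated)['the']
counter_the = counter_count(repeated)['the']

print(same_on_corpus, loop_the, counter_the)
```

If sentences may contain repeated words, adding 1 per visited word instead of `k.count(term)`, or using Counter directly, avoids the double counting.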