Why do I get KeyError: 'this'?


CORPUS = [
        'this is the first document',
        'this is the second document',
        'and this is the third document',
        'is this the first document ?'
    ]

doc = CORPUS
dic = {}
for sentence in doc:
  k = list(sentence.split())
  for term in k:
    count_term = k.count(term)
    if not dic[term]:
      dic[term] = count_term
    else:
      dic[term] += count_term
print(dic)

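I want to count how often each term occurs across the sentences in the corpus list, so I tried to build a dictionary and store the counts in it, but I get KeyError: 'this'.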

Can you explain why this error occurs?

You should change the condition from

if not dic[term]

to

if term not in dic

If the key is not in the dictionary, dic[term] raises a KeyError; it does not return None. You can check whether the key is in the dictionary first:

for term in k:
    count_term = k.count(term)
    if term not in dic:
        dic[term] = count_term
    else:
        dic[term] += count_term

Or use get() with a default value, which returns 0 when the key is not in the dictionary:

for term in k:
    count_term = k.count(term)
    dic[term] = dic.get(term, 0) + count_term
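
A further variant, as a minimal sketch rather than a definitive fix: collections.defaultdict(int) supplies 0 for any missing key, so neither the membership test nor get() is needed. The function name count_terms is made up for illustration. It adds 1 per occurrence instead of calling k.count(term), which gives the same totals for this corpus and also stays correct if a word repeats within a single sentence.

```python
from collections import defaultdict

CORPUS = [
    'this is the first document',
    'this is the second document',
    'and this is the third document',
    'is this the first document ?',
]


def count_terms(sentences):
    """Hypothetical helper: count each whitespace-separated term."""
    counts = defaultdict(int)  # a missing key starts at 0 instead of raising KeyError
    for sentence in sentences:
        for term in sentence.split():
            counts[term] += 1  # one increment per occurrence
    return dict(counts)  # plain dict, so later lookups of unknown keys still raise


dic = count_terms(CORPUS)
print(dic['this'], dic['first'], dic['?'])  # 4 2 1
```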

If I understand correctly, your code can be simplified to:

from collections import Counter

print(Counter(" ".join(CORPUS).split()))

which produces

Counter({'this': 4,
         'is': 4,
         'the': 4,
         'first': 2,
         'document': 4,
         'second': 1,
         'and': 1,
         'third': 1,
         '?': 1})

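So the idea is to first join everything into one long string, which avoids the explicit loops, and then let Counter from the standard library count the occurrences of each word.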
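
If building one long string is undesirable, for example with a very large corpus, a Counter can also be fed one sentence at a time through its update() method. This is only a sketch of an alternative, under the assumption that terms are separated by whitespace; it should give the same counts as the joined-string version.

```python
from collections import Counter

CORPUS = [
    'this is the first document',
    'this is the second document',
    'and this is the third document',
    'is this the first document ?',
]

counts = Counter()
for sentence in CORPUS:
    counts.update(sentence.split())  # adds 1 for every term in this sentence

# Same result as counting over the single joined string
joined = Counter(" ".join(CORPUS).split())
print(counts == joined)
```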

The reason you got the error is explained well in the other two answers (I upvoted both of them) :)