Only returning word counts for words >= 5 characters and sorting by count (highest to lowest)

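I have a .txt file and I want to return the number of times each word appears in it. I have working code, but now I'd like to narrow it down to return only words that are 5 or more characters long. I added the "len" function inside the for loop, but it is still returning all words. Any help would be appreciated.

I'd also like to know whether the results can be sorted by count, so that the words with the highest counts come first.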

import string
import os

os.chdir('mydirectory') # Changes directory.

speech = open("obamaspeech.txt", "r") # Opens file.
  
emptyDict = dict() # Creates dictionary

for line in speech:
    line = line.strip() # Removes leading spaces.
    line = line.lower() # Convert to lowercase.
    line = line.translate(line.maketrans("", "", string.punctuation)) # Removes punctuation.
    words = line.split(" ") # Splits lines into words. 
    for word in words:
        if len(word) >= 5 in emptyDict: 
            emptyDict[word] = emptyDict[word] + 1
        else:
            emptyDict[word] = 1
  
for key in list(emptyDict.keys()):
    print(key, ":", emptyDict[key])

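I think you need to test the word length separately. The condition len(word) >= 5 in emptyDict is a chained comparison, which Python evaluates as len(word) >= 5 and 5 in emptyDict. Since 5 is never a key in the dictionary, the test is always false, so every word falls through to the else branch regardless of its length. Nesting the two tests fixes this: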

for word in words:
    if len(word) >= 5:
        if word in emptyDict: 
            emptyDict[word] = emptyDict[word] + 1
        else:
            emptyDict[word] = 1
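
For the second part of the question (highest counts first), one possible approach with a plain dict is to sort its items by value. This is only a minimal sketch that works on a string already in memory rather than on the file; the names count_long_words, sort_by_count and min_length are made up for illustration and do not come from the question.

```python
import string


def count_long_words(text, min_length=5):
    """Count words of at least min_length characters (hypothetical helper)."""
    counts = {}
    # Lowercase and strip punctuation, mirroring the question's code.
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    for word in cleaned.split():
        if len(word) >= min_length:
            # dict.get avoids the separate "in" test for unseen words.
            counts[word] = counts.get(word, 0) + 1
    return counts


def sort_by_count(counts):
    """Return (word, count) pairs, highest count first (hypothetical helper)."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Sample text is an assumption; substitute the contents of your own file.
sample = "Hello world, hello there. The world says hello!"
for word, count in sort_by_count(count_long_words(sample)):
    print(word, ":", count)
```

Sorting the items rather than the keys keeps each word paired with its count, so the loop can print both without a second lookup.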

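The other answer shows how to modify your code to get the desired result. As an alternative, here is another way to implement it. Note that counting the words and sorting them by frequency becomes much easier with a list comprehension and the Counter object from the collections module.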

import os
import string
from collections import Counter

os.chdir('mydirectory')
with open("obamaspeech.txt", "r") as speech:
    full_speech = speech.read().lower().translate(str.maketrans("", "", string.punctuation))

words = full_speech.split()
count = Counter([w for w in words if len(w)>=5])
for w,k in count.most_common():
    print(f"{w}: {k} time(s)")
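
If only the top few words are of interest, most_common also accepts an optional number of entries to return. The following is a small sketch on an in-memory string; the sample text and the top_n cutoff are assumptions for illustration, not part of the original answer.

```python
import string
from collections import Counter

# Sample text is an assumption; in practice this would be the file contents.
sample = "Change will come. Change takes courage, and change takes time."
cleaned = sample.lower().translate(str.maketrans("", "", string.punctuation))

# Keep only words of 5 or more characters, as in the answer above.
count = Counter(w for w in cleaned.split() if len(w) >= 5)

top_n = 2  # hypothetical cutoff
top_words = count.most_common(top_n)
for w, k in top_words:
    print(f"{w}: {k} time(s)")
```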