Only returning word counts for words >= 5 characters and sorting by count (highest to lowest)
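I have a .txt file and I want to return the number of times each word appears in it. I have working code, but now I want to refine it so that it only returns words that are 5 or more characters long. I added the "len" function inside the for loop, but it is still returning all the words. Any help would be appreciated.
I would also like to know whether it is possible to sort by count, so that the words with the highest counts are returned first.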
    import string
    import os

    os.chdir('mydirectory')  # Changes directory.
    speech = open("obamaspeech.txt", "r")  # Opens file.
    emptyDict = dict()  # Creates dictionary.

    for line in speech:
        line = line.strip()  # Removes leading and trailing whitespace.
        line = line.lower()  # Converts to lowercase.
        line = line.translate(line.maketrans("", "", string.punctuation))  # Removes punctuation.
        words = line.split(" ")  # Splits the line into words.
        for word in words:
            if len(word) >= 5 in emptyDict:
                emptyDict[word] = emptyDict[word] + 1
            else:
                emptyDict[word] = 1

    for key in list(emptyDict.keys()):
        print(key, ":", emptyDict[key])
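I think you need to test the word length separately. Python reads len(word) >= 5 in emptyDict as a chained comparison, that is (len(word) >= 5) and (5 in emptyDict). The number 5 is never a key in the dictionary, so the condition is always false and every word is stored with a count of 1. Splitting the test in two fixes that: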
    for word in words:
        if len(word) >= 5:
            if word in emptyDict:
                emptyDict[word] = emptyDict[word] + 1
            else:
                emptyDict[word] = 1
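The question also asks about putting the highest counts first, which the loop above does not cover. Here is a minimal sketch of one way to do it with a plain dictionary, using the built-in sorted() on the (word, count) pairs. The helper names count_long_words and sort_by_count and the sample words are made up for illustration; they are not part of the original code.

```python
def count_long_words(words, min_len=5):
    """Count only the words that are at least min_len characters long."""
    counts = {}
    for word in words:
        if len(word) >= min_len:  # length test on its own, as in the fix above
            counts[word] = counts.get(word, 0) + 1
    return counts


def sort_by_count(counts):
    """Return (word, count) pairs ordered from highest to lowest count."""
    return sorted(counts.items(), key=lambda pair: pair[1], reverse=True)


# Made-up sample input standing in for the words read from the file.
sample_words = "change change hope change hope future the a future".split()
ranked = sort_by_count(count_long_words(sample_words))
for word, count in ranked:
    print(word, ":", count)
```

With the real file you would pass in the words collected from every line instead of sample_words. Words with equal counts keep the order in which they were first seen, because sorted() is stable.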
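The other answer shows you how to modify your code to get the result you want. As an alternative, here is another way to implement it. Note that counting the words and sorting them by frequency becomes much easier with a list comprehension and the Counter object from the collections module.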
    import os
    import string
    from collections import Counter

    os.chdir('mydirectory')
    with open("obamaspeech.txt", "r") as speech:
        # Read the whole file, lowercase it, and strip punctuation in one pass.
        full_speech = speech.read().lower().translate(str.maketrans("", "", string.punctuation))

    words = full_speech.split()
    count = Counter([w for w in words if len(w) >= 5])  # Keep words of 5+ characters.
    for w, k in count.most_common():  # Highest count first.
        print(f"{w}: {k} time(s)")