Python word count program from txt file
I'm trying to write a program that finds the 5 most common words in a txt file.
Here is what I have so far:
    file = open('alice.txt')
    wordcount = {}
    for word in file.read().split():
        if word not in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1
    for k, v in wordcount.items():
        print(k, v)
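As it stands, the program counts every word in the .txt file.
My question is how to make it report only the 5 most common words in the file, showing each word with its count next to it.
One catch: I can't use a dictionary... whatever that means.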
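There is a built-in function that sorts a dictionary by its keys and returns them as a list:
    sorted(wordcount, reverse=True)
Now it's up to you to work out how to get/print only the first five elements ;)
Note: sorted can of course sort other collections as well.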
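To make the difference between sorting by key and sorting by count concrete, here is a minimal sketch. The small `wordcount` dictionary is made-up sample data, not taken from alice.txt, and passing `wordcount.get` as the sort key is one possible choice rather than the only one.

```python
# Hypothetical sample counts, standing in for the real wordcount dictionary.
wordcount = {"the": 3, "alice": 2, "rabbit": 1}

# Sorting the dictionary itself sorts its keys (the words),
# here in reverse alphabetical order. The counts are ignored.
by_key = sorted(wordcount, reverse=True)
print(by_key)

# Passing key=wordcount.get sorts the words by their counts instead,
# and a slice keeps only the first few entries.
by_count = sorted(wordcount, key=wordcount.get, reverse=True)
print(by_count[:2])
```

Slicing with `[:5]` instead of `[:2]` would give the five most common words.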
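It's simple: you just need to find the 5 most common words in the file.
So you can do this:
    wordcount = sorted(wordcount.items(), key=lambda x: x[1], reverse=True)
wordcount is now sorted by value, highest count first (remember that sorted returns a list, here a list of (word, count) tuples, not a dictionary).
You can then get the 5 most common words with:
    for k, v in wordcount[:5]:
        print(k, v)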
The complete code is as follows:
    wordcount = {}
    with open('alice.txt') as file:  # with closes the file automatically
        for word in file.read().split():
            if word not in wordcount:
                wordcount[word] = 1
            else:
                wordcount[word] += 1
    wordcount = sorted(wordcount.items(), key=lambda x: x[1], reverse=True)
    for k, v in wordcount[:5]:
        print(k, v)
In addition, here is a simpler way, using collections.Counter:
    from collections import Counter
    with open('alice.txt') as file:  # with closes the file automatically
        wordcount = Counter(file.read().split())
    for k, v in wordcount.most_common(5):
        print(k, v)
The output is the same as for the first solution.
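The claim that both solutions print the same result can be checked on a small in-memory string. The sketch below uses a made-up sentence in place of the contents of alice.txt and assumes Python 3.7 or later, where words with equal counts are returned in the order they were first encountered by both approaches.

```python
from collections import Counter

# Hypothetical sample text standing in for the contents of alice.txt.
text = "the cat and the hat and the bat sat"
words = text.split()

# Approach 1: count with a plain dict, then sort (word, count) pairs by count.
wordcount = {}
for word in words:
    wordcount[word] = wordcount.get(word, 0) + 1
top_sorted = sorted(wordcount.items(), key=lambda x: x[1], reverse=True)[:3]

# Approach 2: let Counter do the counting and the ranking.
top_counter = Counter(words).most_common(3)

print(top_sorted)
print(top_counter)
print(top_sorted == top_counter)
```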
    File_Name = 'file.txt'
    counterDict = {}
    with open(File_Name, 'r') as fh:
        # Read all lines into a list.
        data = fh.readlines()
    for line in data:
        # Lowercase the line and remove '.' and ',' characters.
        line = line.lower().replace(',', '').replace('.', '')
        # Split the line into a list of words.
        words = line.split()
        for word in words:
            # Add the word to counterDict with a count of 1 if it is not present yet.
            if word not in counterDict:
                counterDict[word] = 1
            # If the word is already in counterDict, increase its count by one.
            else:
                counterDict[word] = counterDict[word] + 1
    # Sort by word count, highest first.
    # Each item x is a (word, count) tuple: x[0] is the word, x[1] is the count.
    sorted_counterDict = sorted(counterDict.items(), reverse=True, key=lambda x: x[1])
    # Print the first five entries.
    for key, val in sorted_counterDict[0:5]:
        print(key, val)
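The replace calls in the code above remove only commas and periods, so tokens such as "bird!" and "bird" would still be counted as different words. A minimal sketch of a broader cleanup follows, assuming it is acceptable to trim any ASCII punctuation from the edges of each word. The function name `top_words` and the sample text are hypothetical and used for illustration only.

```python
import string
from collections import Counter


def top_words(text, n=5):
    """Return the n most common words in text as (word, count) pairs."""
    cleaned = []
    for raw in text.lower().split():
        # Trim punctuation from both ends but keep inner characters,
        # so "don't" stays intact while "dog," becomes "dog".
        word = raw.strip(string.punctuation)
        if word:  # skip tokens that consisted of punctuation only
            cleaned.append(word)
    return Counter(cleaned).most_common(n)


# Hypothetical sample text; in practice this would be the file's contents.
sample = "The cat. The dog, the bird! Don't stop; don't."
print(top_words(sample, 2))
```

To apply it to a file, the text could be read with `with open(File_Name) as fh: text = fh.read()` and passed to `top_words(text)`.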