I'm trying to count all letters in a txt file, then display them in descending order
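As the title says:
My code works so far, but I can't get it to display the results in order. At the moment the letters are just printed in what looks like random order.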
def frequencies(filename):
    infile=open(filename, 'r')
    wordcount={}
    content = infile.read()
    infile.close()
    counter = {}
    invalid = "‘'`,.?!:;-_\n—' '"

    for word in content:
        word = content.lower()
        for letter in word:
            if letter not in invalid:
                if letter not in counter:
                    counter[letter] = content.count(letter)
                    print('{:8} appears {} times.'.format(letter, counter[letter]))
Any help would be greatly appreciated.
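Dictionaries are not sorted data structures (since Python 3.7 they preserve insertion order, but they are never ordered by value). Also, if you want to count items in a collection of data, you are better off using collections.Counter(), which is more optimized and more pythonic for this purpose.
You can then just use Counter.most_common(N) to print the N most common items in the Counter object.
As for opening the file, you can simply use the with statement, which closes the file automatically at the end of the block. It is also better not to print the final result inside your function: you can turn the function into a generator by yielding the intended lines, and then print them whenever you want.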
from collections import Counter

def frequencies(filename, top_n):
    with open(filename) as infile:
        content = infile.read()
    invalid = "‘'`,.?!:;-_\n—' '"
    # Count only the characters that are not in the invalid string.
    counter = Counter(filter(lambda x: x not in invalid, content))
    for letter, count in counter.most_common(top_n):
        yield '{:8} appears {} times.'.format(letter, count)
Then use a for loop to iterate over the generator function:
for line in frequencies(filename, 100):
    print(line)
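To illustrate what most_common returns, here is a minimal sketch that works on an in-memory string rather than a file. The sample text, the shortened invalid set, and the lowercasing step are assumptions made for the example; they are not part of the answer above.

```python
from collections import Counter

# Hypothetical sample text standing in for the file contents.
text = "Hello, hello world!"
# A shortened set of characters to skip (assumption for this sketch).
invalid = set(" ,.?!:;-_\n'`")

# Count every character that is not in the invalid set, ignoring case.
counter = Counter(ch for ch in text.lower() if ch not in invalid)

# most_common(n) returns a list of (item, count) pairs, highest count first.
top_two = counter.most_common(2)
print(top_two)
```

Each pair can be unpacked directly in a for loop, which is what the generator above relies on.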
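You don't need to iterate over 'words' and then over the letters in them. When you iterate over a string (such as content), you already get single characters (strings of length 1). You will also want to wait until the counting loop has finished before displaying any output. After counting, you can sort manually: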
for letter, count in sorted(counter.items(), key=lambda x: x[1], reverse=True):
    pass  # do stuff
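As a small worked example of the manual sort, the sketch below uses a hand-written dictionary of counts (hypothetical values) and additionally breaks ties alphabetically, which is an assumption beyond what the question asks for.

```python
# Hypothetical counts, as a counting loop might have produced them.
counter = {'a': 3, 'b': 7, 'c': 3, 'd': 1}

# Sort by count descending; letters with equal counts are ordered alphabetically.
ordered = sorted(counter.items(), key=lambda item: (-item[1], item[0]))

for letter, count in ordered:
    print('{:8} appears {} times.'.format(letter, count))
```

Negating the count in the key gives descending counts while keeping ascending letters, which reverse=True alone cannot do.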
However, it is better to use collections.Counter:
from collections import Counter

content = filter(lambda x: x not in invalid, content)
c = Counter(content)
for letter, count in c.most_common():  # descending order of counts
    print('{:8} appears {} times.'.format(letter, count))
# for letter, count in c.most_common(n):  # limit to the n most common
#     print('{:8} appears {} times.'.format(letter, count))
You can sort the dictionary when printing, using the built-in sorted function:
lettercount = {}
invalid = "‘'`,.?!:;-_\n—' '"
with open('text.file') as infile:
    for c in infile.read().lower():
        if c not in invalid:
            lettercount[c] = lettercount.setdefault(c, 0) + 1
# Sort the letters by their count, highest first.
for letter in sorted(lettercount, key=lettercount.get, reverse=True):
    print("{} appears {} times".format(letter, lettercount[letter]))
Note: I used the setdefault method to set the default value to 0 the first time a letter is encountered.
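For comparison, here is a rough sketch of two other common ways to handle the first occurrence of a letter: dict.get and collections.defaultdict. The input string is made up for the example, and neither variant is claimed to be what the answer's author intended.

```python
from collections import defaultdict

text = "banana"  # hypothetical input

# Variant 1: dict.get returns 0 for a missing key without inserting it.
counts_get = {}
for ch in text:
    counts_get[ch] = counts_get.get(ch, 0) + 1

# Variant 2: defaultdict(int) creates a missing entry with value 0 on first access.
counts_dd = defaultdict(int)
for ch in text:
    counts_dd[ch] += 1

print(counts_get)
print(dict(counts_dd))
```

Both produce the same counts as the setdefault version; the choice is mostly a matter of style.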
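The descending display needs to happen outside your search loop; otherwise the letters are displayed as they are encountered.
Sorting in descending order is very simple with the built-in sorted (you need to set the reverse argument!).
However, Python comes with batteries included and already has a Counter, so it can be as simple as this: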
from collections import Counter
from operator import itemgetter

def frequencies(filename):
    # Sets are especially optimized for fast lookups so this will be
    # a perfect fit for the invalid characters.
    invalid = set("‘'`,.?!:;-_\n—' '")

    # Using open in a with block makes sure the file is closed afterwards.
    with open(filename, 'r') as infile:
        # The "char for char ...." is a conditional generator expression
        # that feeds all characters to the counter that are not invalid.
        counter = Counter(char for char in infile.read().lower() if char not in invalid)

    # If you want to display the values:
    for char, charcount in sorted(counter.items(), key=itemgetter(1), reverse=True):
        print(char, charcount)
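Counter also has a most_common method. It is mainly useful when you only want the x most common items, but note that calling it without an argument returns every character and count in descending order, so it can stand in for the explicit sorted call here as well.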
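A small check, on a made-up input, of how the explicit sorted call relates to most_common called with no argument. On CPython 3.7 and later the two are expected to produce the same list; treat this as a sketch rather than a guarantee for other implementations.

```python
from collections import Counter
from operator import itemgetter

counter = Counter("mississippi")  # hypothetical input

# Explicit sort by count, highest first (as in the function above).
by_sorted = sorted(counter.items(), key=itemgetter(1), reverse=True)

# most_common() with no argument returns all items, highest count first.
by_most_common = counter.most_common()

print(by_sorted == by_most_common)
```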