Sorting the keys and values in a Counter
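This is the code I am working on. I want the output sorted by count in descending order and, where counts are equal, alphabetically by name.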
from collections import Counter
import re
from nltk.corpus import stopwords
import operator

text = "The quick brown fox jumped over the lazy dogs bowl. The dog was angry with the fox considering him lazy."

def tokenize(text):
    tokens = re.findall(r"\w+|\S", text.lower())
    #print(tokens)
    tokens1 = []
    for i in tokens:
        x = re.findall(r"\w+|\S", i, re.ASCII)
        for j in x:
            tokens1.append(j)
    return tokens

tok = tokenize(text)
punctuations = ['(',')',';',':','[',']',',', '...', '.', '&']
keywords = [word for word in tok if not word in punctuations]
cnt = Counter()
d= {}
for word in keywords:
    cnt[word] += 1
print(cnt)
freq = operator.itemgetter(1)
for k, v in sorted(cnt.items(), reverse=True, key=freq):
    print("%3d %s" % (v, k))
Current output:
4 the
2 fox
2 lazy
1 quick
1 brown
1 jumped
1 over
1 dogs
1 bowl
1 dog
1 was
1 angry
1 with
1 considering
1 him
Desired output:
4 the
2 fox
2 lazy
1 angry
1 bowl
1 brown
1 considering
1 dog
1 dogs
and so on.
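Use a sort key function that returns a tuple. The first item in the tuple is the negated count (the value in the dictionary) and the second item is the string (the key in the dictionary). You can do this by removing the variable freq, removing the reverse keyword from the call to sorted, and supplying a small lambda function that returns (-value, key) for each item in the dictionary. The last few lines of the program then become: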
print(cnt)
for k, v in sorted(cnt.items(), key=lambda item: (-item[1], item[0])):
    print("%3d %s" % (v, k))
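The minus sign in the lambda is there to get the correct ordering: the default sort order is lowest to highest, so negating the count puts the largest counts first while the names still sort in ascending alphabetical order.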
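As a small self-contained illustration of the tuple key described above, here is a minimal sketch. The word list and the helper name by_count_then_name are made up for the example and are not part of the original program; the named function simply spells out what the lambda does.

```python
from collections import Counter

# Hypothetical sample data, chosen so that several words share a count.
words = ["the", "fox", "lazy", "the", "dog", "fox", "the", "lazy", "bowl"]
cnt = Counter(words)

def by_count_then_name(item):
    """Sort key: negated count first (so high counts come first), then the word."""
    word, count = item
    return (-count, word)

# Tuples compare element by element, so ties on -count fall back to the word.
ordered = sorted(cnt.items(), key=by_count_then_name)

for word, count in ordered:
    print("%3d %s" % (count, word))
```

Negating works here because the counts are numbers. If the descending field were something that cannot be negated, one alternative would be to rely on the stability of Python's sort: sort by name first, then sort the result by count with reverse=True.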