accessing elements of a counter containing ngrams
I'm taking a string, tokenizing it, and want to look at the most common bigrams. Here is what I have:
import nltk
import collections
from nltk import ngrams
someString="this is some text. this is some more test. this is even more text."
tokens=nltk.word_tokenize(someString)
tokens=[token.lower() for token in tokens if len(token)>1]  # drop 1-character tokens such as "."
bigram=ngrams(tokens,2)
aCounter=collections.Counter(bigram)
If I do:
print(aCounter)
then it outputs the bigrams sorted by count, most common first.
for element in aCounter:
    print(element)
prints the elements, but without their counts and not ordered by count. I want a for loop that prints the X most common bigrams in the text.
I'm actually trying to learn Python and nltk at the same time, which is probably why I'm struggling with this (I assume it's something trivial).
You're probably looking for something that already exists: the most_common method on counters. From the docs:
Return a list of the n most common elements and their counts from the most common to the least. If n is omitted or None, most_common() returns all elements in the counter. Elements with equal counts are ordered arbitrarily.
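A small illustration of both forms described in that quote. The printed results assume Python 3.7 or later, where elements with equal counts come back in the order they were first encountered rather than arbitrarily; on older versions the tied entries may appear in a different order.

```python
from collections import Counter

c = Counter('abracadabra')  # counts: a=5, b=2, r=2, c=1, d=1

# With n: only the n most common (element, count) pairs, highest count first.
top3 = c.most_common(3)
print(top3)  # [('a', 5), ('b', 2), ('r', 2)]

# Without n (or with n=None): every element, still ordered most to least common.
everything = c.most_common()
print(everything)  # [('a', 5), ('b', 2), ('r', 2), ('c', 1), ('d', 1)]
```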
You can call it with a value n to get the n most common value/count pairs. For example:
from collections import Counter
# initialize with silly value.
c = Counter('aabbbccccdddeeeeefffffffghhhhiiiiiii')
# Print 4 most common values and their respective count.
for val, count in c.most_common(4):
    print("Value {0} -> Count {1}".format(val, count))
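This prints the following (c and h are tied at 4 occurrences; on Python 3.7 and later, ties come back in the order they were first encountered, so the last line reads Value c -> Count 4 instead):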
Value f -> Count 7
Value i -> Count 7
Value e -> Count 5
Value h -> Count 4
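Applied to the bigram counter from the question, the loop might look like the sketch below. To keep it runnable without NLTK's tokenizer data, it assumes a regex split (re.findall) as a stand-in for nltk.word_tokenize, and builds bigrams with zip, which yields the same consecutive tuples as nltk.ngrams(tokens, 2). The names x and bigram_counts are illustrative only. With NLTK installed you can keep the question's tokenization and simply loop over aCounter.most_common(x).

```python
import re
from collections import Counter

some_string = "this is some text. this is some more test. this is even more text."

# Stand-in for nltk.word_tokenize (assumption): lower-cased word tokens,
# punctuation dropped, then the question's len > 1 filter.
tokens = [t for t in re.findall(r"\w+", some_string.lower()) if len(t) > 1]

# Consecutive pairs, the same tuples nltk.ngrams(tokens, 2) would yield.
bigram_counts = Counter(zip(tokens, tokens[1:]))

x = 3  # hypothetical: how many of the most common bigrams to print
for (first, second), count in bigram_counts.most_common(x):
    print("{0} {1} -> {2}".format(first, second, count))
```

For this string it prints this is -> 3, is some -> 2 and some text -> 1. Every remaining bigram occurs once, so the third line depends on how ties are ordered.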