How do I use "BigramCollocationFinder" to find "Bigrams"?

Question

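I'm studying compiler construction with Python. I'm trying to build a list of all the words in a text, lowercased, and then create a BigramCollocationFinder, which can be used to find bigrams, i.e. pairs of adjacent words.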

These bigrams are found using the association measure functions in the nltk.metrics package.
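For intuition about what the finder collects before any association measure is applied, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not NLTK's actual implementation: it counts single words and adjacent word pairs, then derives the 2x2 contingency table for one bigram, which is the kind of input bigram association measures score. The helper names `count_words_and_bigrams` and `contingency` are hypothetical, and the marginals here come from bigram counts; NLTK's finder takes them from its unigram counts (`word_fd`) instead, so its numbers can differ slightly at the ends of a text.

```python
from collections import Counter


def count_words_and_bigrams(words):
    """Hypothetical helper: count single words and adjacent word pairs (bigrams)."""
    word_fd = Counter(words)
    bigram_fd = Counter(zip(words, words[1:]))
    return word_fd, bigram_fd


def contingency(w1, w2, bigram_fd):
    """Hypothetical helper: 2x2 contingency table for the bigram (w1, w2).

    n_ii: w1 followed by w2
    n_io: w1 followed by some other word
    n_oi: some other word followed by w2
    n_oo: all remaining bigrams
    """
    n_all = sum(bigram_fd.values())
    n_ii = bigram_fd[(w1, w2)]
    n_io = sum(c for (a, _), c in bigram_fd.items() if a == w1) - n_ii
    n_oi = sum(c for (_, b), c in bigram_fd.items() if b == w2) - n_ii
    n_oo = n_all - n_ii - n_io - n_oi
    return n_ii, n_io, n_oi, n_oo


words = "the holy grail the holy grail of the knights".split()
word_fd, bigram_fd = count_words_and_bigrams(words)
print(bigram_fd[("the", "holy")])             # 2
print(contingency("the", "holy", bigram_fd))  # (2, 1, 0, 5)
```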

I started practicing with "Python 3 Text Processing with NLTK 3 Cookbook" and found this sample code:

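from nltk.corpus import webtext  # corpus data: run nltk.download('webtext') once
from nltk.collocations import BigramCollocationFinder
from nltk.metrics import BigramAssocMeasures

# Lowercase every token in the grail.txt text
words = [w.lower() for w in webtext.words('grail.txt')]
# Count the words and the adjacent word pairs
bcf = BigramCollocationFinder.from_words(words)
# Print the 4 bigrams with the highest likelihood-ratio score
print(bcf.nbest(BigramAssocMeasures.likelihood_ratio, 4))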

I'm stuck on this line:

bcf.nbest(BigramAssocMeasures.likelihood_ratio, 4)
likelihood_ratio, 4

What does `likelihood_ratio, 4` mean here? Does it refer to similarity, or something else in this code?
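In `bcf.nbest(BigramAssocMeasures.likelihood_ratio, 4)`, `likelihood_ratio` is passed (not called) as the scoring function, and `4` is how many top-scoring bigrams `nbest` returns. It is not a similarity measure: it is an association measure that tests whether two words appear next to each other more often than chance would predict. The pure-Python sketch below assumes the standard Dunning log-likelihood ratio (G^2) on a 2x2 contingency table; NLTK's own version adds small smoothing constants and takes marginals from unigram counts, so exact scores may differ. The function names are hypothetical stand-ins, not NLTK's API.

```python
import math
from collections import Counter


def likelihood_ratio(n_ii, n_io, n_oi, n_oo):
    """Dunning's log-likelihood ratio (G^2) for a 2x2 bigram contingency table."""
    n = n_ii + n_io + n_oi + n_oo
    row_w1, row_not_w1 = n_ii + n_io, n_oi + n_oo  # first word is / isn't w1
    col_w2, col_not_w2 = n_ii + n_oi, n_io + n_oo  # second word is / isn't w2
    observed = [n_ii, n_io, n_oi, n_oo]
    # Counts expected if w1 and w2 occurred independently of each other
    expected = [row_w1 * col_w2 / n, row_w1 * col_not_w2 / n,
                row_not_w1 * col_w2 / n, row_not_w1 * col_not_w2 / n]
    # Empty cells contribute 0 (the limit of x * ln x as x -> 0)
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def nbest(words, n):
    """Hypothetical stand-in for finder.nbest: the n highest-scoring bigrams."""
    bigram_fd = Counter(zip(words, words[1:]))
    total = sum(bigram_fd.values())
    first, second = Counter(), Counter()
    for (a, b), c in bigram_fd.items():
        first[a] += c   # how often a starts a bigram
        second[b] += c  # how often b ends a bigram
    scores = {}
    for (a, b), n_ii in bigram_fd.items():
        n_io = first[a] - n_ii
        n_oi = second[b] - n_ii
        n_oo = total - n_ii - n_io - n_oi
        scores[(a, b)] = likelihood_ratio(n_ii, n_io, n_oi, n_oo)
    # Highest score first; ties broken by comparing the word tuples
    return sorted(scores, key=lambda bg: (-scores[bg], bg))[:n]


words = "the holy grail the holy grail of the knights".split()
print(nbest(words, 2))  # [('holy', 'grail'), ('the', 'holy')]
```

A table whose observed counts match the independence expectation scores 0, and the score grows as the pair co-occurs more often than expected.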

Any guidance on this would be much appreciated.

Answer 1

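I believe the question "NLTK collocations for specific words" should answer yours. Note that the approach there ranks bigrams by PMI, whereas in your code `likelihood_ratio` is the scoring function: `nbest` scores every bigram in the corpus with the likelihood-ratio association measure and returns the 4 highest-scoring bigrams, which are not necessarily the 4 most frequent ones.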
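PMI (pointwise mutual information) is also available in NLTK as `BigramAssocMeasures.pmi`, but it behaves differently from `likelihood_ratio`. A minimal pure-Python sketch, assuming bigram-based marginals rather than NLTK's unigram counts and using a made-up toy sentence, shows one consequence: PMI depends only on the ratio of observed to expected co-occurrence, so a pair seen twice can score no higher than a pair seen once. This is why PMI is commonly combined with a minimum-frequency filter such as `BigramCollocationFinder.apply_freq_filter`. The function name `pmi_scores` is hypothetical.

```python
import math
from collections import Counter


def pmi_scores(words):
    """PMI per bigram: log2(P(w1, w2) / (P(w1) * P(w2)))."""
    bigram_fd = Counter(zip(words, words[1:]))
    total = sum(bigram_fd.values())
    first, second = Counter(), Counter()
    for (a, b), c in bigram_fd.items():
        first[a] += c   # how often a starts a bigram
        second[b] += c  # how often b ends a bigram
    # P(w1, w2) / (P(w1) * P(w2)) simplifies to c * total / (first * second)
    return {
        (a, b): math.log2(c * total / (first[a] * second[b]))
        for (a, b), c in bigram_fd.items()
    }


words = "the holy grail the holy grail of the knights".split()
scores = pmi_scores(words)
print(scores[("holy", "grail")])  # 2.0 (pair seen twice)
print(scores[("of", "the")])      # 2.0 (pair seen once)
```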
