How to pass in an estimator to NLTK's NgramModel?

Question

I am using NLTK to train a bigram model with a Laplace estimator. The constructor for NgramModel is:

def __init__(self, n, train, pad_left=True, pad_right=False,
             estimator=None, *estimator_args, **estimator_kwargs):

After some research, I found that the following syntax works:

bigram_model = NgramModel(2, my_corpus, True, False, lambda f, b:LaplaceProbDist(f))

Although it seems to work fine, I am confused by the last two arguments. Mainly, why is the 'estimator' argument a lambda function, and how does it interact with LaplaceProbDist?

Answer 1

Currently, you can pass a lambda function that takes a FreqDist and returns a probability distribution built from it, e.g.

from nltk.model import NgramModel
from nltk.corpus import brown
from nltk.probability import LaplaceProbDist

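# The estimator is a callable: it receives a FreqDist and returns a probability distribution.
est = lambda fdist: LaplaceProbDist(fdist)

# Train a trigram model on the first 100 words of the Brown 'news' category.
corpus = brown.words(categories='news')[:100]
lm = NgramModel(3, corpus, estimator=est)

print lm
print (corpus[8], corpus[9], corpus[12] )
# Probability of corpus[12] given the two-word context [corpus[8], corpus[9]].
print (lm.prob(corpus[12], [corpus[8], corpus[9]]) )
print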

[out]:

<NgramModel with 100 3-grams>
(u'investigation', u'of', u'primary')
0.0186667723526
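As for why the estimator is a function rather than a ready-made distribution: the model keeps a separate frequency table for every context, so it needs a factory it can call once per table. The sketch below illustrates that pattern in plain Python 3 without NLTK. The names `laplace_estimator` and `train_bigrams` are hypothetical stand-ins, not NLTK's API, and the add-one formula (count + 1) / (total + bins) assumes that the number of bins equals the number of distinct words seen in that context.

```python
from collections import Counter, defaultdict


def laplace_estimator(counts):
    """Hypothetical stand-in for LaplaceProbDist: add-one smoothing."""
    total = sum(counts.values())
    bins = len(counts)  # assumption: one bin per distinct observed word

    def prob(word):
        # Counter returns 0 for unseen words, so they still get a small probability.
        return (counts[word] + 1) / (total + bins)

    return prob


def train_bigrams(tokens, estimator):
    """Hypothetical stand-in for the model: one count table per context,
    each converted to a probability function by calling `estimator`."""
    tables = defaultdict(Counter)
    for prev, word in zip(tokens, tokens[1:]):
        tables[prev][word] += 1
    # The estimator is invoked once per context, which is why it must be callable.
    return {context: estimator(counts) for context, counts in tables.items()}


tokens = "the cat sat on the mat the cat ran".split()
model = train_bigrams(tokens, lambda counts: laplace_estimator(counts))

p_cat = model["the"]("cat")  # seen twice after "the"
p_dog = model["the"]("dog")  # never seen after "the"
print(p_cat, p_dog)  # 0.6 0.2
```

The lambda in the question plays the same role as the one passed to `train_bigrams` here. It accepts a second argument `b` and ignores it, so only the frequency distribution reaches LaplaceProbDist.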

But do note that the model package in NLTK, which contains the LanguageModel object, is "under construction", so the above code might not work when the stable version comes out.

To keep up to date with issues related to the model package, check these issues regularly:

#792
#800
