Python NLTK Naive Bayes Classifier: What is the underlying computation that this classifier uses to classify input?

Question

I am using the Naive Bayes classifier in Python NLTK to compute the probability distribution for the following example:

import nltk

def main():
    train = [
        (dict(feature=1), 'class_x'),
        (dict(feature=0), 'class_x'),
        (dict(feature=0), 'class_y'),
        (dict(feature=0), 'class_y'),
    ]

    test = [dict(feature=1)]

    classifier = nltk.classify.NaiveBayesClassifier.train(train)

    print("classes available: ", sorted(classifier.labels()))

    print("input assigned to: ", classifier.classify_many(test))

    for pdist in classifier.prob_classify_many(test):
        print("probability distribution: ")
        print('%.4f %.4f' % (pdist.prob('class_x'), pdist.prob('class_y')))

if __name__ == '__main__':
    main()

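There are two classes in the training set (class_x and class_y), each with two training instances. For class_x, the feature has the value 1 in the first instance and 0 in the second. For class_y, the feature is 0 in both instances. The test set consists of a single input whose feature value is 1.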
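To make the counts behind "frequency of feature in class" concrete, here is a small pure-Python sketch (no NLTK required) that tallies the training data above per (label, feature name) pair. The helper name `count_values` is hypothetical; keying the tables by (label, feature name) roughly mirrors how NLTK's trainer groups its frequency distributions, but this is not NLTK's code.

```python
from collections import Counter, defaultdict

# Training data copied from the question: one feature named "feature" per instance.
train = [
    ({"feature": 1}, "class_x"),
    ({"feature": 0}, "class_x"),
    ({"feature": 0}, "class_y"),
    ({"feature": 0}, "class_y"),
]

def count_values(data):
    """Hypothetical helper: count labels and feature values per (label, feature name)."""
    label_counts = Counter()
    value_counts = defaultdict(Counter)
    for features, label in data:
        label_counts[label] += 1
        for fname, fval in features.items():
            value_counts[(label, fname)][fval] += 1
    return label_counts, value_counts

label_counts, value_counts = count_values(train)
print(dict(label_counts))  # {'class_x': 2, 'class_y': 2}
print({k: dict(v) for k, v in value_counts.items()})
# {('class_x', 'feature'): {1: 1, 0: 1}, ('class_y', 'feature'): {0: 2}}
```

Note that class_y never sees the value 1, so without smoothing P(feature=1 | class_y) would be 0 and class_x would receive the entire probability mass.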

When I run the code, the output is:

classes available:  ['class_x', 'class_y']
input assigned to:  ['class_x']
probability distribution: 
0.7500 0.2500

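To obtain the probability (or likelihood) of each class, the classifier should multiply the class prior (0.5 in this case) by the probability of each feature within that class, taking smoothing into account.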

I usually use a formula like this one (or a similar variant):

P(feature|class) = prior of class * (frequency of feature in class + 1) / (total features in class + vocabulary size). The smoothing can vary and changes the result slightly.
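A minimal sketch of the formula above, assuming "vocabulary size" means the number of distinct values the feature can take (2 here) and keeping the prior as a separate factor rather than folding it into P(feature|class). The smoothing constant `alpha` and the function names are illustrative and not part of NLTK. With add-one smoothing (`alpha=1.0`) this data gives 0.6667 / 0.3333, which does not match NLTK's 0.7500 / 0.2500, so NLTK evidently uses a different smoothing by default.

```python
from collections import Counter

# Counts taken from the training data in the question.
priors = {"class_x": 2 / 4, "class_y": 2 / 4}
value_counts = {
    "class_x": Counter({1: 1, 0: 1}),
    "class_y": Counter({0: 2}),
}
vocabulary_size = 2  # assumption: possible values of "feature" are 0 and 1

def smoothed_likelihood(counts, value, alpha, vocab):
    """Additive smoothing: (count(value) + alpha) / (total + alpha * vocab)."""
    return (counts[value] + alpha) / (sum(counts.values()) + alpha * vocab)

def posterior(test_value, alpha):
    """Prior times smoothed likelihood, normalized over the classes."""
    scores = {
        label: priors[label]
        * smoothed_likelihood(value_counts[label], test_value, alpha, vocabulary_size)
        for label in priors
    }
    total = sum(scores.values())  # plays the role of P(features)
    return {label: score / total for label, score in scores.items()}

p = posterior(1, alpha=1.0)
print("%.4f %.4f" % (p["class_x"], p["class_y"]))  # 0.6667 0.3333
```

Calling `posterior(1, alpha=0.5)` on the same data returns 0.75 / 0.25. The priors are equal here, so smoothing them as well would not change that result.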

In the example code above, how exactly does the classifier compute the probability distribution? What formula does it use?

I checked here and here, but could not find any information on how the computation is done.

Thanks in advance.

Answer 1

From the source code:

https://github.com/nltk/nltk/blob/develop/nltk/classify/naivebayes.py#L9yo

|                       P(label) * P(features|label)
|  P(label|features) = ------------------------------
|                              P(features)
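The formula in the source comment does not say which estimator produces P(label) and P(features|label). As far as I can tell from the NLTK source, `NaiveBayesClassifier.train` defaults to `estimator=ELEProbDist`, the expected likelihood estimate, which adds 0.5 to every count (Lidstone smoothing with gamma = 0.5), and `prob_classify` sums log2 probabilities per label and then normalizes, so P(features) is never computed explicitly. The sketch below re-implements that computation in plain Python under those assumptions. It is not NLTK's actual code, and it ignores details such as the implicit `None` value NLTK assigns to missing features and the dropping of features never seen in training.

```python
import math
from collections import Counter

GAMMA = 0.5  # assumption: ELE smoothing, i.e. add 0.5 to every count

label_counts = Counter({"class_x": 2, "class_y": 2})
value_counts = {
    ("class_x", "feature"): Counter({1: 1, 0: 1}),
    ("class_y", "feature"): Counter({0: 2}),
}
feature_values = {"feature": {0, 1}}  # values seen for each feature across all labels

def lidstone(counts, sample, gamma, bins):
    """Lidstone estimate: (c + gamma) / (N + gamma * bins)."""
    return (counts[sample] + gamma) / (sum(counts.values()) + gamma * bins)

def prob_classify(featureset):
    """log2 P(label) + sum of log2 P(fval | label), then normalize over labels.

    Only handles feature names present in value_counts (no unseen features).
    """
    logprob = {}
    for label in label_counts:
        logprob[label] = math.log2(lidstone(label_counts, label, GAMMA, len(label_counts)))
        for fname, fval in featureset.items():
            counts = value_counts[(label, fname)]
            bins = len(feature_values[fname])
            logprob[label] += math.log2(lidstone(counts, fval, GAMMA, bins))
    # Dividing by the sum over labels stands in for the P(features) denominator.
    total = sum(2 ** lp for lp in logprob.values())
    return {label: 2 ** lp / total for label, lp in logprob.items()}

dist = prob_classify({"feature": 1})
print("%.4f %.4f" % (dist["class_x"], dist["class_y"]))  # 0.7500 0.2500
```

Worked by hand: P(class_x) = P(class_y) = (2 + 0.5) / (4 + 0.5 * 2) = 0.5; P(feature=1 | class_x) = (1 + 0.5) / (2 + 0.5 * 2) = 0.5; P(feature=1 | class_y) = (0 + 0.5) / (2 + 0.5 * 2) = 1/6. The unnormalized scores are 0.25 and 1/12, which normalize to 0.75 and 0.25, matching the output in the question.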

