PHP TNTClassifier likelihood probability distribution

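I am using the TNTSearch text classification module, https://github.com/teamtnt/tntsearch, and it works well. The problem is that I do not know how to interpret the results, specifically the likelihood of a correct match. I have read that it uses a Naive Bayes classifier, but I cannot find out what kind of probability distribution the results follow. I have my own small test dataset of about 50 values (50 / 10 = 5 categories), and the guesses are fairly accurate.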

However, the likelihood this tool returns is a negative number, roughly in the range -15 to -25.

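The question is: what value should be interpreted as untrustworthy? Suppose the tool is less than 33% sure of its guess. What likelihood value does that correspond to?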

I have been in contact with the TNTSearch developers. The classifier does not actually return a probability but the "highest score", and only for the best match.

Following that hint, I made some changes to the code.

In the class TeamTNT\TNTSearch\Classifier\TNTClassifier, I changed part of the predict method (the softmax function is inspired by here):

public function predict($statement)
{
    $words = $this->tokenizer->tokenize($statement);

    $best_likelihoods = [];
    $best_likelihood = -INF;
    $best_type       = '';
    foreach ($this->types as $type) {
        $best_likelihoods[$type] = -INF;
        $likelihood = log($this->pTotal($type)); // calculate P(Type)
        $p          = 0;
        foreach ($words as $word) {
            $word = $this->stemmer->stem($word);
            $p += log($this->p($word, $type));
        }
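        $likelihood += $p; // log score: log P(Type) + sum of log P(word | Type)
        // Record the score of every type, not only the running best,
        // so that softmax normalizes over all classes.
        $best_likelihoods[$type] = $likelihood;
        if ($likelihood > $best_likelihood) {
            $best_likelihood = $likelihood;
            $best_type       = $type;
        }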
    }

    return [
        'likelihood' => $best_likelihood,
        'likelihoods' => $best_likelihoods,
        'probability' => $this->softmax($best_likelihoods),
        'label'      => $best_type
    ];
}

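The percentage probability can then be found in $guess['probability'][$label], where $label is the class label of interest (for the best match, $guess['label']).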
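The softmax method called in predict is not shown above. The following is a minimal sketch of what it might look like, not the code from the linked source. It is written as a standalone function for illustration (inside TNTClassifier it would be a method), and it assumes the input is an array of log scores keyed by label with at least one finite value. It returns fractions between 0 and 1 that sum to 1, so multiply by 100 if a percentage is wanted. The labels and scores in the usage part are made up.

```php
<?php
// Hypothetical helper, not part of TNTSearch: numerically stable softmax
// over log scores keyed by label.
function softmax(array $scores): array
{
    if ($scores === []) {
        return [];
    }
    // Subtract the maximum before exponentiating so the largest term is
    // exp(0) = 1; this avoids underflow when all scores are very negative.
    $max  = max($scores);
    $exps = [];
    foreach ($scores as $label => $score) {
        $exps[$label] = exp($score - $max);
    }
    // Normalize so the values sum to 1.
    $sum = array_sum($exps);
    foreach ($exps as $label => $value) {
        $exps[$label] = $value / $sum;
    }
    return $exps;
}

// Made-up log scores in the range reported in the question.
$probabilities = softmax(['sports' => -15.0, 'politics' => -17.0, 'tech' => -25.0]);

// Pick the most probable label and apply the 33% threshold from the question.
$label   = array_search(max($probabilities), $probabilities);
$trusted = $probabilities[$label] >= 0.33;
```

With this normalization the raw score alone (for example -15) does not say how confident the classifier is; what matters is how far the best score is from the scores of the other classes. Note also that Naive Bayes scores tend to be overconfident, so a cutoff such as 0.33 is best checked against the test dataset rather than taken literally.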