Get most common match for each unique item in two arrays

Question

I have data similar to these two arrays:

predicted_class = ['A','B','C','A','B','A','B','C','A']
true_class      = ['A','B','C','A','B','C','A','B','C']

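I want to find the number of classes that are predicted correctly by majority consensus. For example, my data shows 'A' = 66% correct predictions, 'B' = 66% correct, and 'C' = 33% correct. The overall accuracy is therefore 66%, because the most common prediction for classes 'A' and 'B' is correct, but for 'C' it is not.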

Answer 1

Based on what you wrote in your example and in the comments, it seems you are looking for the maximum correct-to-all prediction ratio across the classes.

Here is one way to do it using collections.Counter:

import collections


def max_model_match(true, predicted):
    # count all occurrences of the classes
    counter_all = collections.Counter(true)
    # initialize an empty counter for the correct predictions
    counter_good = collections.Counter()
    # loop through all outcomes
    for (x, y) in zip(true, predicted):
        # if the prediction is correct increment the counter
        if x == y:
            counter_good[x] += 1
    # find the maximum correct-to-all ratio
    max_good_ratio = 0.0
    for key in counter_all.keys():
        good_ratio = counter_good[key] / counter_all[key]
        if good_ratio > max_good_ratio:
            max_good_ratio = good_ratio
    return max_good_ratio


predicted_class = ['A','B','C','A','B','A','B','C','A']
true_class      = ['A','B','C','A','B','C','A','B','C']
max_model_match(true_class, predicted_class)
# 0.6666666666666666
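
To see where the maximum comes from, it can help to look at each class's ratio separately. The following is a minimal sketch, not part of the original answer; the helper name per_class_ratios is hypothetical, and it assumes both lists have the same length.

```python
import collections


def per_class_ratios(true, predicted):
    """Return {class: correct / total} for every class that appears in `true`."""
    # count all occurrences of each true class
    counter_all = collections.Counter(true)
    # count only the pairs where the prediction equals the true label
    counter_good = collections.Counter(t for t, p in zip(true, predicted) if t == p)
    return {key: counter_good[key] / counter_all[key] for key in counter_all}


predicted_class = ['A','B','C','A','B','A','B','C','A']
true_class      = ['A','B','C','A','B','C','A','B','C']
ratios = per_class_ratios(true_class, predicted_class)
for key, ratio in sorted(ratios.items()):
    print(key, round(ratio, 2))
# A 0.67
# B 0.67
# C 0.33
```

Taking max(ratios.values()) gives the same 0.6666666666666666 as max_model_match.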

Answer 2

A simple approach using defaultdict and max:

predicted_class = ['A','B','C','A','B','A','B','C','A']
true_class      = ['A','B','C','A','B','C','A','B','C']

from collections import defaultdict
d = defaultdict(lambda : [0, 0]) # [total, correct]
for p,t in zip(predicted_class, true_class):
    d[t][0] += 1
    if p == t:
        d[t][1] += 1

# maximum correct-to-total ratio over all classes
print(max(correct / total for total, correct in d.values()))

Output: 0.6666666666666666
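
Note that the maximum per-class ratio and the "majority consensus" accuracy described in the question are different quantities that happen to coincide at 66% for this sample. If the goal is the share of classes whose most common prediction is the correct one, a sketch along the following lines may be closer. The function name majority_consensus_accuracy is hypothetical, the input is assumed to be non-empty, and ties are resolved by Counter.most_common, which returns elements with equal counts in the order first encountered.

```python
from collections import Counter, defaultdict


def majority_consensus_accuracy(true, predicted):
    """Fraction of true classes whose most common prediction is the class itself."""
    # for each true class, tally the labels that were predicted for it
    votes = defaultdict(Counter)
    for t, p in zip(true, predicted):
        votes[t][p] += 1
    # a class counts as correct when its most common prediction matches it
    correct = sum(1 for t, tally in votes.items() if tally.most_common(1)[0][0] == t)
    # assumption: `true` is non-empty, otherwise this divides by zero
    return correct / len(votes)


predicted_class = ['A','B','C','A','B','A','B','C','A']
true_class      = ['A','B','C','A','B','C','A','B','C']
print(majority_consensus_accuracy(true_class, predicted_class))
# 0.6666666666666666
```

Here 'A' and 'B' are each predicted correctly most of the time, while 'C' is most often predicted as 'A', so two of the three classes count as correct.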
