Get most common match for each unique item in two arrays

I have data similar to these two arrays:

predicted_class = ['A','B','C','A','B','A','B','C','A']
true_class      = ['A','B','C','A','B','C','A','B','C']

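I want to find the number of classes that are predicted correctly once a majority consensus is taken. For example, my data shows 'A' = 66% correct predictions, 'B' = 66% correct and 'C' = 33% correct. The overall accuracy is therefore 66%, because the most common prediction for classes 'A' and 'B' is correct, but for 'C' it is not.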
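The per-class figures above can be checked with a short sketch. It assumes the question's literal definition: group the predictions by true class, take each group's most common predicted label, and count the class as correct if that label matches the class. The helper names per_class_accuracy and majority_consensus_accuracy are hypothetical, not part of any library.

```python
from collections import Counter, defaultdict

predicted_class = ['A','B','C','A','B','A','B','C','A']
true_class      = ['A','B','C','A','B','C','A','B','C']


def per_class_accuracy(true, predicted):
    """Hypothetical helper: fraction of correct predictions for each true class."""
    totals = Counter(true)
    correct = Counter(t for t, p in zip(true, predicted) if t == p)
    return {cls: correct[cls] / totals[cls] for cls in totals}


def majority_consensus_accuracy(true, predicted):
    """Hypothetical helper: share of classes whose most common prediction is the class itself."""
    preds_by_class = defaultdict(Counter)
    for t, p in zip(true, predicted):
        preds_by_class[t][p] += 1
    # most_common(1) returns [(label, count)] for the most frequent predicted label
    hits = sum(1 for cls, c in preds_by_class.items() if c.most_common(1)[0][0] == cls)
    return hits / len(preds_by_class)


print(per_class_accuracy(true_class, predicted_class))
# {'A': 0.6666666666666666, 'B': 0.6666666666666666, 'C': 0.3333333333333333}
print(majority_consensus_accuracy(true_class, predicted_class))
# 0.6666666666666666
```

On a tie, most_common(1) returns whichever predicted label was counted first, which is an arbitrary choice you may want to handle explicitly.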

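Based on what you've written in your example and comments, it seems you are looking for the maximum of the correct-to-all prediction ratio across the classes.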

Here is one way to do it with collections.Counter:

import collections


def max_model_match(true, predicted):
    # count all occurrences of the classes
    counter_all = collections.Counter(true)
    # start an empty counter for the "correct" (or "good") predictions
    counter_good = collections.Counter()
    # loop through all outcomes
    for (x, y) in zip(true, predicted):
        # if the prediction is correct increment the counter
        if x == y:
            counter_good[x] += 1
    # find the maximum correct-to-all ratio
    max_good_ratio = 0.0
    for key in counter_all.keys():
        good_ratio = counter_good[key] / counter_all[key]
        if good_ratio > max_good_ratio:
            max_good_ratio = good_ratio
    return max_good_ratio


predicted_class = ['A','B','C','A','B','A','B','C','A']
true_class      = ['A','B','C','A','B','C','A','B','C']
max_model_match(true_class, predicted_class)
# 0.6666666666666666

A simpler approach using defaultdict and max:

predicted_class = ['A','B','C','A','B','A','B','C','A']
true_class      = ['A','B','C','A','B','C','A','B','C']

from collections import defaultdict
d = defaultdict(lambda: [0, 0])  # per true class: [total, correct]
for p, t in zip(predicted_class, true_class):
    d[t][0] += 1
    if p == t:
        d[t][1] += 1

# maximum correct-to-all ratio over all classes
max(correct / total for total, correct in d.values())

Output: 0.6666666666666666
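On the question's data, the maximum per-class ratio and the share of classes whose majority prediction is correct both come out to 2/3, but they are different quantities. A minimal sketch with a small illustrative dataset (not from the question; helper names max_class_ratio and majority_consensus_share are hypothetical) shows them diverging:

```python
from collections import Counter, defaultdict


def max_class_ratio(true, predicted):
    # highest per-class correct-to-all ratio (the quantity both answers compute)
    totals = Counter(true)
    correct = Counter(t for t, p in zip(true, predicted) if t == p)
    return max(correct[c] / totals[c] for c in totals)


def majority_consensus_share(true, predicted):
    # share of classes whose most common prediction is the class itself
    by_class = defaultdict(Counter)
    for t, p in zip(true, predicted):
        by_class[t][p] += 1
    hits = sum(c.most_common(1)[0][0] == cls for cls, c in by_class.items())
    return hits / len(by_class)


# illustrative data: every item is predicted as 'A'
true      = ['A', 'A', 'B', 'B']
predicted = ['A', 'A', 'A', 'A']
print(max_class_ratio(true, predicted))           # 1.0
print(majority_consensus_share(true, predicted))  # 0.5
```

Which of the two you need depends on whether "overall accuracy" means the best single class or the fraction of classes that are predicted correctly by majority.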