Does the number of classes in the target variable affect the accuracy of a classification model?

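From my own experience, I have noticed that the accuracy of a classification model is inversely related to the number of classes in the target variable: the more classes the dependent variable has, the lower the model's accuracy. I don't know whether this is caused by the number of classes itself or by the imbalance among them (although oversampling techniques did help improve the model's performance slightly). My assumption is that more classes lead to smaller differences between their predicted probabilities, so it is harder for the model to "confidently" determine the exact class.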

Is there a more concrete theoretical basis that explains the observation above?

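The simplest approach is to understand what accuracy "means". The number of classes matters once you consider the random baseline. Flipping a fair K-sided coin gives you an accuracy of 1/K, where K is the number of classes. So that is 50% for 2 classes, but 10% for 10 classes and 1% for 100. This shows that "60%" accuracy "means more" when you have more classes: a binary classifier with 60% accuracy is barely better than random, but 60% on 100 classes is godlike.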
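To make the baseline argument concrete, here is a minimal sketch in plain Python. It simulates uniform random guessing to confirm the 1/K baseline, then rescales a raw accuracy by that baseline. The helper names `random_baseline_accuracy` and `chance_corrected` are my own, not from any library, and the rescaling `(accuracy - 1/K) / (1 - 1/K)` is just one illustrative way to compare across K (it has the same form as Cohen's kappa if chance agreement is assumed to be exactly 1/K).

```python
import random


def random_baseline_accuracy(num_classes, num_samples=100_000, seed=0):
    """Estimate the accuracy of uniform random guessing by simulation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        true_label = rng.randrange(num_classes)  # hypothetical true class
        guess = rng.randrange(num_classes)       # uniform random guess
        hits += (true_label == guess)
    return hits / num_samples


def chance_corrected(accuracy, num_classes):
    """Rescale accuracy so that 0 is the 1/K baseline and 1 is perfect.

    Assumes chance level is exactly 1/K (uniform guessing).
    """
    chance = 1.0 / num_classes
    return (accuracy - chance) / (1.0 - chance)


for k in (2, 10, 100):
    baseline = random_baseline_accuracy(k)
    corrected = chance_corrected(0.60, k)
    print(f"K={k:>3}  simulated baseline={baseline:.3f}  "
          f"60% accuracy rescaled={corrected:.3f}")
```

Under these assumptions, the same 60% raw accuracy rescales to about 0.2 for 2 classes but to almost 0.6 for 100 classes, which is the sense in which 60% "means more" as K grows.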

statistics

classification

machine-learning

probability

multiclass-classification
