How to draw an equal number of examples from each class in a Scikit Learn SVM during training?

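I have implemented a Support Vector Machine using Scikit Learn. Since I am dealing with class imbalance (96% to 4%), I would like the SVM to draw the same number of samples from each class during training. How can I achieve this with Scikit Learn?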

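You may be interested in the imbalanced-learn package, which implements a number of techniques, such as over-sampling and under-sampling, for dealing with class imbalance.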
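To make the under-sampling idea concrete, here is a minimal sketch done by hand with NumPy, so it needs no extra dependency. The helper name `undersample_equal` and the toy data are my own assumptions for illustration; imbalanced-learn ships ready-made samplers (for example its `RandomUnderSampler` class) that you would normally use instead.

```python
import numpy as np


def undersample_equal(X, y, seed=0):
    """Hypothetical helper: randomly keep the same number of rows per class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_per_class = counts.min()  # size of the rarest class
    keep = np.concatenate([
        # pick n_per_class row indices of class c, without replacement
        rng.choice(np.flatnonzero(y == c), size=n_per_class, replace=False)
        for c in classes
    ])
    rng.shuffle(keep)  # avoid returning the rows grouped by class
    return X[keep], y[keep]


# Toy data (assumed) with the 96% / 4% split from the question
rng = np.random.default_rng(42)
y = np.array([0] * 960 + [1] * 40)
X = rng.normal(size=(1000, 5))

X_bal, y_bal = undersample_equal(X, y)
print(np.bincount(y_bal))  # per-class counts after resampling
```

The balanced `X_bal, y_bal` can then be passed to the SVM's `fit` method in place of the original data. Keep in mind that under-sampling throws away most of the majority class, so it is worth comparing against the class-weight approach.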

Another approach is to adjust the class weights with the class_weight='balanced' parameter; from the SVC docs (similar arguments exist for the other SVM models):

class_weight : {dict, ‘balanced’}, optional

Set the parameter C of class i to class_weight[i]*C for SVC. If not given, all classes are supposed to have weight one. The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y))
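As a rough illustration of the formula quoted above, the sketch below computes the "balanced" weights by hand for a 96% / 4% split and then fits an SVC with class_weight='balanced'. The toy data, the shift applied to the minority class, and the linear kernel are assumptions chosen only to keep the example small. Note that this does not draw equal numbers of samples; it makes errors on the rare class more expensive instead.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data (assumed): 960 samples of class 0, 40 of class 1
y = np.array([0] * 960 + [1] * 40)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
X[y == 1] += 2.0  # shift the minority class so there is something to learn

# The formula from the docs: n_samples / (n_classes * np.bincount(y))
# (np.bincount assumes non-negative integer labels)
n_samples = len(y)
n_classes = len(np.unique(y))
weights = n_samples / (n_classes * np.bincount(y))
print(weights)  # weight for class 0, weight for class 1

# Let scikit-learn apply the same weighting during training
clf = SVC(kernel="linear", class_weight="balanced")
clf.fit(X, y)
print(clf.class_weight_)  # multipliers of C per class, as used by the fit
```

With these counts the minority class gets a weight of 1000 / (2 * 40) = 12.5, while the majority class gets 1000 / (2 * 960), roughly 0.52, so the penalty C is effectively about 24 times larger for the rare class.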