Getting error when calculating sklearn.metrics.ndcg_score

Question

I am trying to calculate the NDCG score for a classifier, but I get this error:

ValueError: Only ('multilabel-indicator', 'continuous-multioutput', 'multiclass-multioutput') formats are supported. Got multiclass instead

Here is my code:

# Declare classifier, fit on data and make predictions
from sklearn.ensemble import RandomForestClassifier
rnd_forest = RandomForestClassifier()
rnd_forest.fit(X_train_tr, y_train)
y_pred_prob = rnd_forest.predict_proba(X_train_tr)

# Calculate ndcg score
from sklearn.metrics import ndcg_score
# This is where I get an error
ndcg_score(y_train, y_pred_prob, k=5)

Here are my targets and predicted probabilities:

# True labels of the first two samples
y_train[:2]
> array([7, 7])
    
# Predicted probabilities for the first two observations
y_pred_prob[:2]
> array([[0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.]])

I tried reshaping y_train into a two-dimensional array, but that did not work. Can anyone tell me how to fix this error?

Answer 1

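Suppose y_train contains N observations. ndcg_score expects the true targets to have the same shape as the predicted scores, so you have to convert y_train into a matrix with N rows and 12 columns (one column per class), with a 1 in the column of each sample's true class.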

import numpy as np

# Create an ndarray of shape (N, 12) filled with zeros
y_train_matrix = np.zeros(shape=y_pred_prob.shape)
# Write a 1 in each row at the column of its true class
# (assumes the labels in y_train are the integers 0..11)
y_train_matrix[np.arange(y_pred_prob.shape[0]), y_train] = 1
# You now have this ndarray
y_train_matrix

array([[0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.]])

Now you can calculate the score:

ndcg_score(y_train_matrix, y_pred_prob)

1.0
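The indexing trick above relies on the labels being the integers 0 to 11. If that is not guaranteed, one possible alternative is to build the indicator matrix with sklearn.preprocessing.label_binarize, passing the fitted classifier's classes_ so the columns line up with predict_proba. The sketch below is self-contained and runs on synthetic data from make_classification; the dataset, the variable names, and k=3 are assumptions for illustration, not the asker's setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import ndcg_score
from sklearn.preprocessing import label_binarize

# Synthetic 4-class problem standing in for X_train_tr / y_train
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=10,
    n_classes=4, random_state=0,
)

clf = RandomForestClassifier(random_state=0).fit(X, y)
y_pred_prob = clf.predict_proba(X)  # shape (n_samples, n_classes)

# One indicator column per class, in the same order as predict_proba.
# Note: with only two classes label_binarize returns a single column,
# so this sketch assumes three or more classes.
y_true_matrix = label_binarize(y, classes=clf.classes_)

score = ndcg_score(y_true_matrix, y_pred_prob, k=3)
print(score)
```

Because each row has exactly one relevant class, the score here reflects how high the true class ranks among the top k predicted probabilities.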
