Getting error when calculating sklearn.metrics.ndcg_score
I'm trying to calculate the NDCG score of my classifier, but I get this error:
ValueError: Only ('multilabel-indicator', 'continuous-multioutput', 'multiclass-multioutput') formats are supported. Got multiclass instead
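The message comes from a target-type check at the start of ndcg_score. scikit-learn classifies y_true with sklearn.utils.multiclass.type_of_target, and a 1D vector of integer class labels is reported as 'multiclass', which is not one of the accepted formats listed in the error. The following sketch uses small made-up arrays rather than the question's data to show how different inputs are classified:

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Hypothetical labels for three samples drawn from 12 classes
labels = np.array([0, 3, 7])

# 1D integer class labels: this is what ndcg_score rejects
print(type_of_target(labels))    # multiclass

# One-hot matrix with one column per class: an accepted format
one_hot = np.eye(12)[labels]
print(type_of_target(one_hot))   # multilabel-indicator

# Hypothetical 2D probability scores: also an accepted format
probs = np.array([[0.2, 0.8], [0.6, 0.4]])
print(type_of_target(probs))     # continuous-multioutput
```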
Here is my code:
# Declare classifier, fit on data and make predictions
from sklearn.ensemble import RandomForestClassifier
rnd_forest = RandomForestClassifier()
rnd_forest.fit(X_train_tr, y_train)
y_pred_prob = rnd_forest.predict_proba(X_train_tr)
# Calculate ndcg score
from sklearn.metrics import ndcg_score
# This is where I get an error
ndcg_score(y_train, y_pred_prob, k=5)
Here are my targets and predicted probabilities:
# True labels of the first two samples
y_train[:2]
> array([7, 7])
# Predicted probabilities for the first two observations
y_pred_prob[:2]
> array([[0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.]])
I tried reshaping y_train into a 2D array, but it didn't work. Can anyone tell me how to fix this error?
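A plausible reason the reshape does not help: turning the labels into a single column, shape (N, 1), still gives one class label per row rather than one relevance score per class, so type_of_target still reports 'multiclass'. A minimal sketch with hypothetical labels and uniform scores (not the question's data); the exact error text can differ between scikit-learn versions, so the sketch only checks that a ValueError is raised:

```python
import numpy as np
from sklearn.metrics import ndcg_score
from sklearn.utils.multiclass import type_of_target

y = np.array([0, 3, 7])              # hypothetical labels from 12 classes
scores = np.full((3, 12), 1.0 / 12)  # hypothetical uniform probabilities
y_col = y.reshape(-1, 1)             # shape (3, 1), a single column

# A single column is still read as plain class labels
print(y_col.shape, type_of_target(y_col))   # (3, 1) multiclass

# ...so ndcg_score still rejects it
try:
    ndcg_score(y_col, scores)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```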
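Suppose you have N observations in y_train. ndcg_score expects y_true to be a 2D array of relevance scores with one column per label, aligned with the columns of y_score. You therefore have to convert y_train into a one-hot matrix with N rows and 12 columns (one column per class, as in y_pred_prob).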
import numpy as np

# Create an ndarray of size (N, 12) filled with zeros, one column per class
y_train_matrix = np.zeros(shape=(y_pred_prob.shape[0], y_pred_prob.shape[1]))
# Write a 1 in each row's true-class column
# (assumes the labels are the integers 0..11, in the column order of predict_proba)
y_train_matrix[np.arange(y_pred_prob.shape[0]), y_train] = 1
# You now have this ndarray (first two rows shown)
y_train_matrix[:2]
array([[0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.]])
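Using each label directly as a column index only works when the labels are exactly 0 to 11. The columns of predict_proba follow the order of the classifier's classes_ attribute, so for arbitrary labels (strings, or integers with gaps) one option is sklearn.preprocessing.label_binarize with classes set to that order. A sketch with hypothetical labels standing in for rnd_forest.classes_ and y_train:

```python
import numpy as np
from sklearn.preprocessing import label_binarize

# Hypothetical stand-ins for rnd_forest.classes_ and y_train;
# the labels are deliberately not 0..n_classes-1
classes = np.array([10, 20, 30, 40])
y = np.array([30, 10, 40])

# One column per entry of `classes`, in the same order,
# so columns line up with predict_proba output
y_matrix = label_binarize(y, classes=classes)
print(y_matrix)
# [[0 0 1 0]
#  [1 0 0 0]
#  [0 0 0 1]]
```

In the question's setting this would be label_binarize(y_train, classes=rnd_forest.classes_). One caveat: with exactly two classes label_binarize returns a single column, which no longer lines up with the two columns of predict_proba, so this sketch targets the multiclass case.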
Now you can compute the score:
ndcg_score(y_train_matrix, y_pred_prob)
1.0
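With a one-hot y_true, each sample has exactly one relevant class. Assuming no tied scores, that sample's NDCG reduces to 1 / log2(r + 1), where r is the rank of the true class in its row of y_score. When k is given (the question used k=5), it is 0 if r > k, and ndcg_score averages over samples. The sketch below checks this closed form against ndcg_score on two made-up samples. The helper one_hot_ndcg is hypothetical, written only for this comparison, and it does not model the tie averaging that ndcg_score applies by default:

```python
import numpy as np
from sklearn.metrics import ndcg_score

# Made-up example: 2 samples, 3 classes, no tied scores
y_true = np.array([[0, 1, 0],
                   [1, 0, 0]])
y_score = np.array([[0.5, 0.3, 0.2],
                    [0.7, 0.2, 0.1]])

def one_hot_ndcg(y_true, y_score, k=None):
    """Hypothetical helper: closed-form NDCG for one-hot y_true, assuming no ties."""
    true_idx = y_true.argmax(axis=1)
    true_scores = y_score[np.arange(len(y_score)), true_idx]
    # rank = 1 + number of classes scored strictly higher than the true class
    ranks = 1 + (y_score > true_scores[:, None]).sum(axis=1)
    gains = 1.0 / np.log2(ranks + 1)
    if k is not None:
        gains[ranks > k] = 0.0  # true class falls outside the top k
    return gains.mean()

print(ndcg_score(y_true, y_score), one_hot_ndcg(y_true, y_score))            # both about 0.8155
print(ndcg_score(y_true, y_score, k=1), one_hot_ndcg(y_true, y_score, k=1))  # both 0.5
```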