How to get the log loss?
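I'm working with the Leaf Classification dataset and I'm struggling to compute the log loss of my model after testing it. After importing log_loss from sklearn.metrics, I do this: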
from sklearn.metrics import log_loss
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# fitting the knn with train-test split (X and y are the Leaf Classification features and labels)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
# optimisation via GridSearchCV
knn = KNeighborsClassifier()
params = {'n_neighbors': range(1, 40),
          'weights': ['uniform', 'distance'],
          'metric': ['minkowski', 'euclidean'],
          'algorithm': ['auto', 'ball_tree', 'kd_tree', 'brute']}
k_grd = GridSearchCV(estimator=knn, param_grid=params, cv=5)
k_grd.fit(X_train, y_train)
# testing
yk_grd = k_grd.predict(X_test)
# calculating the log loss
print(log_loss(y_test, yk_grd))
However, the last line raises the following error:
y_true and y_pred contain different number of classes 93, 2. Please provide the true labels explicitly through the labels argument. Classes found in y_true.
But when I check the shapes:
X_train.shape, X_test.shape, y_train.shape, y_test.shape, yk_grd.shape
# results
((742, 192), (248, 192), (742,), (248,), (248,))
What exactly am I missing?
From the sklearn.metrics.log_loss documentation:
y_pred : array-like of float, shape = (n_samples, n_classes) or (n_samples,)
Predicted probabilities, as returned by a classifier’s predict_proba method.
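To make that requirement concrete, here is a minimal sketch of what log loss measures: the mean negative log of the probability the model assigned to each sample's true class. The helper name `manual_log_loss`, the clipping value, and the toy numbers are made up for illustration; sklearn's own implementation handles label ordering and clipping in its own way.

```python
import math


def manual_log_loss(true_idx, probs):
    """Mean negative log-probability assigned to the true class.

    true_idx: true class index for each sample
    probs:    one row of class probabilities per sample, shape (n_samples, n_classes)
    """
    eps = 1e-15  # assumption: small clip to avoid log(0); not necessarily sklearn's value
    total = 0.0
    for t, row in zip(true_idx, probs):
        p = min(max(row[t], eps), 1.0)
        total += -math.log(p)
    return total / len(true_idx)


# Toy data (hypothetical): 3 samples, 3 classes
y_true = [0, 2, 1]
y_prob = [
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.3, 0.5, 0.2],
]
loss = manual_log_loss(y_true, y_prob)
print(loss)
```

The point is that every sample needs a full row of probabilities, which a 1-D array of predicted labels cannot provide.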
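predict returns hard class labels, whereas log_loss expects per-class probabilities. So, to get the log loss, use predict_proba instead: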
yk_grd_probs = k_grd.predict_proba(X_test)
print(log_loss(y_test, yk_grd_probs))
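If it helps to see the difference between the two outputs, the following is a small self-contained sketch. It uses the Iris dataset and a plain KNeighborsClassifier as stand-ins for the Leaf Classification data and the grid search (both are assumptions for illustration, not the setup from the question).

```python
from sklearn.datasets import load_iris
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Iris stands in for the Leaf Classification data (assumption for illustration)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

labels = clf.predict(X_test)       # shape (n_samples,): hard class labels
probs = clf.predict_proba(X_test)  # shape (n_samples, n_classes): probabilities

print(labels.shape, probs.shape)

# log_loss is given the probabilities; the class list is passed explicitly
loss = log_loss(y_test, probs, labels=clf.classes_)
print(loss)
```

The shape of `labels` matches the (248,) shape seen in the question, which is why the shapes look consistent even though the content is not what log_loss needs.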
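If you still get an error, it means that some classes are missing from y_test, so the number of classes found in y_test does not match the number of columns in yk_grd_probs.
In that case, pass the full set of classes explicitly: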
print(log_loss(y_test, yk_grd_probs, labels=all_classes))
where all_classes is a list containing all the classes in the dataset.
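One way to build all_classes, sketched here on made-up toy data, is to take it from the fitted classifier's `classes_` attribute, since the columns of predict_proba follow that order. The data and variable names are hypothetical; the test split deliberately lacks one class to reproduce the mismatch.

```python
import numpy as np
from sklearn.metrics import log_loss
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: three classes in training, only two present in the test split
X_train = np.array([[0.0], [0.1], [1.0], [1.1], [2.0], [2.1]])
y_train = np.array([0, 0, 1, 1, 2, 2])
X_test = np.array([[0.05], [1.05]])
y_test = np.array([0, 1])  # class 2 is absent

clf = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
probs = clf.predict_proba(X_test)  # 3 columns, one per class seen in training

# predict_proba columns follow clf.classes_, so it is a natural source for `labels`
all_classes = clf.classes_

try:
    log_loss(y_test, probs)  # y_test shows 2 classes, probs has 3 columns
except ValueError as err:
    print("without labels:", err)

loss = log_loss(y_test, probs, labels=all_classes)
print("with labels:", loss)
```

With the grid search from the question, `k_grd.classes_` should serve the same purpose, assuming refit is left at its default so that a best estimator is fitted. `np.unique(y)` on the full label array is another option.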