ValueError: Can't handle mix of multilabel-indicator and binary
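I am using Keras with the scikit-learn wrapper. In particular, I want to use GridSearchCV for hyperparameter optimization.
This is a multiclass problem, i.e. the target variable takes exactly one label from a set of n classes. For instance, the target variable can be 'Class1', 'Class2' ... 'Classn'.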
# self._arch creates my model
nn = KerasClassifier(build_fn=self._arch, verbose=0)
clf = GridSearchCV(
    nn,
    param_grid={ ... },
    # I use f1 score macro averaged
    scoring='f1_macro',
    n_jobs=-1)
# self.fX is the data matrix
# self.fy_enc is the target variable encoded with one-hot format
clf.fit(self.fX.values, self.fy_enc.values)
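The problem is that when the score is computed during cross-validation, the true labels of the validation samples are one-hot encoded, while the predictions for some reason collapse to binary labels (when the target variable has only two classes). For instance, this is the last part of the stack trace: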
...........................................................................
/Users/fbrundu/.pyenv/versions/3.6.0/lib/python3.6/site-packages/sklearn/metrics/classification.py in _check_targets(y_true=array([[ 0., 1.],
[ 0., 1.],
[ 0... 0., 1.],
[ 0., 1.],
[ 0., 1.]]), y_pred=array([1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1,...0, 1, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 1, 1]))
77 if y_type == set(["binary", "multiclass"]):
78 y_type = set(["multiclass"])
79
80 if len(y_type) > 1:
81 raise ValueError("Can't handle mix of {0} and {1}"
---> 82 "".format(type_true, type_pred))
type_true = 'multilabel-indicator'
type_pred = 'binary'
83
84 # We can't have more than one value on y_type => The set is no more needed
85 y_type = y_type.pop()
86
ValueError: Can't handle mix of multilabel-indicator and binary
How can I instruct Keras/sklearn to return predictions in one-hot encoding?
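The mismatch reported in the stack trace can be reproduced without Keras. The following is a minimal sketch, assuming only NumPy and scikit-learn are installed; the small arrays are made-up stand-ins for the validation targets and the wrapper's predictions, not the real data. It uses scikit-learn's type_of_target to show how the two arrays are classified.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Hypothetical stand-in for the one-hot encoded validation targets (two classes).
y_true_onehot = np.array([[0., 1.],
                          [0., 1.],
                          [1., 0.],
                          [0., 1.]])

# Hypothetical stand-in for predictions returned as class indices.
y_pred_labels = np.array([1, 1, 0, 0])

# A 2-D array of 0/1 values is treated as a multilabel indicator matrix,
# while a 1-D array with two distinct values is treated as binary.
true_type = type_of_target(y_true_onehot)
pred_type = type_of_target(y_pred_labels)

print(true_type, pred_type)
```

Because the two target types differ, the metric refuses to compare them, which matches the type_true and type_pred values shown in the trace.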
Following Vivek's comment, I used the original (not one-hot encoded) target array and configured the loss sparse_categorical_crossentropy in my Keras model (see the code), as suggested in the comments to this issue.
arch.compile(
optimizer='sgd',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
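If only the one-hot matrix is available, integer class labels can be recovered from it before fitting. This is a minimal sketch, assuming NumPy and scikit-learn; y_onehot and y_pred are made-up illustration arrays, and the macro-averaged F1 call simply mirrors the scoring='f1_macro' setting used above.

```python
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical one-hot targets; each row has exactly one 1.
y_onehot = np.array([[0., 1.],
                     [0., 1.],
                     [1., 0.],
                     [0., 1.]])

# The column index of the 1 in each row is the integer class label.
y_labels = np.argmax(y_onehot, axis=1)

# Hypothetical predictions in the same integer-label format.
y_pred = np.array([1, 1, 0, 0])

# Both arrays are now 1-D label vectors, so the metric can compare them.
score = f1_score(y_labels, y_pred, average='macro')
print(y_labels, score)
```

A 1-D label array like y_labels is the format that sparse_categorical_crossentropy expects as the target, so the same array could be passed to fit in place of the one-hot matrix.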