MultiOutputClassifier ValueError: The number of classes has to be greater than one
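I am using an SVM to solve a multi-label classification task. X holds the features of processed images, and Y holds binary variables indicating whether each of 6 natural elements (hills, clouds, etc.) is present in the image (0 if absent, 1 if present). Here are the train and test data:
Train: https://s3.amazonaws.com/istreet-questions-us-east-1/418844/train.csv
Test: https://s3.amazonaws.com/istreet-questions-us-east-1/418844/test.csv
Number of features: 294
Number of labels per instance: 6
Here is the code I am using to train the model: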
    import csv
    import numpy as np

    train = []
    test = []

    # read both CSV files into lists of string rows
    with open('/home/keerat/Desktop/train.csv') as trainfile:
        reader = csv.reader(trainfile)
        for row in reader:
            train.append(row)

    with open('/home/keerat/Desktop/test.csv') as testfile:
        reader = csv.reader(testfile)
        for row in reader:
            test.append(row)

    X = []
    y = []
    X_test = []

    # split data into X and y
    for i in range(len(train)):
        X.append(train[i][0:294])
        y.append(train[i][294:300])

    for i in range(len(test)):
        X_test.append(test[i][0:294])

    # convert list of strings to list of num
    for i in range(len(X)):
        X[i] = [float(x) for x in X[i]]

    # NOTE: this loop runs over j but reads y[i]; i still holds the last index
    # from the loop above, so every row of y becomes a copy of the last row
    for j in range(len(y)):
        y[j] = [int(yy) for yy in y[i]]

    for i in range(len(X_test)):
        X_test[i] = [float(x) for x in X_test[i]]

    X = np.array(X)
    y = np.array(y)
    X_test = np.array(X_test)

    # define svm model for multi label classification
    from sklearn.svm import SVC
    from sklearn import metrics
    from sklearn.multioutput import MultiOutputClassifier

    svc = SVC()  # default hyperparameters
    n_samples, n_features = X.shape
    n_outputs = y.shape[1]
    multi_target_svc = MultiOutputClassifier(svc, n_jobs=-1)
    multi_target_svc.fit(X[:], y)
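The reading and conversion steps above could also be written with NumPy slicing, which avoids the per-row loops and with them any index mix-up between loops. This is only a sketch: it assumes a headerless, comma-separated file with the feature columns first and the label columns last, as the slicing in the question implies. The helper name split_features_labels and the in-memory demo data are made up for illustration.

```python
import io
import numpy as np

N_FEATURES = 294  # from the question
N_LABELS = 6      # from the question


def split_features_labels(source, n_features=N_FEATURES, n_labels=N_LABELS):
    """Load a headerless CSV and split it into float features and int labels.

    Hypothetical helper: `source` may be a path or an open file-like object.
    """
    data = np.loadtxt(source, delimiter=",", ndmin=2)
    X = data[:, :n_features]
    y = data[:, n_features:n_features + n_labels].astype(int)
    return X, y


# Tiny stand-in for train.csv: 3 feature columns followed by 2 label columns
demo_csv = io.StringIO("0.1,0.2,0.3,1,0\n0.4,0.5,0.6,0,1\n")
X_demo, y_demo = split_features_labels(demo_csv, n_features=3, n_labels=2)
print(X_demo.shape, y_demo.tolist())
```

With the real files, the call would presumably be split_features_labels('/home/keerat/Desktop/train.csv'), using the default column counts.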
This is what X and y look like:
X:
[[0.826575 0.843082 0.805944 ... 0.010919 0.011375 0.015069]
[0.766867 0.669694 0.636238 ... 0.055661 0.079765 0.097522]
[0.962784 0.975387 0.96395 ... 0.195177 0.221791 0.201402]
...
[0.527828 0.588172 0.639713 ... 0.030422 0.004995 0.002626]
[0.574357 0.598345 0.63484 ... 0.039915 0.075365 0.056335]
[0.698135 0.732643 0.724918 ... 0.014463 0.04427 0.041442]]
y:
[[1 0 0 0 0 1]
[1 0 0 0 0 1]
[1 0 0 0 0 1]
...
[1 0 0 0 0 1]
[1 0 0 0 0 1]
[1 0 0 0 0 1]]
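The fit() line throws the error quoted in the title. I have checked numpy.unique(y) --> [0 1], which means I have more than one class available (exactly 2).
Can anyone shed some light on what is going wrong here?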
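Note that numpy.unique(y) looks at the whole array at once, whereas MultiOutputClassifier fits one copy of the estimator per label column, so each column on its own needs at least two classes. A y whose rows are all identical, like the one printed above, passes the whole-array check yet has a single class in every column. The sketch below uses made-up labels and a hypothetical helper, classes_per_column, to check each column separately.

```python
import numpy as np


def classes_per_column(y):
    """Return the distinct values found in each label column of y."""
    y = np.asarray(y)
    return [np.unique(y[:, k]) for k in range(y.shape[1])]


# Made-up labels: every row identical, like the y printed in the question
y_demo = np.array([[1, 0, 0, 0, 0, 1]] * 4)

# The whole-array check sees both 0 and 1 ...
whole_array_classes = np.unique(y_demo)

# ... but each column taken alone contains only one value
single_class_columns = [
    k for k, classes in enumerate(classes_per_column(y_demo)) if len(classes) < 2
]
print(whole_array_classes, single_class_columns)
```

Any column index reported in single_class_columns would be expected to trigger the "number of classes has to be greater than one" error when an SVC is fitted on that column.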
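If the n_jobs parameter of MultiOutputClassifier() is set to 1 instead of -1, training and testing run without problems. I don't know the reason, but after making this change the problem went away for every sklearn classifier I tried.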
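For reference, the change described above is a one-argument edit. The sketch below uses synthetic data (not the train.csv from the question) in which every label column contains both classes by construction. It only shows where n_jobs goes and does not confirm why the setting would matter.

```python
import numpy as np
from sklearn.multioutput import MultiOutputClassifier
from sklearn.svm import SVC

# Synthetic stand-in data: 40 samples, 5 features, 2 label columns
rng = np.random.default_rng(0)
X_demo = rng.random((40, 5))
idx = np.arange(40)
# Each column alternates between 0 and 1, so both classes are always present
y_demo = np.column_stack([idx % 2, (idx // 2) % 2])

# n_jobs=1 fits the per-column estimators one after another in this process
clf = MultiOutputClassifier(SVC(), n_jobs=1)
clf.fit(X_demo, y_demo)

pred = clf.predict(X_demo)
print(pred.shape)
```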