Getting the same accuracy across different classifiers - sklearn
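I have a training set of pixel values for 540 images and a test set for 150 images. The values are stored in separate CSV files, one row per image, in this format: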
[label],[num0],[num1],...,[num399]
The label is a single letter and the 400 numbers are pixel values. The dataset is for sign language recognition.
The code:
import numpy as np
import os
import csv
from sklearn import svm
from sklearn import cross_validation
from sklearn import linear_model

path = '/home/goel/skin'
X_train=[]
y_train=[]
X_test=[]
y_test=[]
ylist=[]
with open("20_20_centered_newer.csv",'r') as file:
    reader = csv.reader(file,delimiter=',')
    reader.next()  # skip the first line of the training file
    # Note: this iterates over raw text lines, not over csv.reader rows
    for row in file:
        y_train.append(row[0])  # first character of the line is the label
        if row[0] not in ylist:
            ylist.append(row[0])
        row=row[2:]  # drop the label and the comma after it
        row=[int(x) for x in row.split(',')]
        X_train.append(np.array(row))
y2list=[]
with open("20x20_test.csv",'r') as file:
    reader = csv.reader(file,delimiter=',')
    for row in file:
        y_test.append(row[0])
        if row[0] not in y2list:
            y2list.append(row[0])
        row=row[2:]
        row=[int(x) for x in row.split(',')]
        X_test.append(np.array(row))
# Distinct labels seen in the training and test files
print ylist
print y2list
#clf = linear_model.SGDClassifier().fit(X_train,y_train)
#clf = svm.SVC(kernel='linear').fit(X_train,y_train)
#clf = svm.LinearSVC().fit(X_train,y_train)
clf = linear_model.LogisticRegression().fit(X_train,y_train)
print clf.score(X_test,y_test)
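Apparently I get the same score, 0.78, from every classifier, identical to 12 decimal places!
Is there a semantic error here that I am not aware of?
It may be because I have too few classes. I repeated the experiment with 10 classes and got a difference of about 2% between classifiers under 5-fold cross-validation.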
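One thing that may help when checking this: with 150 test samples, accuracy can only take values of the form k/150, and 0.78 is exactly 117/150. Two classifiers that happen to get the same number of samples right will therefore report scores that match to every decimal place, even if they are wrong on different samples. Below is a minimal sketch of how one might tell these cases apart by comparing predictions rather than scores. The helper names `accuracy` and `agreement` and the toy labels are made up for illustration; they are not from the code above. With a fitted sklearn model, the prediction lists would presumably come from `clf.predict(X_test)`.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def agreement(pred_a, pred_b):
    """Fraction of samples on which two classifiers predict the same label."""
    same = sum(a == b for a, b in zip(pred_a, pred_b))
    return same / len(pred_a)


# Toy data (hypothetical): 10 test samples with two classes.
y_true = ["A", "B", "A", "B", "A", "B", "A", "B", "A", "B"]

# Classifier "a" is wrong on the first two samples,
# classifier "b" is wrong on the last two.
pred_a = ["B", "A", "A", "B", "A", "B", "A", "B", "A", "B"]
pred_b = ["A", "B", "A", "B", "A", "B", "A", "B", "B", "A"]

print(accuracy(y_true, pred_a))   # same score for both classifiers
print(accuracy(y_true, pred_b))
print(agreement(pred_a, pred_b))  # but they disagree on 4 of 10 samples
```

If the agreement comes out as 1.0 for every pair of classifiers, the models really are making identical predictions on the test set; if it is lower, the equal scores are a coincidence of the coarse k/150 granularity.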