SVC in Python predicts the same value "1" for every C or gamma used

Here is the code:

import numpy as np
from sklearn import svm

# Each row holds 36 features followed by the class label (37 integers per row)
numere = np.fromfile("sat.trn", dtype=int, count=-1, sep=" ")
numereTest = np.fromfile("sat.tst", dtype=int, count=-1, sep=" ")
numere = numere.reshape(-1, 37)
numereTest = numereTest.reshape(-1, 37)

# The last column is the label; the first 36 columns are the features
etichete = numere[:, 36]
eticheteTest = numereTest[:, 36]
numere = np.delete(numere, 36, 1)
numereTest = np.delete(numereTest, 36, 1)

clf = svm.SVC(kernel='rbf', C=1, gamma=1)
clf.fit(numere, etichete)
predictie = clf.predict(numereTest)

I read the data from a file that contains everything and built two NumPy arrays from it, but whatever I do, every prediction comes out as 1.

numere[:10] --> array([
    [ 92, 115, 120, 94, 84, 102, 106, 79, 84, 102, 102, 83, 101, 126, 133, 103, 92, 112, 118, 85, 84, 103, 104, 81, 102, 126, 134, 104, 88, 121, 128, 100, 84, 107, 113, 87],
    [ 84, 102, 106, 79, 84, 102, 102, 83, 80, 102, 102, 79, 92, 112, 118, 85, 84, 103, 104, 81, 84, 99, 104, 78, 88, 121, 128, 100, 84, 107, 113, 87, 84, 99, 104, 79],
    [ 84, 102, 102, 83, 80, 102, 102, 79, 84, 94, 102, 79, 84, 103, 104, 81, 84, 99, 104, 78, 84, 99, 104, 81, 84, 107, 113, 87, 84, 99, 104, 79, 84, 99, 104, 79],
    [ 80, 102, 102, 79, 84, 94, 102, 79, 80, 94, 98, 76, 84, 99, 104, 78, 84, 99, 104, 81, 76, 99, 104, 81, 84, 99, 104, 79, 84, 99, 104, 79, 84, 103, 104, 79],
    [ 84, 94, 102, 79, 80, 94, 98, 76, 80, 102, 102, 79, 84, 99, 104, 81, 76, 99, 104, 81, 76, 99, 108, 85, 84, 99, 104, 79, 84, 103, 104, 79, 79, 107, 109, 87],
    [ 80, 94, 98, 76, 80, 102, 102, 79, 76, 102, 102, 79, 76, 99, 104, 81, 76, 99, 108, 85, 76, 103, 118, 88, 84, 103, 104, 79, 79, 107, 109, 87, 79, 107, 109, 87],
    [ 76, 102, 106, 83, 76, 102, 106, 87, 80, 98, 106, 79, 80, 107, 118, 88, 80, 112, 118, 88, 80, 107, 113, 85, 79, 107, 113, 87, 79, 103, 104, 83, 79, 103, 104, 79],
    [ 76, 102, 106, 87, 80, 98, 106, 79, 76, 94, 102, 76, 80, 112, 118, 88, 80, 107, 113, 85, 80, 95, 100, 78, 79, 103, 104, 83, 79, 103, 104, 79, 79, 95, 100, 79],
    [ 76, 89, 98, 76, 76, 94, 98, 76, 76, 98, 102, 72, 80, 95, 104, 74, 76, 91, 104, 74, 76, 95, 100, 78, 75, 91, 96, 75, 75, 91, 96, 71, 79, 87, 93, 71],
    [ 76, 94, 98, 76, 76, 98, 102, 72, 76, 94, 90, 76, 76, 91, 104, 74, 76, 95, 100, 78, 76, 91, 100, 74, 75, 91, 96, 71, 79, 87, 93, 71, 79, 87, 93, 67]])

OK, the most likely reasons for what you are seeing are these.

First, you are not scaling the data. Try StandardScaler:

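from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training data only, then apply it to both sets
scaler = StandardScaler()
scaler.fit(numere)
numere = scaler.transform(numere)
numereTest = scaler.transform(numereTest)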
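To see why this matters for an RBF kernel, here is a rough sketch on synthetic data (not your sat.trn file; the value range 60 to 140 and the choice gamma = 1/n_features are my assumptions for illustration). The kernel is exp(-gamma * ||x - x'||^2), so with raw features around 100 and gamma=1 every pair of distinct points gets a kernel value of essentially zero, and the model cannot generalize. Note that scaling alone is not enough if gamma stays too large, which is why the tuning advice below matters as well.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data: 50 rows, 36 integer-valued features
rng = np.random.default_rng(0)
X = rng.integers(60, 140, size=(50, 36)).astype(float)

# Mask selecting kernel values between *different* samples
off_diag = ~np.eye(len(X), dtype=bool)

# Raw features with gamma=1: squared distances are huge, so exp(-gamma * d^2) underflows
K_raw = rbf_kernel(X, gamma=1.0)
raw_max = K_raw[off_diag].max()

# Standardized features with a moderate gamma (assumed here: 1 / n_features)
X_scaled = StandardScaler().fit_transform(X)
K_scaled = rbf_kernel(X_scaled, gamma=1.0 / X.shape[1])
scaled_mean = K_scaled[off_diag].mean()

print("largest off-diagonal kernel value, raw data:", raw_max)
print("mean off-diagonal kernel value, scaled data:", scaled_mean)
```

When the off-diagonal values are all zero, each training point is only similar to itself, so predictions on new points fall back to whatever the bias terms favor.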

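Second, you are not tuning the parameters. You need to select the most suitable ones, and I strongly recommend using grid search. You can find an example of how to use it here. Grid search works well for parameter tuning, but be careful not to use cross-validation on this dataset; that is the advice of its creators :) Gamma and C can take a very wide range of values, from very small decimals to very large numbers, and you cannot test that range properly by hand.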
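Because the useful values span several orders of magnitude, it usually makes sense to space the candidates logarithmically. A small sketch of building such a grid (the exponent ranges below are my own assumptions, not values tuned for this dataset):

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# Log-spaced candidates; the ranges are illustrative assumptions
grid = {
    "gamma": np.logspace(-4, 2, 7),  # 1e-4, 1e-3, ..., 1e2
    "C": np.logspace(-1, 3, 5),      # 1e-1, 1e0, ..., 1e3
}

# ParameterGrid enumerates every (C, gamma) combination as a dict
candidates = list(ParameterGrid(grid))
print(len(candidates), "combinations, first one:", candidates[0])
```

Each element of `candidates` can be passed to `clf.set_params(**params)`.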

Edit: since you should not use CV here, this is a better way to do the grid search:

from sklearn.model_selection import ParameterGrid

grid = {  # edit this with more values
    'gamma': [0.001, 0.1, 10, 100, 1000],
    'C': [1, 10, 100]
}

# Track the best combination seen so far
best_score = -1.0
best_grid = None
for g in ParameterGrid(grid):
    clf.set_params(**g)
    clf.fit(numere, etichete)
    # save if best
    score = clf.score(numereTest, eticheteTest)
    if score > best_score:
        best_score = score
        best_grid = g

print("best score:", best_score)
print("Grid:", best_grid)
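One caveat with the loop above: it picks the parameters using the test set, so the reported best score is optimistic. If you want to keep the test set untouched, one option is a single fixed validation split (not cross-validation) via `PredefinedSplit`. This is only a sketch under my own assumptions: the data is synthetic from `make_classification`, and the 200/100 split and the grid values are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, PredefinedSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real data (36 features, 3 classes)
X, y = make_classification(n_samples=300, n_features=36, n_informative=10,
                           n_classes=3, random_state=0)
X_train, y_train = X[:200], y[:200]   # used for fitting
X_val, y_val = X[200:], y[200:]       # used only for choosing parameters

# -1 marks rows that are always in the training part; 0 marks the single validation fold
X_all = np.vstack([X_train, X_val])
y_all = np.concatenate([y_train, y_val])
test_fold = np.r_[np.full(len(X_train), -1), np.zeros(len(X_val), dtype=int)]
split = PredefinedSplit(test_fold)

# The pipeline refits the scaler on the training part for every candidate
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
param_grid = {"svc__gamma": [0.001, 0.01, 0.1, 1], "svc__C": [1, 10, 100]}

# refit=False: only score the candidates, do not retrain on train + validation
search = GridSearchCV(pipe, param_grid, cv=split, refit=False)
search.fit(X_all, y_all)

print("best params:", search.best_params_)
print("validation accuracy:", search.best_score_)
```

After that you would fit one final model with the chosen parameters and score it once on the real test file.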