SVC in Python outputs the same value "1" for every C or gamma used
Here is the code:
import numpy as np
from sklearn import svm

# Read the whitespace-separated integers and reshape them into rows of 37 values
# (36 features followed by the class label).
numere = np.fromfile("sat.trn", dtype=int, sep=" ").reshape(-1, 37)
numereTest = np.fromfile("sat.tst", dtype=int, sep=" ").reshape(-1, 37)

# The last column holds the labels; the first 36 columns are the features.
etichete = numere[:, 36]
eticheteTest = numereTest[:, 36]
numere = numere[:, :36]
numereTest = numereTest[:, :36]

clf = svm.SVC(kernel='rbf', C=1, gamma=1)
clf.fit(numere, etichete)
predictie = clf.predict(numereTest)
I read the data from files that contain everything and build two NumPy arrays from them, but no matter what I do, every prediction is 1.
numere[:10] --> array([[ 92, 115, 120, 94, 84, 102, 106, 79, 84, 102, 102, 83, 101,
126, 133, 103, 92, 112, 118, 85, 84, 103, 104, 81, 102, 126,
134, 104, 88, 121, 128, 100, 84, 107, 113, 87],
[ 84, 102, 106, 79, 84, 102, 102, 83, 80, 102, 102, 79, 92,
112, 118, 85, 84, 103, 104, 81, 84, 99, 104, 78, 88, 121,
128, 100, 84, 107, 113, 87, 84, 99, 104, 79],
[ 84, 102, 102, 83, 80, 102, 102, 79, 84, 94, 102, 79, 84,
103, 104, 81, 84, 99, 104, 78, 84, 99, 104, 81, 84, 107,
113, 87, 84, 99, 104, 79, 84, 99, 104, 79],
[ 80, 102, 102, 79, 84, 94, 102, 79, 80, 94, 98, 76, 84,
99, 104, 78, 84, 99, 104, 81, 76, 99, 104, 81, 84, 99,
104, 79, 84, 99, 104, 79, 84, 103, 104, 79],
[ 84, 94, 102, 79, 80, 94, 98, 76, 80, 102, 102, 79, 84,
99, 104, 81, 76, 99, 104, 81, 76, 99, 108, 85, 84, 99,
104, 79, 84, 103, 104, 79, 79, 107, 109, 87],
[ 80, 94, 98, 76, 80, 102, 102, 79, 76, 102, 102, 79, 76,
99, 104, 81, 76, 99, 108, 85, 76, 103, 118, 88, 84, 103,
104, 79, 79, 107, 109, 87, 79, 107, 109, 87],
[ 76, 102, 106, 83, 76, 102, 106, 87, 80, 98, 106, 79, 80,
107, 118, 88, 80, 112, 118, 88, 80, 107, 113, 85, 79, 107,
113, 87, 79, 103, 104, 83, 79, 103, 104, 79],
[ 76, 102, 106, 87, 80, 98, 106, 79, 76, 94, 102, 76, 80,
112, 118, 88, 80, 107, 113, 85, 80, 95, 100, 78, 79, 103,
104, 83, 79, 103, 104, 79, 79, 95, 100, 79],
[ 76, 89, 98, 76, 76, 94, 98, 76, 76, 98, 102, 72, 80,
95, 104, 74, 76, 91, 104, 74, 76, 95, 100, 78, 75, 91,
96, 75, 75, 91, 96, 71, 79, 87, 93, 71],
[ 76, 94, 98, 76, 76, 98, 102, 72, 76, 94, 90, 76, 76,
91, 104, 74, 76, 95, 100, 78, 76, 91, 100, 74, 75, 91,
96, 71, 79, 87, 93, 71, 79, 87, 93, 67]])
OK, the most likely reasons for the result you are getting are:
First, you are not scaling your data. Try using StandardScaler.
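from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training data only, then apply it to both sets.
scaler = StandardScaler()
scaler.fit(numere)
numere = scaler.transform(numere)
numereTest = scaler.transform(numereTest)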
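To make the scaling point concrete, here is a minimal sketch of what the RBF kernel does with the raw values. It uses the first two rows of numere[:10] from the question; the helper name `rbf` is only for illustration, and the formula exp(-gamma * ||a - b||^2) is the usual definition of the RBF kernel.

```python
import numpy as np

# First two rows of numere[:10], copied from the question.
x = np.array([92, 115, 120, 94, 84, 102, 106, 79, 84, 102, 102, 83,
              101, 126, 133, 103, 92, 112, 118, 85, 84, 103, 104, 81,
              102, 126, 134, 104, 88, 121, 128, 100, 84, 107, 113, 87],
             dtype=float)
y = np.array([84, 102, 106, 79, 84, 102, 102, 83, 80, 102, 102, 79,
              92, 112, 118, 85, 84, 103, 104, 81, 84, 99, 104, 78,
              88, 121, 128, 100, 84, 107, 113, 87, 84, 99, 104, 79],
             dtype=float)


def rbf(a, b, gamma):
    """RBF kernel value exp(-gamma * ||a - b||^2) (illustrative helper)."""
    return np.exp(-gamma * np.sum((a - b) ** 2))


sq_dist = np.sum((x - y) ** 2)           # squared Euclidean distance on raw values
k_gamma_1 = rbf(x, y, gamma=1.0)         # the setting used in the question
k_gamma_small = rbf(x, y, gamma=0.001)   # smallest gamma from the grid in this answer

print("squared distance:", sq_dist)
print("kernel with gamma=1:", k_gamma_1)
print("kernel with gamma=0.001:", k_gamma_small)
```

With raw values in this range the squared distance is in the thousands, so with gamma=1 the kernel underflows to zero even for two neighbouring rows. If the same happens between the test points and the training points, the decision function no longer depends on the input, which would explain a constant prediction. Scaling the features or using a much smaller gamma keeps the kernel values in a usable range.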
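Second, you are not tuning the parameters. You need to select the most suitable ones, and I strongly recommend a grid search. You can find an example of how to use it here. Grid search works well for parameter tuning, but be careful not to use cross-validation with this dataset; that is the advice of its creators :) Gamma and C can take a wide range of values, from very small decimals to very large numbers, so you cannot test them properly by hand.
Edit: since you should not use CV, this is a better way to do the grid search: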
from sklearn.model_selection import ParameterGrid

grid = {  # extend this with more values
    'gamma': [0.001, 0.1, 10, 100, 1000],
    'C': [1, 10, 100],
}

best_score = -1.0
best_grid = None
for g in ParameterGrid(grid):
    clf.set_params(**g)
    clf.fit(numere, etichete)
    # keep these parameters if they give the best score so far
    score = clf.score(numereTest, eticheteTest)
    if score > best_score:
        best_score = score
        best_grid = g

print("best score:", best_score)
print("Grid:", best_grid)
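The two suggestions can also be combined. The following is only a sketch under assumptions: it runs on synthetic data from make_classification instead of sat.trn and sat.tst (which are not available here), and the variable names and grid values are illustrative. It puts StandardScaler and SVC into one pipeline and runs the same manual search without cross-validation.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import ParameterGrid, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the satellite data: 36 features, 3 classes (assumption).
X, y = make_classification(n_samples=300, n_features=36, n_informative=10,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# make_pipeline names the steps after their classes: 'standardscaler' and 'svc'.
model = make_pipeline(StandardScaler(), SVC(kernel='rbf'))

grid = {
    'svc__gamma': [0.001, 0.01, 0.1, 1],
    'svc__C': [1, 10, 100],
}

best_score = -1.0
best_params = None
for params in ParameterGrid(grid):
    model.set_params(**params)
    model.fit(X_train, y_train)           # the scaler is refit on the training data
    score = model.score(X_test, y_test)   # held-out score, no cross-validation
    if score > best_score:
        best_score = score
        best_params = params

print("best score:", best_score)
print("best parameters:", best_params)
```

Keeping the scaler inside the pipeline means it is always fitted on the training data only and applied consistently to whatever is passed to score or predict.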