Support Vector Machine training: is sklearn SGDClassifier.partial_fit able to train an SVM incrementally?
I am trying to train an SVM model with sklearn, to be used as a binary classifier that produces the ideal binary mask (IBM) for audio, applied after a neural network I developed for my graduation thesis. However, as shown in this graph, the accuracy never converges: no matter how many audio files are used, the mean accuracy stays at about 50%, which is random guessing given that there are only two classes.
import os
import time
import numpy as np
import soundfile as sf
from random import shuffle
from sklearn.linear_model import SGDClassifier

# SVM instance
SVM = SGDClassifier(loss='hinge', penalty='l2', warm_start=True, shuffle=True)

# Start training
CLEAN_DATA_PATH = r"D:\clean_trainset_56spk_wav/"
NOISY_DATA_PATH = r"D:\noisy_trainset_56spk_wav/"
audio_files = os.listdir(CLEAN_DATA_PATH)
shuffle(audio_files)
count = 0
for filename in audio_files:
    if count == 1000:
        break
    start = time.time()
    count += 1
    Clean, Sr = sf.read(CLEAN_DATA_PATH + filename, dtype='float32')
    Noisy, Sr = sf.read(NOISY_DATA_PATH + filename, dtype='float32')
    print("Áudio " + filename)
    Features, ibm = Extract_Features(Clean, Sr, Noisy)
    y = np.ravel(ibm.reshape(-1, 1))
    Features = sc.fit_transform(Features)  # Scale
    SVM.partial_fit(Features, y, classes=np.unique(y))
    end = time.time()
    print("Files training duration: " + str(round(end - start, 2)) + " seconds")
    print("Done: " + str(round((count / len(audio_files)) * 100, 2)) + "%")
As far as I understand, SGDClassifier.partial_fit updates the weights in mini-batches, which should allow us to use each file as a batch (since every audio file contains thousands of samples to classify). Is that right?

Thanks a lot!
At least one of your problems is that the samples are on a different scale in every iteration, because you fit sc to each new batch:

for filename in audio_files:
    ...
    Features = sc.fit_transform(Features)

sc should be fitted outside the loop and then used like this inside it:

Features = sc.transform(Features)
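A minimal sketch of that fix, using synthetic stand-in batches (the per-file feature extraction is hypothetical here): the scaler is fitted once on a representative sample, and each incremental batch is only transformed, so all batches share one consistent scale.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

# Stand-in for the per-file feature batches (each audio file = one batch).
batches = [(rng.normal(size=(200, 8)), rng.integers(0, 2, size=200))
           for _ in range(10)]

# Fit the scaler ONCE (here on the first batch, as a stand-in for a
# representative sample), so every subsequent batch is scaled identically.
sc = StandardScaler().fit(batches[0][0])

clf = SGDClassifier(loss='hinge', penalty='l2')
classes = np.array([0, 1])  # the full label set, required on the first call
for X, y in batches:
    # transform only -- do NOT refit the scaler per batch
    clf.partial_fit(sc.transform(X), y, classes=classes)

preds = clf.predict(sc.transform(batches[0][0]))
```

If no single file is representative enough to fit the scaler on, StandardScaler also has its own partial_fit, which lets you accumulate the mean/variance statistics over a first pass through the files before the training pass.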