Getting different results on every run with a GMM classifier
I am currently working on a project involving speech recognition and machine learning.
I have two classes, and I create two GMM classifiers, one for each class, labelled 'happy' and 'sad'.
I want to train the GMM classifiers with MFCC vectors.
I am using one GMM classifier per label (previously it was one GMM per file).
But every time I run the script I get different results.
What might be the cause of that, given the same test and training samples?
In the output below, note that I have 10 test samples, and
each line corresponds to the result for the ordered test samples.
Code:
from sklearn import mixture

classifiers = {'happy': [], 'sad': []}
probability = {'happy': 0, 'sad': 0}

def createGMMClassifiers():
    for name, data in training.iteritems():
        # For every class: in our case there are two, happy and sad
        classifier = mixture.GMM(n_components=n_classes, n_iter=50)
        # two classifiers.
        for mfcc in data:
            classifier.fit(mfcc)
        addClassifier(name, classifier)
    for testData in testing['happy']:
        classify(testData)

def addClassifier(name, classifier):
    classifiers[name] = classifier

def classify(testMFCC):
    for name, classifier in classifiers.iteritems():
        prediction = classifier.predict_proba(testMFCC)
        for f, s in prediction:
            probability[name] += f
    print 'happy ', probability['happy'], 'sad ', probability['sad']
Sample output 1:
happy 154.300420496 sad 152.808941585
happy
happy 321.17737915 sad 318.621788517
happy
happy 465.294473363 sad 461.609246112
happy
happy 647.771003768 sad 640.451097035
happy
happy 792.420461416 sad 778.709674995
happy
happy 976.09526992 sad 961.337361541
happy
happy 1137.83592093 sad 1121.34722203
happy
happy 1297.14692405 sad 1278.51011583
happy
happy 1447.26926553 sad 1425.74595666
happy
happy 1593.00403707 sad 1569.85670672
happy
Sample output 2:
happy 51.699579504 sad 152.808941585
sad
happy 81.8226208497 sad 318.621788517
sad
happy 134.705526637 sad 461.609246112
sad
happy 167.228996232 sad 640.451097035
sad
happy 219.579538584 sad 778.709674995
sad
happy 248.90473008 sad 961.337361541
sad
happy 301.164079068 sad 1121.34722203
sad
happy 334.853075952 sad 1278.51011583
sad
happy 378.730734469 sad 1425.74595666
sad
happy 443.995962929 sad 1569.85670672
sad
scikit-learn uses random initialization. If you want reproducible results, you can set the random_state argument:

random_state: RandomState or an int seed (None by default)
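A minimal sketch of the reproducibility point, assuming a modern scikit-learn where mixture.GMM has been replaced by mixture.GaussianMixture (and n_iter by max_iter); the data here is random placeholder features, not real MFCCs:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Fake MFCC-like features: 200 frames, 13 coefficients.
X = np.random.RandomState(0).randn(200, 13)

# Fitting twice with the same random_state gives identical parameters.
gmm_a = GaussianMixture(n_components=2, max_iter=50, random_state=42).fit(X)
gmm_b = GaussianMixture(n_components=2, max_iter=50, random_state=42).fit(X)

print(np.allclose(gmm_a.means_, gmm_b.means_))  # True
```

With random_state left at None, each fit starts from a different random initialization, which is exactly why the script's scores change from run to run.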
    for name, data in training.iteritems():

This is incorrect, because you end up training only on the last sample. You need to concatenate the features for each label into a single array before running fit. You can use np.concatenate for that.
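A sketch of the suggested fix, with illustrative names and random placeholder arrays standing in for per-file MFCC matrices (again assuming the modern GaussianMixture API):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Pretend each file for one label yields an (n_frames, 13) MFCC array.
rng = np.random.RandomState(0)
data = [rng.randn(80, 13), rng.randn(120, 13), rng.randn(60, 13)]

# Stack all frames into one array instead of calling fit per file;
# each fit call discards the previous one, so the loop in the question
# keeps only the last file's model.
features = np.concatenate(data)
print(features.shape)  # (260, 13)

classifier = GaussianMixture(n_components=2, max_iter=50, random_state=42)
classifier.fit(features)  # one fit over all frames for this label
```

np.concatenate joins along axis 0 by default, which is what you want here: the frame dimension grows while the 13 MFCC coefficients stay aligned.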