Why do MFCC extraction libs return different values?
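I'm extracting MFCC features using two different libraries:
- The python_speech_features lib
- The BOB lib
However the output of the two is different and even the shapes are not the same. Is that normal? Or is there a parameter that I am missing?
The relevant section of my code is the following: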
import bob.ap
import numpy as np
from scipy.io.wavfile import read
from sklearn import preprocessing
from python_speech_features import mfcc, delta, logfbank


def bob_extract_features(audio, rate):
    # get MFCC
    rate = 8000               # sampling rate (note: this overrides the rate argument)
    win_length_ms = 30        # The window length of the cepstral analysis in milliseconds
    win_shift_ms = 10         # The window shift of the cepstral analysis in milliseconds
    n_filters = 26            # The number of filter bands
    n_ceps = 13               # The number of cepstral coefficients
    f_min = 0.                # The minimal frequency of the filter bank
    f_max = 4000.             # The maximal frequency of the filter bank
    delta_win = 2             # The integer delta value used for computing the first and second order derivatives
    pre_emphasis_coef = 0.97  # The coefficient used for the pre-emphasis
    dct_norm = True           # A factor by which the cepstral coefficients are multiplied
    mel_scale = True          # Tell whether cepstral features are extracted on a linear (LFCC) or Mel (MFCC) scale

    c = bob.ap.Ceps(rate, win_length_ms, win_shift_ms, n_filters, n_ceps, f_min,
                    f_max, delta_win, pre_emphasis_coef, mel_scale, dct_norm)
    c.with_delta = False
    c.with_delta_delta = False
    c.with_energy = False

    signal = np.asarray(audio, dtype=float)  # vector should be in float (np.cast was removed in NumPy 2.0)
    example_mfcc = c(signal)  # MFCCs only: deltas, delta-deltas and energy are disabled above
    return example_mfcc


def psf_extract_features(audio, rate):
    signal = np.asarray(audio, dtype=float)  # vector should be in float
    mfcc_feature = mfcc(signal, rate, winlen=0.03, winstep=0.01, numcep=13,
                        nfilt=26, nfft=512, appendEnergy=False)
    # mfcc_feature = preprocessing.scale(mfcc_feature)
    # The three values below are computed but not returned.
    deltas = delta(mfcc_feature, 2)
    fbank_feat = logfbank(audio, rate)
    combined = np.hstack((mfcc_feature, deltas))
    return mfcc_feature


track = 'test-sample.wav'
rate, audio = read(track)

features1 = psf_extract_features(audio, rate)
features2 = bob_extract_features(audio, rate)

print("--------------------------------------------")
t = (features1 == features2)
print(t)
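Have you tried comparing the two with some tolerance? I believe both MFCC outputs are arrays of floats, and testing for exact equality may not be wise. Try numpy.testing.assert_allclose with some tolerance, then decide whether that tolerance is good enough.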
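A minimal sketch of such a tolerance-based comparison follows. The helper name compare_features, the tolerance values, and the random stand-in matrices are assumptions for illustration; they are not taken from either library. The shape is checked first, because an element-wise comparison only makes sense once the shapes agree.

```python
import numpy as np


def compare_features(a, b, rtol=1e-5, atol=1e-8):
    """Return a short report on how two feature matrices differ."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        # Element-wise comparison is meaningless until the shapes agree.
        return {"same_shape": False, "shapes": (a.shape, b.shape)}
    return {
        "same_shape": True,
        "allclose": bool(np.allclose(a, b, rtol=rtol, atol=atol)),
        "max_abs_diff": float(np.max(np.abs(a - b))),
    }


# Stand-in data: two (frames, coefficients) matrices differing by tiny noise.
rng = np.random.default_rng(0)
x = rng.standard_normal((98, 13))
y = x + 1e-9

print(compare_features(x, y))       # same shape, close within tolerance
print(compare_features(x, y[:-1]))  # shape mismatch is reported first

# assert_allclose raises AssertionError with a detailed report on mismatch;
# here it passes silently.
np.testing.assert_allclose(x, y, rtol=1e-5, atol=1e-8)
```

Looking at max_abs_diff before picking rtol and atol gives a feel for whether the two outputs are nearly identical or differ substantially.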
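That said, I missed that you said even the shapes don't match, and I don't have enough experience with bob.ap to comment on that confidently. In general, though, some libraries zero-pad the input array at the beginning or the end for windowing reasons, and if one of the two libraries handles this differently, that could be responsible.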
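To illustrate how padding alone can change the number of frames, here is a small sketch of two common framing conventions. Both functions are hypothetical illustrations, not the actual code of python_speech_features or bob.ap, and the clip length of 8100 samples is an invented example.

```python
import math


def n_frames_padded(n_samples, frame_len, frame_step):
    """Frame count when the last partial frame is zero-padded and kept."""
    if n_samples <= frame_len:
        return 1
    return 1 + math.ceil((n_samples - frame_len) / frame_step)


def n_frames_truncated(n_samples, frame_len, frame_step):
    """Frame count when only complete frames are kept."""
    if n_samples < frame_len:
        return 0
    return 1 + (n_samples - frame_len) // frame_step


rate = 8000
frame_len = rate * 30 // 1000   # 30 ms window -> 240 samples
frame_step = rate * 10 // 1000  # 10 ms shift  -> 80 samples
n_samples = 8100                # hypothetical clip slightly over one second

print(n_frames_padded(n_samples, frame_len, frame_step))     # 100
print(n_frames_truncated(n_samples, frame_len, frame_step))  # 99
```

The two conventions agree only when the signal length minus the window length is an exact multiple of the shift, so a difference of one frame in the first dimension is a plausible symptom of this kind of mismatch.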
However the output of the two is different and even the shapes are not the same. Is that normal?
Yes, there are different variations of the algorithm, and each implementation picks its own flavor.
or is there a parameter that I am missing?
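It is not just about parameters. There are also algorithmic differences, such as the window shape (Hamming vs Hanning), the shape of the mel filters, the starting point of the mel filters, the normalization of the mel filters, liftering, the DCT flavor, and so on.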
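Two of the differences listed above can be made concrete with a short sketch. It uses only NumPy and SciPy and does not call either MFCC library. The 26 log filter-bank energies are random stand-in values, and which window or DCT normalization a given library actually uses is something to check in its documentation.

```python
import numpy as np
from scipy.fft import dct

n = 240  # samples in a 30 ms frame at 8 kHz

hamming = np.hamming(n)
hanning = np.hanning(n)
rect = np.ones(n)  # no tapering at all

# The windows differ most at the frame edges:
# Hamming starts near 0.08, Hann at 0, rectangular at 1.
print(hamming[0], hanning[0], rect[0])

# Hypothetical log filter-bank energies for one frame (26 bands).
rng = np.random.default_rng(0)
log_energies = rng.standard_normal(26)

# The same DCT-II under two normalization conventions.
ceps_plain = dct(log_energies, type=2, norm=None)[:13]
ceps_ortho = dct(log_energies, type=2, norm="ortho")[:13]

print(np.allclose(ceps_plain, ceps_ortho))  # False
```

Even with identical filter-bank energies, the two DCT conventions give coefficients on different scales, and every other item in the list adds its own discrepancy on top of that.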
If you want the same results, just use a single library for extraction. Trying to synchronize the two is pretty hopeless.