What is the warning 'Empty filters detected in mel frequency basis. ' about?
I am trying to extract MFCC features (13 coefficients) from an audio file with the following code:
import librosa as l
x, sr = l.load('/home/user/Data/Audio/Tracks/Dev/FS_P01_dev_001.wav', sr = 8000)
n_fft = int(sr * 0.02)
hop_length = n_fft // 2
mfccs = l.feature.mfcc(x, sr=sr, n_mfcc=13, hop_length=hop_length, n_fft=n_fft)
But it shows this warning. What does it mean, and how do I get rid of it?
UserWarning: Empty filters detected in mel frequency basis. Some channels will produce empty responses. Try increasing your sampling rate (and fmax) or reducing n_mels.
warnings.warn('Empty filters detected in mel frequency basis. '
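MFCCs are based on mel-spectrograms, which in turn are usually based on the discrete Fourier transform (DFT). The Fourier transform takes a signal from the time domain and converts it into the frequency domain. This means that N time domain samples are converted into N frequency domain values (note the symmetry: for a real-valued signal only about N/2 of them are distinct). Just like the time domain samples are on a linear time scale, the frequency domain samples are on a linear frequency scale. In contrast, the mel scale is not linear but (approximately) logarithmic.
Here is what you need to know about the Fourier transform. When your signal has a sample rate of F_s = 8000 Hz and the window length is N:
- The number of distinct frequency bins is: SL = N/2 (N/2 + 1 if the DC and Nyquist bins are both counted)
- The highest frequency you can encode is: F_max = F_s/2 (Nyquist-Shannon)
- The frequency resolution is: Δf = F_max/SL = F_s/N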
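To make these quantities concrete, here is a small sketch that plugs in the values from the question (8000 Hz sample rate, 20 ms window). The variable names are only illustrative.

```python
# Values taken from the question: 8 kHz sample rate, 20 ms analysis window.
sr = 8000
n_fft = int(sr * 0.02)      # window length N in samples -> 160

n_bins = n_fft // 2         # SL = N/2 -> 80
f_max = sr / 2              # Nyquist frequency -> 4000.0 Hz
delta_f = f_max / n_bins    # frequency resolution -> 50.0 Hz

print(n_fft, n_bins, f_max, delta_f)
```

So with this window every DFT bin is 50 Hz wide, which is quite coarse compared with the spacing of mel bands at low frequencies.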
Now consider how MFCCs are computed:
- Take the Fourier transform of (a windowed excerpt of) a signal.
- Map the powers of the spectrum obtained above onto the mel scale, using triangular overlapping windows.
- Take the logs of the powers at each of the mel frequencies.
- Take the discrete cosine transform of the list of mel log powers, as if it were a signal.
- The MFCCs are the amplitudes of the resulting spectrum.
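The steps above can be sketched for a single frame with NumPy. This is only an illustration under stated assumptions, not librosa's implementation: it uses the HTK-style mel formula, an unnormalized DCT-II, a Hann window, and hypothetical helper names (`mel_filterbank`, `mfcc_frame`).

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular, overlapping filters spaced evenly on the mel scale."""
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2), n_mels + 2))
    fb = np.zeros((n_mels, bin_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def mfcc_frame(frame, sr, n_mels=20, n_mfcc=13):
    n_fft = frame.size
    # Step 1: Fourier transform of a windowed excerpt, as a power spectrum.
    power = np.abs(np.fft.rfft(frame * np.hanning(n_fft))) ** 2
    # Step 2: map the powers onto the mel scale with triangular filters.
    mel_power = mel_filterbank(sr, n_fft, n_mels) @ power
    # Step 3: take the log (small offset avoids log(0)).
    log_mel = np.log(mel_power + 1e-10)
    # Step 4: DCT-II of the log mel powers (unnormalized).
    n = np.arange(n_mels)
    k = np.arange(n_mfcc)[:, None]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    # Step 5: the MFCCs are the resulting amplitudes.
    return dct_basis @ log_mel

sr = 8000
t = np.arange(160) / sr                       # one 20 ms frame
frame = np.sin(2 * np.pi * 440.0 * t)         # synthetic test tone
coeffs = mfcc_frame(frame, sr)
print(coeffs.shape)
```

Note that the number of mel bands (`n_mels`) and the number of coefficients kept (`n_mfcc`) are separate parameters in this sketch.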
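In step 2 you have to map whatever the DFT produced onto a different scale, the mel scale. This does not work if the DFT resolution Δf is too coarse for the (possibly) finer mel scale: some mel filters then fall entirely between two adjacent DFT bins and stay empty. Think of it like an image: when you have a coarse image, you cannot improve its quality by mapping it to a higher resolution.
This means you have to make sure that your DFT resolution Δf is fine enough for the mel bands you want to use.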
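As a rough illustration of why filters end up empty, the following sketch counts how many triangular mel filters contain no DFT bin at all. It assumes the HTK-style mel formula, which is not identical to the mel scale librosa uses by default, so the exact counts may differ from librosa's; `count_empty_filters` is a hypothetical helper, not a library function.

```python
import math

def hz_to_mel(f):
    # HTK-style mel formula (an assumption for this sketch).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def count_empty_filters(sr, n_fft, n_mels):
    """Count triangular mel filters with no DFT bin strictly inside their support."""
    delta_f = sr / n_fft
    bin_freqs = [k * delta_f for k in range(n_fft // 2 + 1)]
    mel_max = hz_to_mel(sr / 2)
    # n_mels filters need n_mels + 2 edge frequencies, evenly spaced in mel.
    edges = [mel_to_hz(i * mel_max / (n_mels + 1)) for i in range(n_mels + 2)]
    empty = 0
    for i in range(n_mels):
        lo, hi = edges[i], edges[i + 2]
        # A bin exactly on an edge gets zero weight, hence the strict comparison.
        if not any(lo < f < hi for f in bin_freqs):
            empty += 1
    return empty

empty_128 = count_empty_filters(sr=8000, n_fft=160, n_mels=128)
empty_40 = count_empty_filters(sr=8000, n_fft=160, n_mels=40)
print(empty_128, empty_40)
```

With 128 bands and a 50 Hz bin spacing, the narrow low-frequency filters miss every bin; with 40 bands each filter is wider than one bin spacing, so none is empty.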
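To ensure this, you must either use a longer window N or fewer mel bands (n_mels in librosa, 128 by default; note that this is not n_mfcc, which only sets how many coefficients are kept after the DCT). The heart of the problem is that you cannot have both high frequency resolution and high time resolution at the same time.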