What is the warning 'Empty filters detected in mel frequency basis. ' about?

I am trying to extract MFCC features (13 MFCCs) from an audio file using the following code:

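import librosa as l

# Load the file, resampled to 8 kHz
x, sr = l.load('/home/user/Data/Audio/Tracks/Dev/FS_P01_dev_001.wav', sr=8000)
n_fft = int(sr * 0.02)    # 20 ms window = 160 samples
hop_length = n_fft // 2   # 50% overlap = 80 samples
# Pass the signal as y=...; recent librosa versions accept it only as a keyword
mfccs = l.feature.mfcc(y=x, sr=sr, n_mfcc=13, hop_length=hop_length, n_fft=n_fft)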


UserWarning: Empty filters detected in mel frequency basis. Some channels will produce empty responses. Try increasing your sampling rate (and fmax) or reducing n_mels.
  warnings.warn('Empty filters detected in mel frequency basis. '

MFCCs are based on mel-spectrograms, which in turn are usually based on the discrete Fourier transform (DFT). The Fourier transform takes a signal from the time domain and converts it into the frequency domain. This means that N time domain samples are converted into N frequency domain values (for a real-valued signal these are symmetric, so only about N/2 of them are distinct). Just like the time domain samples are on a linear time scale, the frequency domain samples are on a linear frequency scale. In contrast, the mel scale is not linear but (approximately) logarithmic.

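Here is what you need to know about the Fourier transform. When your signal has a sampling rate of F_s = 8000 Hz and the window length is N:

  • The number of distinct frequency bins is: SL = N/2
  • The highest frequency you can encode is: F_max = F_s/2 (Nyquist-Shannon)
  • The frequency resolution is: Δf = F_max/SL = F_s/N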
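To make these numbers concrete, here is a small sketch that plugs in the values from the question (8 kHz sampling rate, 20 ms window). The variable names n_bins, f_max and delta_f are just illustrative stand-ins for SL, F_max and Δf.

```python
# Values taken from the question: 8 kHz sampling rate, 20 ms window.
sr = 8000                  # F_s in Hz
n_fft = int(sr * 0.02)     # window length N in samples

n_bins = n_fft // 2        # SL: number of distinct frequency bins
f_max = sr / 2             # F_max: Nyquist frequency in Hz
delta_f = f_max / n_bins   # frequency resolution in Hz (same as sr / n_fft)

print(n_fft, n_bins, f_max, delta_f)  # 160 80 4000.0 50.0
```

So with a 20 ms window at 8 kHz, the DFT only gives you one value every 50 Hz.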

Now consider how MFCCs are computed (see also here):

  1. Take the Fourier transform of (a windowed excerpt of) a signal.
  2. Map the powers of the spectrum obtained above onto the mel scale, using triangular overlapping windows.
  3. Take the logs of the powers at each of the mel frequencies.
  4. Take the discrete cosine transform of the list of mel log powers, as if it were a signal.
  5. The MFCCs are the amplitudes of the resulting spectrum.
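The five steps above can be sketched for a single frame in plain Python. This is only an illustration under several assumptions that are mine, not librosa's: the function name mfcc_frame is hypothetical, the mel scale uses the common HTK-style formula, the window is a Hann window, the DFT and DCT are naive and unnormalized, and the test signal is a synthetic 440 Hz sine. librosa differs in these details, so the numbers will not match its output.

```python
import cmath
import math

def hz_to_mel(f):
    # HTK-style mel formula (an assumption; librosa defaults to a different variant)
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, sr, n_mels=20, n_mfcc=13):
    n = len(frame)
    # Step 1: window the excerpt (Hann) and take its DFT (naive O(N^2) version).
    windowed = [x * 0.5 * (1.0 - math.cos(2.0 * math.pi * i / n))
                for i, x in enumerate(frame)]
    n_bins = n // 2 + 1
    power = []
    for k in range(n_bins):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        power.append(abs(s) ** 2)
    bin_freqs = [k * sr / n for k in range(n_bins)]

    # Step 2: triangular overlapping filters, equally spaced on the mel scale.
    mel_max = hz_to_mel(sr / 2)
    edges = [mel_to_hz(i * mel_max / (n_mels + 1)) for i in range(n_mels + 2)]
    mel_power = []
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        total = 0.0
        for f, p in zip(bin_freqs, power):
            if lo < f <= mid:
                total += p * (f - lo) / (mid - lo)   # rising slope
            elif mid < f < hi:
                total += p * (hi - f) / (hi - mid)   # falling slope
        mel_power.append(total)

    # Step 3: log of the band powers; the small floor avoids log(0).
    log_mel = [math.log(p + 1e-10) for p in mel_power]

    # Steps 4 and 5: DCT-II of the log mel powers; keep the first n_mfcc amplitudes.
    return [
        sum(v * math.cos(math.pi * c * (m + 0.5) / n_mels)
            for m, v in enumerate(log_mel))
        for c in range(n_mfcc)
    ]

sr = 8000
frame = [math.sin(2.0 * math.pi * 440.0 * i / sr) for i in range(160)]
mfccs = mfcc_frame(frame, sr)
print(len(mfccs))  # 13
```

Note that n_mels (the number of triangular filters in step 2) and n_mfcc (the number of coefficients kept in step 5) are two separate parameters.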

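In step 2, you have to map whatever the DFT produced onto a different scale, the mel scale. This does not work if the DFT resolution Δf is too coarse to map the power values onto the (possibly) finer mel scale. Think of it like an image: when you have a coarse image, you cannot improve its quality by mapping it to a higher resolution. This means you have to make sure that your DFT resolution Δf is fine enough for the mel bands you want to use.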
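One way to see when this mapping breaks down is to count the triangular filters that do not contain a single DFT bin. The following is a rough sketch, not librosa's actual check: count_empty_filters is a hypothetical helper, and it assumes the HTK-style mel formula with filters spanning 0 Hz to the Nyquist frequency, so the exact counts may differ from what librosa reports.

```python
import math

def hz_to_mel(f):
    # HTK-style mel formula (an assumption; librosa defaults to a different variant)
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def count_empty_filters(sr, n_fft, n_mels):
    """Count triangular mel filters that contain no DFT bin at all."""
    bin_freqs = [k * sr / n_fft for k in range(n_fft // 2 + 1)]
    mel_max = hz_to_mel(sr / 2)
    # n_mels triangles need n_mels + 2 edge points, equally spaced in mel.
    edges = [mel_to_hz(i * mel_max / (n_mels + 1)) for i in range(n_mels + 2)]
    empty = 0
    for i in range(n_mels):
        lo, hi = edges[i], edges[i + 2]
        # A filter only gets a nonzero weight from bins strictly inside (lo, hi).
        if not any(lo < f < hi for f in bin_freqs):
            empty += 1
    return empty

print(count_empty_filters(8000, 160, 128))   # settings from the question
print(count_empty_filters(8000, 2048, 128))  # much longer window
print(count_empty_filters(8000, 160, 20))    # far fewer mel bands
```

With the settings from the question, the low-frequency triangles are narrower than the 50 Hz bin spacing, so some of them fall between two bins and stay empty.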

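To ensure this, you have to use either a longer window N (n_fft in librosa) or fewer mel bands (n_mels, 128 by default in librosa). Note that n_mfcc is not the number of mel bands; it only sets how many coefficients are kept after the DCT, which is why asking for just 13 MFCCs still triggers the warning. The core of the problem is that you cannot have both high frequency resolution and high time resolution at the same time.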
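The trade-off can be illustrated by tabulating window duration against bin spacing for a few window lengths. The window lengths below are arbitrary examples; only the 8 kHz sampling rate comes from the question.

```python
sr = 8000  # sampling rate from the question, in Hz

rows = []
for n_fft in (160, 512, 2048):
    window_ms = 1000.0 * n_fft / sr   # time span covered by one window
    delta_f = sr / n_fft              # spacing between DFT bins in Hz
    rows.append((n_fft, window_ms, delta_f))
    print(f"N={n_fft:5d}  window={window_ms:6.1f} ms  delta_f={delta_f:8.3f} Hz")
```

The product of window duration (in seconds) and Δf is always 1, so every gain in frequency resolution is paid for with a proportionally longer window.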

See also IRCAM Intro on FFT parameters