Why is there a dB difference in the spectrum analysis between Sonic Visualizer and my Python script?

Question

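I have run into a problem while implementing a function that creates a spectrum from an audio file. I'm asking in the hope that someone can spot the issue.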

You can download the 32-bit float WAV audio file here.

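I am writing a script that uses SciPy and NumPy to create a spectrum analysis from an audio file. Before starting, I analyzed the file with Sonic Visualizer and got the following result: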

Now I am trying to reproduce this result with my Python script, but I get a different result:

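Everything looks correct except the scale of the dB values. At 100 Hz, Sonic Visualizer shows -40 dB, while my script shows -65 dB. So I assume something is wrong in the way my script converts the FFT result to dBFS.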

If I align the Sonic Visualizer curve with the output of my script, it is obvious that a factor is missing from the level conversion:

A minimal version of my script, using the 'demo.wav' file from above, looks like this:

from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
from scipy.io import wavfile
from scipy.signal import savgol_filter

def db_fft(data, sample_rate):
    data_length = len(data)
    # Hann window over the whole block
    weighting = np.hanning(data_length)
    data = data * weighting
    # One-sided FFT and the matching frequency axis
    values = np.fft.rfft(data)
    frequencies = np.fft.rfftfreq(data_length, d=1. / sample_rate)
    # Single-sided amplitude spectrum, normalized by the window sum
    s_mag = np.abs(values) * 2 / np.sum(weighting)
    # Amplitude (20 * log10) decibels relative to full scale
    s_dbfs = 20 * np.log10(s_mag)
    return frequencies, s_dbfs

audio_file = Path('demo.wav')
sample_rate, data = wavfile.read(str(audio_file))
if data.ndim > 1:
    data = data[:, 0]  # use the first channel if the file is multichannel
data = data[0:4096]  # the first 4096 samples form the analysis window
x_labels, s_dbfs = db_fft(data, sample_rate)
flat_data = savgol_filter(s_dbfs, 601, 3)
# Matplotlib 3.6 renamed 'seaborn-whitegrid' to 'seaborn-v0_8-whitegrid'
plt.style.use('seaborn-v0_8-whitegrid')
plt.figure(dpi=150, figsize=(16, 9))
plt.semilogx(x_labels, s_dbfs, alpha=0.4, color='tab:blue', label='Spectrum')
plt.semilogx(x_labels, flat_data, color='tab:blue', label='Spectrum (with filter)')
plt.grid(True)
plt.title(audio_file.name)
plt.ylim([-160, 0])
plt.xlim([10, 10000])
plt.xlabel('Frequency [Hz]')
plt.ylabel('Amplitude [dB]')
plt.grid(True, which="both")
target_name = audio_file.parent / (audio_file.stem + '.png')
plt.savefig(str(target_name))

The script turns the 32-bit float audio file into a dBFS spectrum plot, using the first 4096 samples as the window, just as Sonic Visualizer does.
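One way to check the script's own calibration, independent of Sonic Visualizer, is to feed db_fft a synthetic sine of known amplitude. The sketch below copies the question's db_fft unchanged and assumes a 44.1 kHz sample rate and a hypothetical test tone of amplitude 0.5 placed exactly on FFT bin 93 (to avoid the Hann window's scalloping loss). With the `2 / np.sum(weighting)` normalization and 20·log10, the peak should land near 20·log10(0.5) ≈ -6.02 dBFS, i.e. the script reports amplitude dBFS.

```python
import numpy as np

def db_fft(data, sample_rate):
    # Same function as in the question: Hann window, one-sided FFT,
    # amplitude normalized by the window sum, 20 * log10.
    data_length = len(data)
    weighting = np.hanning(data_length)
    data = data * weighting
    values = np.fft.rfft(data)
    frequencies = np.fft.rfftfreq(data_length, d=1. / sample_rate)
    s_mag = np.abs(values) * 2 / np.sum(weighting)
    s_dbfs = 20 * np.log10(s_mag)
    return frequencies, s_dbfs

# Assumed test signal: 4096 samples of a sine at amplitude 0.5,
# with its frequency exactly on FFT bin 93.
sample_rate = 44100
n = 4096
tone_freq = 93 * sample_rate / n
t = np.arange(n) / sample_rate
tone = 0.5 * np.sin(2 * np.pi * tone_freq * t)

freqs, levels = db_fft(tone, sample_rate)
peak = np.argmax(levels)
print(f"peak at {freqs[peak]:.1f} Hz, {levels[peak]:.2f} dBFS")
```

If the file's level in Sonic Visualizer is compared against this, any remaining offset comes from a different convention, not from a miscalibrated FFT.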

What is wrong with my script, and why do I get a different result?

Answer 1

1. A different decibel definition

The first big difference is that they use the "power ratio" definition of the decibel, from this Wikipedia page:

When expressing a power ratio, the number of decibels is ten times its logarithm to base 10.

I also verified this in the v4.0.1 source code (svcore/base/AudioLevel.cpp, line 54):

double dB = 10 * log10(multiplier);
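To see what this convention does to the numbers, here is a small sketch using a few hypothetical linear magnitudes (not values from the demo file): applying 10·log10 to the same magnitude gives exactly half the dB value of 20·log10.

```python
import numpy as np

# Hypothetical linear magnitudes, chosen only for illustration.
magnitudes = np.array([1.0, 0.5, 0.1, 1e-3])

amplitude_db = 20 * np.log10(magnitudes)    # convention used in the question's script
power_ratio_db = 10 * np.log10(magnitudes)  # convention in AudioLevel.cpp, per this answer

for m, a, p in zip(magnitudes, amplitude_db, power_ratio_db):
    print(f"{m:g}: {a:7.2f} dB (20*log10) vs {p:7.2f} dB (10*log10)")
```

Because the value is halved rather than shifted, the gap between the two curves grows the further a level is from 0 dB: a magnitude of 1e-3 reads -60 dB one way and -30 dB the other.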

2. A different magnitude calculation

When calculating the magnitude, they appear to simply divide by the window size. In your code, that changes the calculation to

s_mag = np.abs(values) * 2  / data_length
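The size of this change can be worked out directly. For NumPy's symmetric Hann window, `np.sum(np.hanning(N))` equals (N - 1) / 2, so dividing by N instead of the window sum scales every magnitude by roughly 0.5. A short sketch, assuming N = 4096 as in the question:

```python
import numpy as np

N = 4096  # window length used in the question
window = np.hanning(N)

window_sum = np.sum(window)  # normalization used in the question's db_fft
ratio = window_sum / N       # factor applied to the magnitude when dividing by N instead

print(f"sum(hanning(N)) = {window_sum:.1f}, (N - 1) / 2 = {(N - 1) / 2:.1f}")
print(f"magnitude ratio = {ratio:.4f}")
print(f"in 20*log10 terms: {20 * np.log10(ratio):.2f} dB")
print(f"in 10*log10 terms: {10 * np.log10(ratio):.2f} dB")
```

So this change on its own lowers the curve by about 6 dB with 20·log10, or by about 3 dB once the 10·log10 convention from point 1 is also applied.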

3. "Corrected" results

I haven't found a way to export the spectrum, but I manually read off the first few values (note: these are not dB values) as

theirvalues = [
    0.00074, 
    0.000745865, 
    0.00119605, 
    0.0013713, 
    0.0011812, 
    0.000746891, 
    0.000334177,
    0.000163241,
    7.57671e-5,
    3.17983e-5,
    2.91934e-5,
    3.74938e-5
]

With the two changes I mentioned applied, the plots compare as follows:

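It is still not an exact match, but it is much closer. I suspect some kind of smoothing is still involved (the code mentions hops, but I'm not entirely sure what they are doing).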
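Putting points 1 and 2 together, a modified version of the question's db_fft might look like the sketch below. It is only an approximation of what this answer infers from the Sonic Visualizer source: the name `sv_style_db_fft` is hypothetical, and any smoothing or hop-based processing Sonic Visualizer may perform is not reproduced. The usage part assumes a synthetic, bin-centred sine of amplitude 0.1 rather than the demo file.

```python
import numpy as np

def sv_style_db_fft(data, sample_rate):
    """Hypothetical variant of the question's db_fft with both changes applied.

    Returns the frequency axis, the linear magnitudes and the levels in dB.
    """
    data_length = len(data)
    weighting = np.hanning(data_length)
    values = np.fft.rfft(data * weighting)
    frequencies = np.fft.rfftfreq(data_length, d=1. / sample_rate)
    # Change 2: normalize by the window length instead of the window sum.
    s_mag = np.abs(values) * 2 / data_length
    # Change 1: "power ratio" decibels, i.e. 10 * log10 of the magnitude.
    # errstate only silences the warning for exact zeros (they become -inf dB).
    with np.errstate(divide='ignore'):
        s_db = 10 * np.log10(s_mag)
    return frequencies, s_mag, s_db

# Assumed test signal: sine of amplitude 0.1 on FFT bin 93, 4096 samples at 44.1 kHz.
sample_rate = 44100
n = 4096
tone_freq = 93 * sample_rate / n
tone = 0.1 * np.sin(2 * np.pi * tone_freq * np.arange(n) / sample_rate)

freqs, s_mag, s_db = sv_style_db_fft(tone, sample_rate)
peak = np.argmax(s_mag)
print(f"{freqs[peak]:.1f} Hz: magnitude {s_mag[peak]:.4f}, {s_db[peak]:.2f} dB")
```

The returned `s_mag` is the quantity to compare with the linear values listed under point 3, assuming Sonic Visualizer's bins line up with `np.fft.rfftfreq` for the same 4096-sample window.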

Answer 2

As you say, your two results differ by a constant factor of roughly 2.

From Wikipedia's entry on Decibel (emphasis mine):

Two different scales are used when expressing a ratio in decibels, depending on the nature of the quantities: power and field (root-power). When expressing a power ratio, the number of decibels is ten times its logarithm to base 10.[2] That is, a change in power by a factor of 10 corresponds to a 10 dB change in level. When expressing field (root-power) quantities, a change in amplitude by a factor of 10 corresponds to a 20 dB change in level. The decibel scales differ by a factor of two so that the related power and field levels change by the same number of decibels with linear loads.
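The factor of two in that passage follows from power being proportional to the square of amplitude. Writing P for power and A for amplitude, with reference values P_0 and A_0 (symbols introduced here for illustration), and using P ∝ A²:

```latex
L = 10 \log_{10}\frac{P}{P_0}
  = 10 \log_{10}\frac{A^2}{A_0^2}
  = 20 \log_{10}\frac{A}{A_0}
```

Applying 10·log10 directly to an amplitude such as `np.abs(values)`, without squaring it first, therefore yields exactly half the dB value of the 20·log10 form.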

You are using a factor of 20:

s_dbfs = 20 * np.log10(s_mag)

If you change the factor to 10, you get this image:

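This may or may not explain your scale difference. The Sonic Visualizer source code is on SourceForge, so it should be easy to check (SourceForge won't let me set my tracking preferences, so I won't go there myself).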


