Meaning of the sox spectrogram conversion parameters

I want to use sox to convert a flac file into a spectrogram png file. To convert a .flac file, I can use the following command:

sox output.flac -n spectrogram -r -o a.png

If I want a spectrogram of N x 129 pixels, I can use the following command:

sox output.flac -n spectrogram -Y 200 -X 50 -m -r -o spectogram.png

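However, I don't quite understand what the -Y 200 and -X 50 parameters mean. Is there a way to translate them into a sampling frequency, a time bin (in milliseconds), and frequency bins, as one would specify them in Matlab or Python? A detailed answer would be great, because the documentation in chirlu/sox does not state the meaning clearly (or I did not find it).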

The official sox manual describes the parameters in full, and the source code is in spectrogram.c.

But in short:

-X num:

X-axis pixels/second; the default is auto-calculated to fit the given or known audio duration to the X-axis size, or 100 otherwise. If given in conjunction with -d, this option affects the width of the spectrogram; otherwise, it affects the duration of the spectrogram. num can be from 1 (low time resolution) to 5000 (high time resolution) and need not be an integer.

-Y num:

Sets the target total height of the spectrogram(s). The default value is 550 pixels. Using this option (and by default), SoX will choose a height for individual spectrogram channels that is one more than a power of two, so the actual total height may fall short of the given number.

For -X 50, the horizontal time resolution is:

dt = 1000/50 = 20 ms/pixel

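For -Y 200, the largest power of two below 200 is 128, so each channel is 128 + 1 = 129 pixels high. Assuming a sample rate of 44.1 kHz, the frequency resolution is: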

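bin_size = (44100/2)/128 = 172.3 Hz/pixel

(The 129 rows run from 0 Hz to the Nyquist frequency of 22050 Hz, which corresponds to a 256-point DFT, so the spacing is 44100/256 rather than 44100/128.)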
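
To compare with Matlab or Python, the arithmetic above can be wrapped in a small helper. This is only a sketch: the function name and the returned keys are made up for illustration, the height rule is a simplified reading of the manual's "one more than a power of two" description rather than a port of spectrogram.c, and it assumes the rows span 0 Hz to the Nyquist frequency.

```python
def spectrogram_resolution(x_pixels_per_second, total_height, sample_rate, channels=1):
    """Estimate what sox's -X and -Y options imply (hypothetical helper).

    Assumptions: the per-channel height is the largest (power of two) + 1
    that fits into total_height / channels, and the rows cover 0 Hz to
    sample_rate / 2.
    """
    target = total_height // channels

    # Largest power of two such that power + 1 still fits the target height.
    power = 1
    while 2 * power + 1 <= target:
        power *= 2
    rows = power + 1

    n_fft = 2 * (rows - 1)  # DFT length that yields `rows` non-negative bins
    return {
        "rows": rows,
        "n_fft": n_fft,
        "ms_per_pixel": 1000.0 / x_pixels_per_second,
        "samples_per_pixel": sample_rate / x_pixels_per_second,
        "hz_per_row": sample_rate / n_fft,
    }


if __name__ == "__main__":
    # The values from the question: -X 50 -Y 200, mono, 44.1 kHz.
    for key, value in spectrogram_resolution(50, 200, 44100).items():
        print(f"{key}: {value}")
```

Under these assumptions, n_fft and samples_per_pixel would be the window length and hop size to try in a Matlab or Python spectrogram call, although the window function and scaling sox uses may still differ.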