sox: meaning of the spectrogram conversion parameters
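I want to use sox to convert a FLAC file into a spectrogram PNG file. To convert a .flac file, I can use the following command:
sox output.flac -n spectrogram -r -o a.png
If I want a spectrogram of N x 129 pixels, I can use the following command:
sox output.flac -n spectrogram -Y 200 -X 50 -m -r -o spectogram.png
However, I do not really understand what the -Y 200 and -X 50 parameters mean. Is there a way to translate them into a sampling frequency, time bins (in milliseconds), and frequency bins, as in Matlab or Python? A detailed answer would be great, because the documentation in chirlu/sox does not state the meaning clearly (or I have not found where it does).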
The official sox manual describes the parameters in full, and the source code is in spectrogram.c.
But in short:
-X num:
X-axis pixels/second; the default is auto-calculated to fit the given
or known audio duration to the X-axis size, or 100 otherwise. If given
in conjunction with -d, this option affects the width of the
spectrogram; otherwise, it affects the duration of the spectrogram.
num can be from 1 (low time resolution) to 5000 (high time resolution)
and need not be an integer.
and
-Y num:
Sets the target total height of the spectrogram(s). The default value is 550
pixels. Using this option (and by default), SoX will
choose a height for individual spectrogram channels that is one more
than a power of two, so the actual total height may fall short of the
given number.
For -X 50, the horizontal time resolution is:
dt = 1000/50 = 20 ms/pixel
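For -Y 200, the largest power of two below 200 is 128, so each channel is 129 pixels high. Those 129 rows run from 0 Hz up to the Nyquist frequency (half the sample rate), which gives 128 intervals. Assuming a sample rate of 44.1 kHz, the frequency resolution is:
bin_size = (44100/2)/128 = 172.3 Hz/pixel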
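The arithmetic above can be sketched in Python as a rough calculator. This is only an illustration, not code taken from sox: the function name `spectrogram_resolution` and its arguments are hypothetical, the height rule is a simplified reading of the manual text (largest "power of two plus one" that fits in the per-channel share of -Y), and it assumes the rows of one channel cover 0 Hz up to the Nyquist frequency. The exact rounding in spectrogram.c may differ at the edges.

```python
def spectrogram_resolution(x_pixels_per_sec, y_total_height, sample_rate, channels=1):
    """Estimate per-pixel resolution for `sox ... spectrogram -X x -Y y`.

    Hypothetical helper: a simplified model of the manual's description,
    not a port of the actual sox implementation.
    """
    # -X is pixels per second, so one pixel column covers 1000 / X milliseconds.
    ms_per_pixel = 1000.0 / x_pixels_per_sec

    # Assumption: each channel gets an equal share of the total height, and
    # the height used is the largest (power of two) + 1 that fits in it.
    per_channel = y_total_height // channels
    power = 1
    while power * 2 + 1 <= per_channel:
        power *= 2
    rows = power + 1

    # Assumption: the rows span 0 Hz .. sample_rate / 2 (Nyquist),
    # which leaves `power` intervals between `rows` pixel rows.
    hz_per_pixel = (sample_rate / 2.0) / power

    return ms_per_pixel, rows, hz_per_pixel


if __name__ == "__main__":
    ms, rows, hz = spectrogram_resolution(50, 200, 44100)
    print(f"{ms:.1f} ms/pixel, {rows} rows, {hz:.1f} Hz/pixel")
```

For a mono file this reproduces the N x 129 image height mentioned in the question; for a stereo file without -m, the per-channel share halves, so the row count and frequency resolution change accordingly.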