How can we improve tempo detection accuracy in Librosa?
I am using Librosa's native beat_track function:
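from librosa.beat import beat_track
# audio: mono waveform (1-D array), sampling_rate: its sample rate in Hz.
# Recent librosa releases accept these as keyword arguments only.
tempo, beat_frames = beat_track(y=audio, sr=sampling_rate)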
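The original tempo of the song is 146 BPM, whereas the function estimates 73.5 BPM. While I understand that 73.5 * 2 = 147 BPM, how can we achieve the following:
- Know when to scale the estimate up or down
- Improve accuracy by preprocessing the signal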
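What you are observing is the so-called "octave error": the estimate is off by a factor of 2, 1/2, 3, or 1/3. This is a very common problem in global tempo estimation. A great, classic introduction to global tempo estimation can be found in An Experimental Comparison of Audio Tempo Induction Algorithms. The article also introduces the commonly used metrics Acc1 and Acc2.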
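To make the two metrics concrete, here is a minimal sketch for a single track. It assumes the usual definitions: Acc1 counts an estimate as correct if it lies within 4% of the reference tempo, and Acc2 also accepts estimates that are off by a factor of 2, 3, 1/2, or 1/3. The function name tempo_accuracy and its parameters are made up for this illustration and are not part of librosa.

```python
def tempo_accuracy(estimate_bpm, reference_bpm, tolerance=0.04):
    """Return (acc1, acc2) for one track.

    Assumed definitions: acc1 is True if the estimate is within
    `tolerance` (4% by default) of the reference tempo; acc2 is True
    if it is within that tolerance of the reference tempo or of
    2x, 3x, 1/2x, or 1/3x the reference tempo.
    """
    def within(est, ref):
        # Relative tolerance, measured against the (scaled) reference.
        return abs(est - ref) <= tolerance * ref

    acc1 = within(estimate_bpm, reference_bpm)

    # Acc2 forgives the typical octave errors.
    factors = (1.0, 2.0, 3.0, 1.0 / 2.0, 1.0 / 3.0)
    acc2 = any(within(estimate_bpm, reference_bpm * f) for f in factors)
    return acc1, acc2


# The case from the question: 73.5 BPM estimated, 146 BPM reference.
acc1, acc2 = tempo_accuracy(73.5, 146.0)
print(acc1, acc2)  # False True
```

For the song in the question, the estimate fails Acc1 but passes Acc2, which is exactly what an octave error looks like. Note that the metrics need a reference tempo, so they help you evaluate an estimator on annotated data but cannot by themselves tell you which way to scale an unknown track.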
Since that article was published, many researchers have tried to solve the octave-error problem. The most promising approaches (from my very biased point of view) are A Single-Step Approach to Musical Tempo Estimation Using a Convolutional Neural Network by myself (you might also want to check out this later paper, which uses a simpler NN architecture) and Multi-Task Learning of Tempo and Beat: Learning One to Improve the Other by Böck et al.
Both approaches use convolutional neural networks (CNNs) to analyze spectrograms. While a CNN could also be implemented in librosa, it currently lacks the programmatic infrastructure to do this easily. Another audio analysis framework seems to be a step ahead in this regard: Essentia. It is capable of running TensorFlow models.