Fixing "shaky" pitch detection in Kotlin using TarsosDSP

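I'm writing an instrument tuner app (starting with guitar for now). For pitch detection I'm using TarsosDSP. It does detect the pitch correctly, but it is very shaky: for example, I'll pluck the (correctly tuned) D string on my guitar and it correctly identifies it as D, but a moment later it cycles very quickly through a bunch of random notes. I'm not sure how best to solve this. Here is the code responsible for detecting the pitch: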

// 44100 Hz sample rate, 4096-sample buffer, 3072-sample overlap
val dispatcher: AudioDispatcher = AudioDispatcherFactory.fromDefaultMicrophone(44100, 4096, 3072)
val pdh = PitchDetectionHandler { res, _ ->
    val pitchInHz: Float = res.pitch
    runOnUiThread { processing.closestNote(pitchInHz) }
}
val pitchProcessor: AudioProcessor =
    PitchProcessor(PitchProcessor.PitchEstimationAlgorithm.FFT_YIN, 44100F, 4096, pdh)
dispatcher.addAudioProcessor(pitchProcessor)

val audioThread = Thread(dispatcher, "Audio Thread")
audioThread.start()

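I then wrote a function that is supposed to detect the note closest to the current pitch. In addition, I tried to get "less shaky" results by writing a function that finds the closest pitch in Hz and then passing that result to the closestNote function, thinking I might get fewer differing results that way (even though the outcome should be the same, and I did not notice any difference). Here are the two functions: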

...
private val allNotes = arrayOf("A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#")
private val concertPitch = 440
...
/** Detects the closest note for A = 440 Hz using the equal temperament formula:
 * pitch(i) = pitch(0) * 2^(i/12)
 * therefore the interval between two pitches is:
 * i = 12 * log2(pitch(i) / pitch(0))
 */
fun closestNote(pitchInHz: Float) {
    (myCallback as MainActivity).noteSize() // adjusts the font size of the note
    if (pitchInHz != -1F) { // -1 is passed in when no pitch was detected
        val roundHz = closestPitch(pitchInHz)
        val i = (round(log2(roundHz / concertPitch) * 12)).toInt()
        // double modulo keeps the index in 0..11 for notes below A
        val closestNote = allNotes[(i % 12 + 12) % 12]
        myCallback?.updateNote(closestNote) // updates the note text
    }
}

private fun closestPitch(pitchInHz: Float): Float {
    val i = (round(log2(pitchInHz / concertPitch) * 12)).toInt()
    val closestPitch = concertPitch * 2.toDouble().pow(i.toDouble() / 12)
    return closestPitch.toFloat()
}
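
To make the formula concrete, here is a minimal standalone sketch of the same arithmetic applied to an open D string (about 146.83 Hz). The names `semitonesFromA4`, `noteName` and `snapToNearestNote` are hypothetical stand-ins for the logic in `closestNote` and `closestPitch`, not part of the app. It also illustrates why snapping to the nearest pitch first should not change the detected note.

```kotlin
import kotlin.math.log2
import kotlin.math.pow
import kotlin.math.round

val noteNames = arrayOf("A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#")
const val CONCERT_PITCH = 440f

/** Interval from A4 in semitones, rounded to the nearest whole semitone. */
fun semitonesFromA4(pitchInHz: Float): Int =
    round(12 * log2(pitchInHz / CONCERT_PITCH)).toInt()

/** Note name for a frequency; the double modulo maps negative intervals into 0..11. */
fun noteName(pitchInHz: Float): String =
    noteNames[(semitonesFromA4(pitchInHz) % 12 + 12) % 12]

/** Frequency of the equal-tempered note nearest to the input (mirrors closestPitch). */
fun snapToNearestNote(pitchInHz: Float): Float =
    CONCERT_PITCH * 2f.pow(semitonesFromA4(pitchInHz) / 12f)

fun main() {
    val dString = 146.83f // open D string (D3)
    println(semitonesFromA4(dString))              // -19
    println(noteName(dString))                     // D
    println(noteName(snapToNearestNote(dString)))  // D
}
```

Because the snapped frequency already sits exactly on a semitone, rounding it a second time yields the same index, which would explain why the extra step made no visible difference.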

Any ideas on how to get more consistent results? Thanks!

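I solved it myself: TarsosDSP calculates the probability of each note being played. I changed my closestNote function to update the text only when the probability is > 0.91. I found that this value gives "stability": the text does not change after the string has been plucked, and the note is still recognized correctly without having to pluck the string multiple times or too hard. I also tested this with an unplugged, non-hollow-body electric guitar.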
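
A minimal sketch of that gating idea, written as plain Kotlin so it runs without TarsosDSP. The `NoteGate` class and its `offer` method are hypothetical names, not the actual code from the app. In the real handler, the probability would presumably come from `res.probability` (TarsosDSP's `PitchDetectionResult.getProbability()`), and 0.91 is simply the threshold that worked for my setup.

```kotlin
/** Hypothetical helper: remembers the displayed note and replaces it only on confident frames. */
class NoteGate(private val minProbability: Float = 0.91f) {
    var displayedNote: String? = null
        private set

    /** Returns true when the displayed note was replaced by [note]. */
    fun offer(note: String, pitchInHz: Float, probability: Float): Boolean {
        // Ignore frames with no detected pitch (-1) or with low confidence.
        if (pitchInHz == -1f || probability <= minProbability) return false
        // Nothing to redraw if the same note is already shown.
        if (note == displayedNote) return false
        displayedNote = note
        return true
    }
}

fun main() {
    val gate = NoteGate()
    println(gate.offer("D", 146.8f, 0.95f))   // true: confident frame right after the pluck
    println(gate.offer("G#", 207.0f, 0.40f))  // false: noisy frame while the string decays
    println(gate.displayedNote)               // D
}
```

The threshold probably needs tuning per instrument and microphone: too high and quiet notes are never shown, too low and the flicker comes back.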