Is it feasible to use AVAudioEngine to detect pitch in real time?

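I'm trying to write a music app in which pitch detection is the core of it all. I have seen solutions to this problem, as well as apps on the App Store. However, most of them are quite dated, and I'd like to do this in Swift. I've been looking at AVAudioEngine as a way to do this, but I find the documentation lacking, or maybe I haven't been looking hard enough.

What I have found is that I can tap the inputNode bus like this: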

self.audioEngine = AVAudioEngine()
self.audioInputNode = self.audioEngine.inputNode!
self.audioInputNode.installTapOnBus(0, bufferSize:256, format: audioInputNode.outputFormatForBus(0), block: {(buffer, time) in
      self.analyzeBuffer(buffer)
})

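The bus is tapped 2-3 times per second, and the buffer contains more than 16000 floats for each tap. Are these amplitude samples from the microphone?

The docs at least claim it's the output from the node: "The buffer parameter is a buffer of audio captured from the output of an AVAudioNode."

Is it possible to use AVAudioEngine to detect pitch in real time, or should I go about this another way?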

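There are a few different concepts at play here. AVAudioEngine is just the engine that gets you the raw PCM data; you could equally use Novocaine, Core Audio directly, or other options.

The PCM data is the floating-point samples from the microphone.

As far as pitch tracking goes, there are various techniques. One thing to note is that frequency detection is not the same as pitch detection.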

FFT, which is good but will not be able to detect the pitch of signals with missing fundamentals. You would need to run the signal through a low-pass filter to reduce possible aliasing of frequencies higher than the Nyquist frequency, and then window it before passing it to the FFT to reduce spectral leakage. The FFT outputs the spectral content in a series of bins; the bin with the highest value is taken to be the strongest frequency in the signal.
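As a minimal sketch of the window-then-transform step described above, the following applies a Hann window and picks the strongest bin. It makes several assumptions: the function names are hypothetical, it uses Double rather than Float, it uses a naive O(N^2) DFT in place of a real FFT so that it needs nothing beyond Foundation, and it leaves out the low-pass filter. The bin index is converted to a frequency with bin * sampleRate / N.

```swift
import Foundation

// Hann window: tapers both ends of the buffer towards zero to reduce spectral leakage.
func hannWindow(_ count: Int) -> [Double] {
    guard count > 1 else { return [Double](repeating: 1, count: max(count, 0)) }
    return (0 ..< count).map { n in
        0.5 - 0.5 * cos(2 * Double.pi * Double(n) / Double(count - 1))
    }
}

// Naive DFT magnitudes for bins 0 ..< N/2. Illustration only: a real app would use an FFT.
func magnitudeSpectrum(_ samples: [Double]) -> [Double] {
    let n = samples.count
    return (0 ..< n / 2).map { (k: Int) -> Double in
        var re = 0.0
        var im = 0.0
        for (i, x) in samples.enumerated() {
            // Reduce k * i modulo n so the phase stays small and accurate.
            let phase = -2 * Double.pi * Double((k * i) % n) / Double(n)
            re += x * cos(phase)
            im += x * sin(phase)
        }
        return (re * re + im * im).squareRoot()
    }
}

// Frequency (in Hz) of the bin with the largest magnitude, or nil for silent or tiny input.
func strongestFrequency(_ samples: [Double], sampleRate: Double) -> Double? {
    guard samples.count > 1 else { return nil }
    let windowed = zip(samples, hannWindow(samples.count)).map { $0 * $1 }
    let magnitudes = magnitudeSpectrum(windowed)
    guard let peak = magnitudes.max(), peak > 0,
          let bin = magnitudes.firstIndex(of: peak) else { return nil }
    return Double(bin) * sampleRate / Double(samples.count)
}

// Hypothetical test signal: a 440 Hz sine sampled at 8000 Hz.
let fftSampleRate = 8000.0
let testTone = (0 ..< 1024).map { sin(2 * Double.pi * 440 * Double($0) / fftSampleRate) }
let strongest = strongestFrequency(testTone, sampleRate: fftSampleRate)
```

With 1024 samples at 8000 Hz each bin is about 7.8 Hz wide, so the result can only be as precise as that bin width. This is one reason a plain peak-bin search is a frequency estimate rather than a pitch estimate.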

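Autocorrelation, which can give better results. It is basically the signal correlated with itself.

In the end it comes down to what you want to detect, and there are a few caveats to take into account. Things like male voices and certain instruments can give incorrect results when a normal FFT is run on buffers that have not been preprocessed.

Check out this PITCH DETECTION METHODS REVIEW.

As far as Swift goes, it's not well suited to real-time, performance-focused systems. You can look at the old benchmarks of Swift vs C++: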
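To illustrate "the signal correlated with itself", here is a rough sketch that computes the autocorrelation at each candidate lag and converts the best lag to a frequency with sampleRate / lag. The function names and the 50-1000 Hz search range are assumptions made for the example, and taking the global maximum over the lag range is a simplification; more robust methods normalise the result and choose among several peaks.

```swift
import Foundation

// Unnormalised autocorrelation at one lag: the signal multiplied by a delayed copy of itself.
func autocorrelation(_ x: [Double], lag: Int) -> Double {
    guard lag >= 0, lag < x.count else { return 0 }
    var sum = 0.0
    for n in 0 ..< x.count - lag {
        sum += x[n] * x[n + lag]
    }
    return sum
}

// Estimates the fundamental as sampleRate / (lag with the largest autocorrelation).
// minFrequency and maxFrequency are assumed bounds that limit the lags searched.
func autocorrelationPitch(_ x: [Double], sampleRate: Double,
                          minFrequency: Double = 50, maxFrequency: Double = 1000) -> Double? {
    let minLag = max(1, Int(sampleRate / maxFrequency))
    let maxLag = min(x.count - 1, Int(sampleRate / minFrequency))
    guard minLag <= maxLag else { return nil }
    var bestLag = minLag
    var bestValue = -Double.infinity
    for lag in minLag ... maxLag {
        let value = autocorrelation(x, lag: lag)
        if value > bestValue {
            bestValue = value
            bestLag = lag
        }
    }
    // A non-positive best value means no lag in range resembles the original signal.
    guard bestValue > 0 else { return nil }
    return sampleRate / Double(bestLag)
}

// Hypothetical test signal: a 220 Hz sine sampled at 8000 Hz.
let acfSampleRate = 8000.0
let lowTone = (0 ..< 1024).map { sin(2 * Double.pi * 220 * Double($0) / acfSampleRate) }
let acfEstimate = autocorrelationPitch(lowTone, sampleRate: acfSampleRate)
```

Because lags are whole numbers of samples, the estimate is quantised: neighbouring lags around 36 samples differ by roughly 6 Hz at this sample rate. Interpolating between lags is a common way to refine it.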

the C++ FFT implementation is over 24x faster

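I realize that Hellium3 is really giving me information about what pitch is, and whether it is a good idea to do these things in Swift.

My question was originally about whether tapping the PCM bus is the way to obtain input signals from the microphone.

Since asking this question, I have done exactly that: I use the data obtained by tapping the PCM bus and analyse the buffer windows.

It works really well, and it was my lack of understanding of what a PCM bus, a buffer, and a sampling frequency are that made me ask the question in the first place.

Knowing those three things makes it easier to see that this is the right approach.

Edit: On request, I'll paste my (deprecated) implementation of the PitchDetector.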
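As a rough illustration of how buffer size and sampling frequency relate, the arithmetic below shows why a buffer of a little over 16000 floats arrives 2-3 times per second. The 44100 Hz sample rate and the 16384-frame buffer are assumed values chosen for the example (the question only says "more than 16000"), and the function names are hypothetical.

```swift
// Duration of one buffer in seconds: frames in the buffer divided by frames per second.
func bufferDuration(frameCount: Int, sampleRate: Double) -> Double {
    return Double(frameCount) / sampleRate
}

// Number of buffers of that size delivered per second.
func buffersPerSecond(frameCount: Int, sampleRate: Double) -> Double {
    return sampleRate / Double(frameCount)
}

// Assumed values, not taken from the question: 44100 Hz and 16384 frames per buffer.
let assumedSampleRate = 44_100.0
let assumedFrameCount = 16_384
let secondsPerTap = bufferDuration(frameCount: assumedFrameCount, sampleRate: assumedSampleRate)
let tapsPerSecond = buffersPerSecond(frameCount: assumedFrameCount, sampleRate: assumedSampleRate)
```

Under these assumptions each tap covers a bit more than a third of a second of audio, which matches the observed 2-3 taps per second.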

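import Accelerate  // provides vDSP_conv, used in acf(_:) below

// Note: Pitch and PitchManager are types from the rest of the app and are not shown here.
class PitchDetector {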
  var samplingFrequency: Float
  var harmonicConstant: Float

  init(harmonicConstant: Float, samplingFrequency: Float) {
    self.harmonicConstant = harmonicConstant
    self.samplingFrequency = samplingFrequency
  }

  //------------------------------------------------------------------------------
  // MARK: Signal processing
  //------------------------------------------------------------------------------

  func detectPitch(_ samples: [Float]) -> Pitch? {
    let snac = self.snac(samples)
    let (lags, peaks) = self.findKeyMaxima(snac)
    let (τBest, clarity) = self.findBestPeak(lags, peaks: peaks)
    if τBest > 0 {
      let frequency = self.samplingFrequency / τBest
      if PitchManager.sharedManager.inManageableRange(frequency) {
        return Pitch(measuredFrequency: frequency, clarity: clarity)
      }
    }

    return nil
  }

  // Returns a Special Normalisation of the AutoCorrelation function (SNAC) array for various lags, with values between -1 and 1
  private func snac(_ samples: [Float]) -> [Float] {
    let τMax = Int(self.samplingFrequency / PitchManager.sharedManager.noteFrequencies.first!) + 1
    var snac = [Float](repeating: 0.0, count: samples.count)
    let acf = self.acf(samples)
    let norm = self.m(samples)
    for τ in 1 ..< τMax {
      snac[τ] = 2 * acf[τ + acf.count / 2] / norm[τ]
    }

    return snac
  }

  // Auto correlation function
  private func acf(_ x: [Float]) -> [Float] {
    let resultSize = 2 * x.count - 1
    var result = [Float](repeating: 0, count: resultSize)
    let xPad = repeatElement(Float(0.0), count: x.count - 1)
    let xPadded = xPad + x + xPad
    vDSP_conv(xPadded, 1, x, 1, &result, 1, vDSP_Length(resultSize), vDSP_Length(x.count))

    return result
  }

  private func m(_ samples: [Float]) -> [Float] {
    var sum: Float = 0.0
    for i in 0 ..< samples.count {
      sum += 2.0 * samples[i] * samples[i]
    }
    var m = [Float](repeating: 0.0, count: samples.count)
    m[0] = sum
    for i in 1 ..< samples.count {
      m[i] = m[i - 1] - samples[i - 1] * samples[i - 1] - samples[samples.count - i - 1] * samples[samples.count - i - 1]
    }
    return m
  }

  /**
   * Finds the indices of all key maximum points in data
   */
  private func findKeyMaxima(_ data: [Float]) -> (lags: [Float], peaks: [Float]) {
    var keyMaximaLags: [Float] = []
    var keyMaximaPeaks: [Float] = []
    var newPeakIncoming = false
    var currentBestPeak: Float = 0.0
    var currentBestτ = -1
    for τ in 0 ..< max(0, data.count - 1) { // stop one short: the loop body reads data[τ + 1]
      newPeakIncoming = newPeakIncoming || ((data[τ] < 0) && (data[τ + 1] > 0))
      if newPeakIncoming {
        if data[τ] > currentBestPeak {
          currentBestPeak = data[τ]
          currentBestτ = τ
        }
        let zeroCrossing = (data[τ] > 0) && (data[τ + 1] < 0)
        if zeroCrossing {
          let (τEst, peakEst) = self.approximateTruePeak(currentBestτ, data: data)
          keyMaximaLags.append(τEst)
          keyMaximaPeaks.append(peakEst)
          newPeakIncoming = false
          currentBestPeak = 0.0
          currentBestτ = -1
        }
      }
    }

    if keyMaximaLags.count <= 1 {
      let unwantedPeakOfLowPitchTone = (keyMaximaLags.count == 1 && data[Int(keyMaximaLags[0])] < data.max()!)
      if unwantedPeakOfLowPitchTone {
        keyMaximaLags.removeAll()
        keyMaximaPeaks.removeAll()
      }
      let (τEst, peakEst) = self.approximateTruePeak(data.firstIndex(of: data.max()!)!, data: data)
      keyMaximaLags.append(τEst)
      keyMaximaPeaks.append(peakEst)
    }

    return (lags: keyMaximaLags, peaks: keyMaximaPeaks)
  }

  /**
   * Approximates the true peak according to https://www.dsprelated.com/freebooks/sasp/Quadratic_Interpolation_Spectral_Peaks.html
   */
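  private func approximateTruePeak(_ τ: Int, data: [Float]) -> (τEst: Float, peakEst: Float) {
    // Interpolation needs a neighbour on each side; at the array edges fall back to the raw sample.
    guard τ > 0, τ + 1 < data.count else { return (Float(τ), data[τ]) }
    let α = data[τ - 1]
    let β = data[τ]
    let γ = data[τ + 1]
    // A zero denominator means the three points lie on a line, so there is no vertex to find.
    let denominator = α - 2.0 * β + γ
    guard denominator != 0 else { return (Float(τ), β) }
    let p = 0.5 * ((α - γ) / denominator)
    let peakEst = min(1.0, β - 0.25 * (α - γ) * p)
    let τEst = Float(τ) + p

    return (τEst, peakEst)
  }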

  private func findBestPeak(_ lags: [Float], peaks: [Float]) -> (τBest: Float, clarity: Float) {
    let threshold: Float = self.harmonicConstant * peaks.max()!
    for i in 0 ..< peaks.count {
      if peaks[i] > threshold {
        return (τBest: lags[i], clarity: peaks[i])
      }
    }

    return (τBest: lags[0], clarity: peaks[0])
  }
}

All credit goes to Philip McLeod, whose research is used for my implementation above. http://www.cs.otago.ac.nz/research/publications/oucs-2008-03.pdf
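For readers who want to check the quadratic interpolation used in approximateTruePeak, here is a small standalone sketch of the same formula. The names are hypothetical, it uses Double, and it omits the clamp to 1.0 that the class applies. Given three neighbouring samples, it fits a parabola through them and returns the vertex position as an offset from the centre sample, together with the vertex height.

```swift
// Fits a parabola through three equally spaced samples and returns its vertex.
// offset is measured from the centre sample, in samples; value is the interpolated peak height.
func parabolicPeak(left: Double, center: Double, right: Double) -> (offset: Double, value: Double) {
    let denominator = left - 2 * center + right
    // Collinear points have no vertex, so keep the centre sample as it is.
    guard denominator != 0 else { return (0, center) }
    let offset = 0.5 * (left - right) / denominator
    let value = center - 0.25 * (left - right) * offset
    return (offset, value)
}

// Hypothetical check: sample a parabola with a known vertex at x = 2.3, height 1.
func sampleParabola(_ x: Double) -> Double {
    return 1 - (x - 2.3) * (x - 2.3)
}

let fitted = parabolicPeak(left: sampleParabola(1),
                           center: sampleParabola(2),
                           right: sampleParabola(3))
```

Sampling at x = 1, 2, 3 puts the centre sample at 2, so the offset should recover the remaining 0.3 and the height should come back as 1, up to floating-point rounding. For real autocorrelation data the fit is only an approximation, since the peak is not an exact parabola.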