Interpreting AudioBuffer.mData to display audio visualization
I am trying to process audio data in real time so that I can display an on-screen spectrum analyzer/visualization based on sound input from the microphone. I am using AVFoundation's AVCaptureAudioDataOutputSampleBufferDelegate to capture the audio data, which triggers the delegate function captureOutput. The function is below:
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    autoreleasepool {
        guard CMSampleBufferDataIsReady(sampleBuffer) else { return }
        //Check this is AUDIO (and not VIDEO) being received
        if connection.audioChannels.count > 0 {
            //Determine number of frames in buffer
            let numFrames = CMSampleBufferGetNumSamples(sampleBuffer)
            //Get AudioBufferList
            var audioBufferList = AudioBufferList(mNumberBuffers: 1, mBuffers: AudioBuffer(mNumberChannels: 0, mDataByteSize: 0, mData: nil))
            var blockBuffer: CMBlockBuffer?
            CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(sampleBuffer, nil, &audioBufferList, MemoryLayout<AudioBufferList>.size, nil, nil, UInt32(kCMSampleBufferFlag_AudioBufferList_Assure16ByteAlignment), &blockBuffer)
            let audioBuffers = UnsafeBufferPointer<AudioBuffer>(start: &audioBufferList.mBuffers, count: Int(audioBufferList.mNumberBuffers))
            for audioBuffer in audioBuffers {
                let data = Data(bytes: audioBuffer.mData!, count: Int(audioBuffer.mDataByteSize))
                let i16array = data.withUnsafeBytes {
                    UnsafeBufferPointer<Int16>(start: $0, count: data.count / 2).map(Int16.init(bigEndian:))
                }
                for dataItem in i16array {
                    print(dataItem)
                }
            }
        }
    }
}
The code above prints positive and negative numbers of type Int16 as expected, but I need help converting these raw numbers into meaningful data, such as power and decibels, for my visualizer.
I was on the right track... Thanks to RobertHarvey's comment on my question - a spectrum analyzer needs the FFT calculation functions of the Accelerate Framework. But even before I can use these functions, the raw data has to be converted into an Array of type Float, since many of the functions require a Float array.
Firstly, we load the raw data into a Data object:
//Read data from AudioBuffer into a variable
let data = Data(bytes: audioBuffer.mData!, count: Int(audioBuffer.mDataByteSize))
I like to think of a Data object as a "list" of 1-byte-sized chunks of information (8 bits each), but when I check the number of frames in my sample against the total size of my Data object in bytes, they don't match:
//Get number of frames in sample and total size of Data
var numFrames = CMSampleBufferGetNumSamples(sampleBuffer) //= 1024 frames in my case
var dataSize = audioBuffer.mDataByteSize //= 2048 bytes in my case
The total size of my data in bytes is double the number of frames in my CMSampleBuffer. This means that each frame of audio is 2 bytes long. In order to read the data meaningfully, I need to convert my Data object (a "list" of 1-byte chunks) into an array of 2-byte chunks. An Int16 contains 16 bits (or 2 bytes - exactly what we need), so let's create an Array of Int16:
//Convert to Int16 array (copy inside the closure so the buffer pointer doesn't escape)
let samples = data.withUnsafeBytes { rawBuffer in
    Array(rawBuffer.bindMemory(to: Int16.self))
}
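As an aside, whether these Int16 values need a byte swap depends on the capture format. A sketch (assuming `sampleBuffer` is still in scope) that inspects the stream's AudioStreamBasicDescription before interpreting the bytes:

```swift
import AVFoundation

//Check the AudioStreamBasicDescription so we know the bytes really
//are 16-bit signed integers, and whether they are big-endian.
if let formatDescription = CMSampleBufferGetFormatDescription(sampleBuffer),
   let asbd = CMAudioFormatDescriptionGetStreamBasicDescription(formatDescription)?.pointee {
    let isBigEndian = (asbd.mFormatFlags & kAudioFormatFlagIsBigEndian) != 0
    let isSignedInteger = (asbd.mFormatFlags & kAudioFormatFlagIsSignedInteger) != 0
    print("bits per channel: \(asbd.mBitsPerChannel), big-endian: \(isBigEndian), signed: \(isSignedInteger)")
}
```

If the big-endian flag is not set, the samples are already in native byte order and no `Int16(bigEndian:)` conversion is needed.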
Now that we have an Array of Int16, we can convert it to an Array of Float:
//Convert to Float Array
let factor = Float(Int16.max)
var floats: [Float] = Array(repeating: 0.0, count: samples.count)
for i in 0..<samples.count {
floats[i] = Float(samples[i]) / factor
}
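Since the Accelerate Framework is pulled in later anyway, the same Int16-to-Float scaling can also be done without a hand-written loop. This is a sketch using vDSP_vflt16 and vDSP_vsdiv - the function name is illustrative, and the result is the same as the loop above, just vectorized:

```swift
import Accelerate

//Vectorized version of the conversion loop: widen Int16 -> Float,
//then divide every element by Int16.max to normalize to -1.0...1.0.
func normalizedFloats(from samples: [Int16]) -> [Float] {
    var widened = [Float](repeating: 0.0, count: samples.count)
    vDSP_vflt16(samples, 1, &widened, 1, vDSP_Length(samples.count))
    var divisor = Float(Int16.max)
    var floats = [Float](repeating: 0.0, count: samples.count)
    vDSP_vsdiv(widened, 1, &divisor, &floats, 1, vDSP_Length(samples.count))
    return floats
}
```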
Now that we have our Float array, we can use the Accelerate Framework's complex math to convert the raw Float values into meaningful values like magnitude, decibels etc.
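If all the visualizer needs is an overall level rather than a full spectrum, a single power/decibel figure can be computed straight from the normalized floats. A minimal sketch (the -160 dB floor is an arbitrary choice to avoid log of zero):

```swift
import Accelerate

//Overall signal level from the normalized Float samples:
//vDSP_rmsqv computes the root-mean-square of the vector,
//then 20 * log10 converts it to decibels (0 dB = full scale).
func levelInDecibels(of floats: [Float]) -> Float {
    var rms: Float = 0.0
    vDSP_rmsqv(floats, 1, &rms, vDSP_Length(floats.count))
    return rms > 0 ? 20 * log10(rms) : -160.0  //floor instead of -infinity
}
```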
I found Apple's documentation rather overwhelming. Luckily, I found a really good example online that I could re-purpose for my needs, called TempiFFT. Implementation as follows:
//Initiate FFT
let fft = TempiFFT(withSize: numFrames, sampleRate: 44100.0)
fft.windowType = TempiFFTWindowType.hanning
//Pass array of Floats
fft.fftForward(floats)
//I only want to display 20 bands on my analyzer
fft.calculateLinearBands(minFrequency: 0, maxFrequency: fft.nyquistFrequency, numberOfBands: 20)
//Then use a loop to iterate through the bands in your spectrum analyzer
var magnitudeArr = [Float](repeating: Float(0), count: 20)
var magnitudeDBArr = [Float](repeating: Float(0), count: 20)
for i in 0..<20 {
    magnitudeArr[i] = fft.magnitudeAtBand(i)
    magnitudeDBArr[i] = TempiFFT.toDB(fft.magnitudeAtBand(i))
    //..I didn't, but you could perform drawing functions here...
}
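The dB values are awkward to draw directly (they're negative and unbounded below), so one option inside that loop is to clamp and normalize them to a 0...1 bar height. A sketch - the minDB floor of -64 is an arbitrary choice for the analyzer, not part of TempiFFT:

```swift
//Map a band's dB value to a 0.0...1.0 bar height for drawing.
//minDB is the silence floor shown on the analyzer.
func barHeight(forDB magnitudeDB: Float, minDB: Float = -64.0) -> Float {
    let clamped = max(minDB, min(0.0, magnitudeDB))
    return (clamped - minDB) / -minDB  //0.0 at the floor, 1.0 at 0 dB
}
```

Each bar's view or layer can then be scaled by this value each time a new sample buffer arrives.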
Other useful references: