Get pixel value from CVPixelBufferRef in Swift
How can I get the RGB (or any other format) pixel value from a CVPixelBufferRef? I have tried many approaches, but none of them worked.
func captureOutput(captureOutput: AVCaptureOutput!,
                   didOutputSampleBuffer sampleBuffer: CMSampleBuffer!,
                   fromConnection connection: AVCaptureConnection!) {
    let pixelBuffer: CVPixelBufferRef = CMSampleBufferGetImageBuffer(sampleBuffer)!
    CVPixelBufferLockBaseAddress(pixelBuffer, 0)
    let baseAddress = CVPixelBufferGetBaseAddress(pixelBuffer)
    // Get individual pixel values here
    CVPixelBufferUnlockBaseAddress(pixelBuffer, 0)
}
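baseAddress is an unsafe mutable pointer, or more precisely an UnsafeMutablePointer<Void>. Once you have converted the pointer from Void to a more specific type, you can easily access the memory: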
// Convert the base address to a typed (still unsafe) pointer of the appropriate type
let byteBuffer = UnsafeMutablePointer<UInt8>(baseAddress)
// read the data (returns value of type UInt8)
let firstByte = byteBuffer[0]
// write data
byteBuffer[3] = 90
Make sure you use the correct type (8, 16 or 32 bit unsigned int). It depends on the video format. Most likely it is 8 bit.
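To make the "it depends on the video format" point concrete, here is a minimal sketch that asks the buffer for its pixel format and maps a few common formats to an element size. The helper name `bytesPerSample(of:)` is made up for illustration, and the list of formats is deliberately incomplete.

```swift
import CoreVideo

/// Hypothetical helper (not part of Core Video): size in bytes of one
/// sample in the first plane, for a handful of common pixel formats.
func bytesPerSample(of pixelBuffer: CVPixelBuffer) -> Int? {
    switch CVPixelBufferGetPixelFormatType(pixelBuffer) {
    case kCVPixelFormatType_32BGRA:
        return 4   // one UInt32 (or four UInt8 channels) per pixel
    case kCVPixelFormatType_420YpCbCr8BiPlanarFullRange,
         kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange:
        return 1   // one UInt8 luma sample per pixel in plane 0
    case kCVPixelFormatType_OneComponent32Float:
        return 4   // one Float32 per pixel
    default:
        return nil // format not covered by this sketch
    }
}
```

If the helper returns nil, print the format code and look it up before choosing a pointer type.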
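Update on buffer formats:
You can specify the format when you initialize the AVCaptureVideoDataOutput instance. You basically have the choice of:
- BGRA: a single plane where each pixel is one 32 bit integer holding the blue, green, red and alpha values (8 bits each)
- 420YpCbCr8BiPlanarFullRange: two planes, the first containing one byte per pixel with the Y (luma) value, the second containing the Cb and Cr (chroma) values for groups of pixels
- 420YpCbCr8BiPlanarVideoRange: the same as 420YpCbCr8BiPlanarFullRange, but the Y values are restricted to the range 16 to 235 (for historical reasons)
If you are interested in color values and speed (or rather maximum frame rate) is not an issue, go for the simpler BGRA format. Otherwise take one of the more efficient native video formats.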
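As a rough sketch of how the format is requested: the output's `videoSettings` dictionary takes the pixel format under `kCVPixelBufferPixelFormatTypeKey`. The function name `makeBGRAVideoOutput()` is only illustrative. Session setup and the sample buffer delegate are left out.

```swift
import AVFoundation

/// Illustrative factory: a video data output that delivers BGRA frames.
func makeBGRAVideoOutput() -> AVCaptureVideoDataOutput {
    let output = AVCaptureVideoDataOutput()
    // Swap in kCVPixelFormatType_420YpCbCr8BiPlanarFullRange (or ...VideoRange)
    // here to receive the two-plane native video format instead.
    output.videoSettings = [
        kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA
    ]
    return output
}
```

The output still has to be added to an AVCaptureSession and given a delegate before any frames arrive.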
If you have two planes, you must get the base address of the desired plane (see the video format example):
Video format example
let pixelBuffer: CVPixelBufferRef = CMSampleBufferGetImageBuffer(sampleBuffer)!
CVPixelBufferLockBaseAddress(pixelBuffer, 0)
let baseAddress = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0)
let bytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0)
let byteBuffer = UnsafeMutablePointer<UInt8>(baseAddress)
// Get luma value for pixel (43, 17)
let luma = byteBuffer[17 * bytesPerRow + 43]
CVPixelBufferUnlockBaseAddress(pixelBuffer, 0)
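The example above only reads plane 0 (luma). Here is a hedged sketch for reading the chroma values as well, written in current Swift syntax. It assumes one of the two bi-planar 4:2:0 formats listed earlier, where plane 1 holds interleaved Cb/Cr byte pairs, one pair per 2x2 block of pixels. The function name `yCbCr(atX:y:in:)` is made up, and the buffer must already be locked by the caller.

```swift
import CoreVideo

/// Reads luma and chroma for pixel (x, y) from a locked bi-planar 4:2:0 buffer.
/// Returns nil if the buffer does not have two planes or (x, y) is out of range.
func yCbCr(atX x: Int, y: Int, in pixelBuffer: CVPixelBuffer) -> (luma: UInt8, cb: UInt8, cr: UInt8)? {
    guard CVPixelBufferGetPlaneCount(pixelBuffer) == 2,
          x >= 0, x < CVPixelBufferGetWidth(pixelBuffer),
          y >= 0, y < CVPixelBufferGetHeight(pixelBuffer),
          let lumaBase = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0),
          let chromaBase = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 1)
    else { return nil }

    let lumaStride = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0)
    let chromaStride = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 1)
    let lumaBytes = lumaBase.assumingMemoryBound(to: UInt8.self)
    let chromaBytes = chromaBase.assumingMemoryBound(to: UInt8.self)

    // Chroma is subsampled by 2 in both directions; each sample is a (Cb, Cr) pair.
    let chromaIndex = (y / 2) * chromaStride + (x / 2) * 2
    return (lumaBytes[y * lumaStride + x],
            chromaBytes[chromaIndex],
            chromaBytes[chromaIndex + 1])
}
```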
BGRA example
let pixelBuffer: CVPixelBufferRef = CMSampleBufferGetImageBuffer(sampleBuffer)!
CVPixelBufferLockBaseAddress(pixelBuffer, 0)
let baseAddress = CVPixelBufferGetBaseAddress(pixelBuffer)
// bytesPerRow counts bytes; divide by 4 to get the row stride in UInt32 elements
let int32PerRow = CVPixelBufferGetBytesPerRow(pixelBuffer) / 4
let int32Buffer = UnsafeMutablePointer<UInt32>(baseAddress)
// Get BGRA value for pixel (43, 17)
let bgra = int32Buffer[17 * int32PerRow + 43]
CVPixelBufferUnlockBaseAddress(pixelBuffer, 0)
Update for Swift 3:
let pixelBuffer: CVPixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)!
CVPixelBufferLockBaseAddress(pixelBuffer, CVPixelBufferLockFlags(rawValue: 0))
let int32Buffer = unsafeBitCast(CVPixelBufferGetBaseAddress(pixelBuffer), to: UnsafeMutablePointer<UInt32>.self)
// bytesPerRow counts bytes; divide by 4 to get the row stride in UInt32 elements
let int32PerRow = CVPixelBufferGetBytesPerRow(pixelBuffer) / 4
// Get BGRA value for pixel (43, 17)
let bgra = int32Buffer[17 * int32PerRow + 43]
CVPixelBufferUnlockBaseAddress(pixelBuffer, CVPixelBufferLockFlags(rawValue: 0))
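The BGRA examples read a whole pixel as one UInt32, which still has to be split into channels. A small sketch of that step, assuming a little-endian CPU (true for current Apple hardware), where the blue byte comes first in memory and therefore ends up in the lowest 8 bits. The function name `channels(ofBGRA:)` is illustrative only.

```swift
/// Splits one packed BGRA pixel, as read through a UInt32 pointer on a
/// little-endian CPU, into its four 8-bit channels.
func channels(ofBGRA pixel: UInt32) -> (r: UInt8, g: UInt8, b: UInt8, a: UInt8) {
    let b = UInt8(pixel & 0xFF)          // first byte in memory
    let g = UInt8((pixel >> 8) & 0xFF)
    let r = UInt8((pixel >> 16) & 0xFF)
    let a = UInt8((pixel >> 24) & 0xFF)  // last byte in memory
    return (r, g, b, a)
}
```

If you only need one or two channels, reading through a UInt8 pointer avoids the shifting altogether.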
Here is a method for getting the individual RGB values from a BGRA pixel buffer. Note: your buffer must be locked before calling this.
func pixelFrom(x: Int, y: Int, movieFrame: CVPixelBuffer) -> (UInt8, UInt8, UInt8) {
    let baseAddress = CVPixelBufferGetBaseAddress(movieFrame)
    let bytesPerRow = CVPixelBufferGetBytesPerRow(movieFrame)
    let buffer = baseAddress!.assumingMemoryBound(to: UInt8.self)
    let index = x*4 + y*bytesPerRow
    let b = buffer[index]
    let g = buffer[index+1]
    let r = buffer[index+2]
    return (r, g, b)
}
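Since the method above relies on the caller to lock the buffer, one possible way to make that hard to forget is a small wrapper that locks, runs a closure, and unlocks again. This is only a sketch. The name `withLockedBaseAddress(of:_:)` is made up, and it uses the read-only lock flag on the assumption that the closure only reads pixels.

```swift
import CoreVideo

/// Hypothetical wrapper: locks the buffer for reading, runs `body`,
/// and always unlocks afterwards. Returns nil if locking fails.
func withLockedBaseAddress<T>(of pixelBuffer: CVPixelBuffer,
                              _ body: (CVPixelBuffer) -> T) -> T? {
    guard CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly) == kCVReturnSuccess else {
        return nil
    }
    // The unlock flags must match the flags used for locking.
    defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly) }
    return body(pixelBuffer)
}
```

With the method above it could be called like this: withLockedBaseAddress(of: frame) { pixelFrom(x: 43, y: 17, movieFrame: $0) }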
Swift 5
I had the same problem and ended up with the following solution. My CVPixelBuffer had dimensions 68 x 68, which can be inspected with
CVPixelBufferLockBaseAddress(pixelBuffer, CVPixelBufferLockFlags(rawValue: 0))
print(CVPixelBufferGetWidth(pixelBuffer))
print(CVPixelBufferGetHeight(pixelBuffer))
You also have to know the number of bytes per row:
print(CVPixelBufferGetBytesPerRow(pixelBuffer))
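which in my case was 320.
Furthermore, you need to know the data type of your pixel buffer, which was Float32 for me.
I then constructed a byte buffer and read the values consecutively as follows (remember to lock the base address as shown above):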
var byteBuffer = unsafeBitCast(CVPixelBufferGetBaseAddress(pixelBuffer), to: UnsafeMutablePointer<Float32>.self)
var pixelArray: Array<Array<Float>> = Array(repeating: Array(repeating: 0, count: 68), count: 68)
for row in 0...67 {
    for col in 0...67 {
        pixelArray[row][col] = byteBuffer.pointee
        byteBuffer = byteBuffer.successor()
    }
    byteBuffer = byteBuffer.advanced(by: 12)
}
CVPixelBufferUnlockBaseAddress(pixelBuffer, CVPixelBufferLockFlags(rawValue: 0))
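You might be wondering about the byteBuffer = byteBuffer.advanced(by: 12) part. The reason we have to do this is as follows.
We know that we have 320 bytes per row. However, our buffer has width 68 and the data type is Float32, i.e. 4 bytes per value. That means we effectively only have 68 * 4 = 272 bytes of data per row, followed by zero padding. This padding is probably there for memory layout (alignment) reasons.
We therefore have to skip the last 320 - 272 = 48 bytes in each row, which is done by byteBuffer = byteBuffer.advanced(by: 12) (12 * 4 = 48). The pointer is typed as Float32, so advancing by 12 elements moves it by 48 bytes.
This approach is somewhat different from the other solutions, as we keep moving a pointer to the next value instead of computing an index. However, I find this easier and more intuitive.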
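The 68, 320 and 12 above are specific to this one buffer. As a sketch of the same idea without hard-coded numbers, the function below jumps to the start of each row by byte offset, so the padding never has to be skipped by hand. The name `readFloat32Rows` and its parameters are made up for illustration. It assumes one Float32 per pixel and that every row starts at a 4-byte aligned address.

```swift
/// Copies a row-padded Float32 image into a 2-D array.
/// `bytesPerRow` may be larger than `width * 4` when rows are padded.
func readFloat32Rows(base: UnsafeRawPointer,
                     width: Int,
                     height: Int,
                     bytesPerRow: Int) -> [[Float32]] {
    precondition(bytesPerRow >= width * MemoryLayout<Float32>.stride,
                 "bytesPerRow is too small for the given width")
    return (0..<height).map { row in
        // Start of this row, measured in bytes from the base address.
        let rowStart = base.advanced(by: row * bytesPerRow)
            .assumingMemoryBound(to: Float32.self)
        // Copy only the `width` real values; the padding is ignored.
        return Array(UnsafeBufferPointer(start: rowStart, count: width))
    }
}
```

For a locked pixel buffer, the arguments would come from CVPixelBufferGetBaseAddress, CVPixelBufferGetWidth, CVPixelBufferGetHeight and CVPixelBufferGetBytesPerRow.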