Slow performance of CGImage averaging
I am trying to create one image from the average of several images. The way I am doing this is by looping over the pixel values of 2 photos, adding them together, and dividing by two. Simple math. However, while this works, it is very slow: averaging 2x 10MP photos takes about 23 seconds on a maxed-out MacBook Pro 15" 2016, compared to far less time for a similar algorithm using Apple's CIFilter API. The code I am currently using is this, based on another Stack Overflow question:
static func averageImages(primary: CGImage, secondary: CGImage) -> CGImage? {
    guard (primary.width == secondary.width && primary.height == secondary.height) else {
        return nil
    }
    let colorSpace = CGColorSpaceCreateDeviceRGB()
    let width = primary.width
    let height = primary.height
    let bytesPerPixel = 4
    let bitsPerComponent = 8
    let bytesPerRow = bytesPerPixel * width
    let bitmapInfo = RGBA32.bitmapInfo
    guard let context = CGContext(data: nil, width: width, height: height, bitsPerComponent: bitsPerComponent, bytesPerRow: bytesPerRow, space: colorSpace, bitmapInfo: bitmapInfo) else {
        print("unable to create context")
        return nil
    }
    guard let context2 = CGContext(data: nil, width: width, height: height, bitsPerComponent: bitsPerComponent, bytesPerRow: bytesPerRow, space: colorSpace, bitmapInfo: bitmapInfo) else {
        print("unable to create context 2")
        return nil
    }
    context.draw(primary, in: CGRect(x: 0, y: 0, width: width, height: height))
    context2.draw(secondary, in: CGRect(x: 0, y: 0, width: width, height: height))
    guard let buffer = context.data else {
        print("Unable to get context data")
        return nil
    }
    guard let buffer2 = context2.data else {
        print("Unable to get context 2 data")
        return nil
    }
    let pixelBuffer = buffer.bindMemory(to: RGBA32.self, capacity: width * height)
    let pixelBuffer2 = buffer2.bindMemory(to: RGBA32.self, capacity: width * height)
    for row in 0 ..< Int(height) {
        if row % 10 == 0 {
            print("Row: \(row)")
        }
        for column in 0 ..< Int(width) {
            let offset = row * width + column
            let picture1 = pixelBuffer[offset]
            let picture2 = pixelBuffer2[offset]
            let minR = min(255, (UInt32(picture1.redComponent) + UInt32(picture2.redComponent)) / 2)
            let minG = min(255, (UInt32(picture1.greenComponent) + UInt32(picture2.greenComponent)) / 2)
            let minB = min(255, (UInt32(picture1.blueComponent) + UInt32(picture2.blueComponent)) / 2)
            let minA = min(255, (UInt32(picture1.alphaComponent) + UInt32(picture2.alphaComponent)) / 2)
            pixelBuffer[offset] = RGBA32(red: UInt8(minR), green: UInt8(minG), blue: UInt8(minB), alpha: UInt8(minA))
        }
    }
    let outputImage = context.makeImage()
    return outputImage
}

struct RGBA32: Equatable {
    //private var color: UInt32
    var color: UInt32

    var redComponent: UInt8 {
        return UInt8((color >> 24) & 255)
    }
    var greenComponent: UInt8 {
        return UInt8((color >> 16) & 255)
    }
    var blueComponent: UInt8 {
        return UInt8((color >> 8) & 255)
    }
    var alphaComponent: UInt8 {
        return UInt8((color >> 0) & 255)
    }

    init(red: UInt8, green: UInt8, blue: UInt8, alpha: UInt8) {
        let red = UInt32(red)
        let green = UInt32(green)
        let blue = UInt32(blue)
        let alpha = UInt32(alpha)
        color = (red << 24) | (green << 16) | (blue << 8) | (alpha << 0)
    }

    init(color: UInt32) {
        self.color = color
    }

    static let red     = RGBA32(red: 255, green: 0,   blue: 0,   alpha: 255)
    static let green   = RGBA32(red: 0,   green: 255, blue: 0,   alpha: 255)
    static let blue    = RGBA32(red: 0,   green: 0,   blue: 255, alpha: 255)
    static let white   = RGBA32(red: 255, green: 255, blue: 255, alpha: 255)
    static let black   = RGBA32(red: 0,   green: 0,   blue: 0,   alpha: 255)
    static let magenta = RGBA32(red: 255, green: 0,   blue: 255, alpha: 255)
    static let yellow  = RGBA32(red: 255, green: 255, blue: 0,   alpha: 255)
    static let cyan    = RGBA32(red: 0,   green: 255, blue: 255, alpha: 255)

    static let bitmapInfo = CGImageAlphaInfo.premultipliedLast.rawValue | CGBitmapInfo.byteOrder32Little.rawValue

    static func ==(lhs: RGBA32, rhs: RGBA32) -> Bool {
        return lhs.color == rhs.color
    }
}
I am not very experienced with handling raw pixel values, so there is probably plenty of room for optimization. The RGBA32 declaration may not even be necessary, but I am not sure how I would simplify the code otherwise. I tried simply replacing the struct with a UInt32, but when I divide by two the separation between the four channels gets scrambled and I end up with incorrect results (on the plus side, that cut the computation time to about 6 seconds).
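As an aside, the channel scrambling that rules out the plain-UInt32 division can be avoided with a bit-twiddling trick (a sketch of my own, not from the code above; `averagePacked` is a hypothetical helper name): mask off the low bit of each byte before halving, so no bit bleeds between channels.

```swift
// Averages all four 8-bit channels of two packed RGBA pixels at once,
// without unpacking them. `x & y` keeps the bits the two pixels share;
// `(x ^ y) >> 1` halves the differing bits, and the 0x7F7F7F7F mask
// stops a bit from one channel shifting down into its neighbor.
func averagePacked(_ x: UInt32, _ y: UInt32) -> UInt32 {
    return (x & y) &+ (((x ^ y) >> 1) & 0x7F7F7F7F)
}
```

This computes floor((a + b) / 2) for each byte independently; for example, averaging channel values 0xFF and 0x01 correctly yields 0x80.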
I have tried removing the alpha channel (just hard-coding it to 255) and removing the safety check that no value exceeds 255. That reduced the computation time to 19 seconds. However, that is still far from the 6 seconds I am hoping to get close to, and it would also be nice to keep averaging the alpha channel.
Note: I am aware of CIFilters; however, first darkening the images and then using the CIAdditionCompositing filter does not work, because Apple's API actually uses a more sophisticated algorithm than straight addition. For more details on this, see here for my previous code on the subject and a similar question whose tests demonstrate that Apple's API is not a straight addition of pixel values.
**Edit:** Thanks to all the feedback, I have been able to make huge improvements. By far the biggest difference was switching from a debug to a release build, which cut the time dramatically. I was then able to write faster code for manipulating the RGBA values, removing the need for a separate struct. That brought the time from 23 seconds down to about 10 seconds (on top of the debug-to-release improvement). The code now looks like this, also rewritten a bit for readability:
static func averageImages(primary: CGImage, secondary: CGImage) -> CGImage? {
    guard (primary.width == secondary.width && primary.height == secondary.height) else {
        return nil
    }
    let colorSpace = CGColorSpaceCreateDeviceRGB()
    let width = primary.width
    let height = primary.height
    let bytesPerPixel = 4
    let bitsPerComponent = 8
    let bytesPerRow = bytesPerPixel * width
    let bitmapInfo = CGImageAlphaInfo.premultipliedLast.rawValue | CGBitmapInfo.byteOrder32Little.rawValue
    guard let primaryContext = CGContext(data: nil, width: width, height: height, bitsPerComponent: bitsPerComponent, bytesPerRow: bytesPerRow, space: colorSpace, bitmapInfo: bitmapInfo),
          let secondaryContext = CGContext(data: nil, width: width, height: height, bitsPerComponent: bitsPerComponent, bytesPerRow: bytesPerRow, space: colorSpace, bitmapInfo: bitmapInfo) else {
        print("unable to create context")
        return nil
    }
    primaryContext.draw(primary, in: CGRect(x: 0, y: 0, width: width, height: height))
    secondaryContext.draw(secondary, in: CGRect(x: 0, y: 0, width: width, height: height))
    guard let primaryBuffer = primaryContext.data, let secondaryBuffer = secondaryContext.data else {
        print("Unable to get context data")
        return nil
    }
    let primaryPixelBuffer = primaryBuffer.bindMemory(to: UInt32.self, capacity: width * height)
    let secondaryPixelBuffer = secondaryBuffer.bindMemory(to: UInt32.self, capacity: width * height)
    for row in 0 ..< Int(height) {
        if row % 10 == 0 {
            print("Row: \(row)")
        }
        for column in 0 ..< Int(width) {
            let offset = row * width + column
            let primaryPixel = primaryPixelBuffer[offset]
            let secondaryPixel = secondaryPixelBuffer[offset]
            let red   = (((primaryPixel >> 24) & 255)/2 + ((secondaryPixel >> 24) & 255)/2) << 24
            let green = (((primaryPixel >> 16) & 255)/2 + ((secondaryPixel >> 16) & 255)/2) << 16
            let blue  = (((primaryPixel >> 8) & 255)/2 + ((secondaryPixel >> 8) & 255)/2) << 8
            let alpha = ((primaryPixel & 255)/2 + (secondaryPixel & 255)/2)
            primaryPixelBuffer[offset] = red | green | blue | alpha
        }
    }
    print("Done looping")
    let outputImage = primaryContext.makeImage()
    return outputImage
}
As for multithreading, I will be running this function many times, so I will implement multithreading across invocations of the function rather than within the function itself. I do expect a bigger performance gain from that, but it also has to be balanced against the increased memory allocation of having more images in memory at once.
Thanks to everyone who contributed. Since all the feedback came through comments, I cannot mark any of it as the correct answer. I also don't want to post my updated code as an answer, since I'm not really the one who came up with it. Any suggestions on how to proceed?
There are a few options:

Parallelize the routine:

You can improve performance with concurrentPerform, moving the processing onto multiple cores. In its simplest form, you just replace the outer for loop with concurrentPerform:
extension CGImage {
    func average(with secondImage: CGImage) -> CGImage? {
        guard
            width == secondImage.width,
            height == secondImage.height
        else {
            return nil
        }

        let colorSpace = CGColorSpaceCreateDeviceRGB()
        let bytesPerPixel = 4
        let bitsPerComponent = 8
        let bytesPerRow = bytesPerPixel * width
        let bitmapInfo = RGBA32.bitmapInfo

        guard
            let context1 = CGContext(data: nil, width: width, height: height, bitsPerComponent: bitsPerComponent, bytesPerRow: bytesPerRow, space: colorSpace, bitmapInfo: bitmapInfo),
            let context2 = CGContext(data: nil, width: width, height: height, bitsPerComponent: bitsPerComponent, bytesPerRow: bytesPerRow, space: colorSpace, bitmapInfo: bitmapInfo),
            let buffer1 = context1.data,
            let buffer2 = context2.data
        else {
            return nil
        }

        context1.draw(self, in: CGRect(x: 0, y: 0, width: width, height: height))
        context2.draw(secondImage, in: CGRect(x: 0, y: 0, width: width, height: height))

        let imageBuffer1 = buffer1.bindMemory(to: UInt8.self, capacity: width * height * 4)
        let imageBuffer2 = buffer2.bindMemory(to: UInt8.self, capacity: width * height * 4)

        DispatchQueue.concurrentPerform(iterations: height) { row in // i.e. a parallelized version of `for row in 0 ..< height {`
            var offset = row * bytesPerRow
            for _ in 0 ..< bytesPerRow {
                let byte1 = imageBuffer1[offset]
                let byte2 = imageBuffer2[offset]
                imageBuffer1[offset] = byte1 / 2 + byte2 / 2
                offset += 1 // advance after the access so the first byte of each row is processed and the buffer is never overrun
            }
        }

        return context1.makeImage()
    }
}
Note a few other observations:

Because you are doing the same calculation on every byte, you can simplify this further, getting rid of the conversions, shifts, masks, etc. I also moved repeated calculations out of the inner loop. Hence, I used the UInt8 type and iterated over bytesPerRow.
FWIW, I defined this as a CGImage extension, invoked like so:

let combinedImage = image1.average(with: image2)
Now, this walks through the pixel array row by row. You could actually change it to process multiple pixels in each iteration of concurrentPerform, though I did not see a material change when I tried.
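A chunked variant of that idea might look like the following (a sketch, reusing the imageBuffer1, imageBuffer2, height, and bytesPerRow from the extension above; the chunk count of 16 is an arbitrary choice):

```swift
// Split the image into horizontal bands so each concurrentPerform
// iteration handles many rows, reducing per-iteration dispatch overhead.
let chunks = 16
DispatchQueue.concurrentPerform(iterations: chunks) { chunk in
    let startRow = chunk * height / chunks
    let endRow = (chunk + 1) * height / chunks
    for offset in (startRow * bytesPerRow) ..< (endRow * bytesPerRow) {
        imageBuffer1[offset] = imageBuffer1[offset] / 2 + imageBuffer2[offset] / 2
    }
}
```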
I found concurrentPerform to be many times faster than the non-parallelized for loop. Unfortunately, the nested for loops account for only a small fraction of the function's overall processing time (e.g., once you include the overhead of building the two pixel buffers, the overall performance is only about 40% faster than the unoptimized rendition). On a fully specced MBP 2018, it processed a 10,000 × 10,000 px image in about half a second.
Another option is the Accelerate vImage library.

This library offers a wide variety of image-processing routines and is a good one to get familiar with if you are going to work with large images. I don't know whether its alpha-compositing algorithm is mathematically identical to the "average the byte values" algorithm, but it may well be close enough for your purposes. It has the advantage of reducing the nested for loops to a single API call. It also opens the door to a much broader range of compositing and processing routines:
extension CGImage {
    func averageVimage(with secondImage: CGImage) -> CGImage? {
        let bitmapInfo: CGBitmapInfo = [.byteOrder32Little, CGBitmapInfo(rawValue: CGImageAlphaInfo.premultipliedLast.rawValue)]
        let colorSpace = CGColorSpaceCreateDeviceRGB()
        guard
            width == secondImage.width,
            height == secondImage.height,
            let format = vImage_CGImageFormat(bitsPerComponent: 8, bitsPerPixel: 32, colorSpace: colorSpace, bitmapInfo: bitmapInfo)
        else {
            return nil
        }

        guard var sourceBuffer = try? vImage_Buffer(cgImage: self, format: format) else { return nil }
        defer { sourceBuffer.free() }

        guard var sourceBuffer2 = try? vImage_Buffer(cgImage: secondImage, format: format) else { return nil }
        defer { sourceBuffer2.free() }

        guard var destinationBuffer = try? vImage_Buffer(width: width, height: height, bitsPerPixel: 32) else { return nil }
        defer { destinationBuffer.free() }

        guard vImagePremultipliedConstAlphaBlend_ARGB8888(&sourceBuffer, Pixel_8(127), &sourceBuffer2, &destinationBuffer, vImage_Flags(kvImageNoFlags)) == kvImageNoError else {
            return nil
        }

        return try? destinationBuffer.createCGImage(format: format)
    }
}
In any event, I found the performance here to be similar to that of the concurrentPerform algorithm.
For giggles and grins, I also tried rendering the images with CGBitmapInfo.floatComponents and using the BLAS function catlas_saxpby for a one-line call to average the two vectors. It worked fine but, unsurprisingly, was slower than the integer-based routines above.
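For reference, that BLAS approach reduces to a call along these lines (a sketch, with the float-buffer setup omitted): catlas_saxpby computes Y = alpha·X + beta·Y in place, so alpha = beta = 0.5 yields the element-wise average.

```swift
import Accelerate

// y[i] = 0.5 * x[i] + 0.5 * y[i] for `count` contiguous floats,
// e.g. two same-sized floatComponents renderings of the images.
func averageFloatBuffers(_ x: UnsafePointer<Float>,
                         into y: UnsafeMutablePointer<Float>,
                         count: Int) {
    catlas_saxpby(Int32(count), 0.5, x, 1, 0.5, y, 1)
}
```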
This is a bit hacky, but it will work and is the algorithm you are looking for. Use vImageMatrixMultiply_Planar&lt;channel fmt&gt;() to scale each layer and add them together. The matrix coefficient for a layer is that layer's weight, presumably 1/N for N layers if you want them weighted equally.

Since we are using a planar function on probably interleaved data, you will need to multiply the widths of the src and dest buffers by the number of channels in the image.
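A sketch of that suggestion (my own illustration, not code from this answer): treat each interleaved RGBA image as a single 8-bit plane whose width is already multiplied by 4, weight the two planes equally with a [1, 1] matrix, and divide by 2. The pointer bridging and exact argument order should be double-checked against the vImage headers.

```swift
import Accelerate

// dest[i] = (src1[i] * 1 + src2[i] * 1) / 2 for every byte.
// Assumes all three vImage_Buffers were created with their width
// already multiplied by the channel count (4 for RGBA).
func averagePlanes(_ src1: inout vImage_Buffer,
                   _ src2: inout vImage_Buffer,
                   into dest: inout vImage_Buffer) -> vImage_Error {
    withUnsafePointer(to: &src1) { p1 in
        withUnsafePointer(to: &src2) { p2 in
            withUnsafePointer(to: &dest) { d in
                var sources: [UnsafePointer<vImage_Buffer>?] = [p1, p2]
                var destinations: [UnsafePointer<vImage_Buffer>?] = [d]
                let matrix: [Int16] = [1, 1] // one weight per source plane
                return vImageMatrixMultiply_Planar8(&sources, &destinations,
                                                    2, 1, matrix, 2,
                                                    nil, nil,
                                                    vImage_Flags(kvImageNoFlags))
            }
        }
    }
}
```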