Using Vision and RealityKit Rotates Counterclockwise and Distorts (Stretches?) Video
I am trying to learn object detection in iOS, and then to mark the position of the detected object. I have trained a model and installed it in the project. My next step was to display an AR view on screen; that works. When I turn on my Vision processing code with the button, though, the image on screen ends up rotated and distorted (most likely just stretched because of an inverted axis).
I found a partial tutorial that helped guide me. Its authors apparently ran into this problem and solved it, but they did not show the solution, and I have not been able to reach them. Their comment was: "one slightly tricky aspect to this was that the coordinate system returned from Vision was different than SwiftUI's coordinate system (normalized and the y-axis was flipped), but some simple transformations did the trick."
I have no idea which simple transformations those are, but I suspect they are simd-related. If anyone has insight into this, I would appreciate help solving the rotation and distortion.
Once Vision starts, I do get errors in the console, with messages similar to these:
2022-05-12 21:14:39.142550-0400 Find My Apple Remote[66143:9990936] [Assets] Resolving material name 'engine:BuiltinRenderGraphResources/AR/arInPlacePostProcessCombinedPermute7.rematerial' as an asset path -- this usage is deprecated; instead provide a valid bundle
2022-05-12 21:14:39.270684-0400 Find My Apple Remote[66143:9991089] [Session] ARSession <0x111743970>: ARSessionDelegate is retaining 11 ARFrames. This can lead to future camera frames being dropped.
2022-05-12 21:14:40.121810-0400 Find My Apple Remote[66143:9991117] [CAMetalLayer nextDrawable] returning nil because allocation failed.
The last one concerns me the most.
My code so far is:
import SwiftUI

struct ContentView: View {
    @State private var isDetecting = false
    @State private var success = false

    var body: some View {
        VStack {
            RealityKitView(isDetecting: $isDetecting, success: $success)
                .overlay(alignment: .top) {
                    Image(systemName: (success ? "checkmark.circle" : "slash.circle"))
                        .foregroundColor(success ? .green : .red)
                }
            Button {
                isDetecting.toggle()
            } label: {
                Text(isDetecting ? "Stop Detecting" : "Start Detecting")
                    .frame(width: 150, height: 50)
                    .background(
                        Capsule()
                            .fill(isDetecting ? Color.red.opacity(0.5) : Color.green.opacity(0.5))
                    )
            }
        }
    }
}
import SwiftUI
import ARKit
import RealityKit
import Vision

struct RealityKitView: UIViewRepresentable {
    let arView = ARView()
    let scale = SIMD3<Float>(repeating: 0.1)
    let model: VNCoreMLModel? = RealityKitView.returnMLModel()

    @Binding var isDetecting: Bool
    @Binding var success: Bool

    @State var boundingBox: CGRect?

    func makeUIView(context: Context) -> some UIView {
        // Start AR Session
        let session = configureSession()

        // Handle ARSession events via delegate
        session.delegate = context.coordinator

        return arView
    }

    func configureSession() -> ARSession {
        let session = arView.session
        let config = ARWorldTrackingConfiguration()
        config.planeDetection = [.horizontal, .vertical]
        config.environmentTexturing = .automatic
        session.run(config)
        return session
    }

    static func returnMLModel() -> VNCoreMLModel? {
        do {
            let detector = try AppleRemoteDetector()
            let model = try VNCoreMLModel(for: detector.model)
            return model
        } catch {
            print("RealityKitView:returnMLModel failed with error: \(error)")
        }
        return nil
    }

    func updateUIView(_ uiView: UIViewType, context: Context) {}

    func makeCoordinator() -> Coordinator {
        Coordinator(self)
    }

    class Coordinator: NSObject, ARSessionDelegate {
        var parent: RealityKitView

        init(_ parent: RealityKitView) {
            self.parent = parent
        }

        func session(_ session: ARSession, didUpdate frame: ARFrame) {
            // Start vision processing
            if parent.isDetecting {
                guard let model = parent.model else {
                    return
                }
                // I suspect the problem is here, where the image is captured in a buffer and then
                // turned into an input for the CoreML model.
                let pixelBuffer = frame.capturedImage
                let input = AppleRemoteDetectorInput(image: pixelBuffer)
                do {
                    let request = VNCoreMLRequest(model: model) { (request, error) in
                        guard
                            let results = request.results,
                            !results.isEmpty,
                            let recognizedObjectObservation = results as? [VNRecognizedObjectObservation],
                            let first = recognizedObjectObservation.first
                        else {
                            self.parent.boundingBox = nil
                            self.parent.success = false
                            return
                        }
                        self.parent.success = true
                        print("\(first.boundingBox)")
                        self.parent.boundingBox = first.boundingBox
                    }
                    model.featureProvider = input
                    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: CGImagePropertyOrientation.right, options: [:])
                    try handler.perform([request])
                } catch {
                    print(error)
                }
            }
        }
    }
}
After several days of trying, and research upon research, I found the answers that provided the solution. Note that both of those answers are valid; which one applies just depends on how your app is structured.
The crux of the problem is that a state change touching RealityKitView causes the ARView to be re-instantiated. This time, however, it is instantiated with a size of zero, and that is what produces the error [CAMetalLayer nextDrawable] returning nil because allocation failed, since the zero-sized layer makes nextDrawable return nil. Initializing it with a non-zero frame instead, like this:
let arView = ARView(frame: .init(x: 1, y: 1, width: 1, height: 1), cameraMode: .ar, automaticallyConfigureSession: false)
solves the problem.
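The other valid approach mentioned above depends on how the app is structured. As a minimal sketch of that alternative (this is an illustration under my own assumptions, not the code from the linked answers), you can build the ARView once inside makeUIView(_:) instead of storing it as a property, so SwiftUI state changes that re-create the value-type struct never construct another ARView:

import SwiftUI
import ARKit
import RealityKit

// Hypothetical restructuring: the ARView is created exactly once, inside makeUIView(_:).
struct RealityKitViewAlternative: UIViewRepresentable {
    @Binding var isDetecting: Bool

    func makeUIView(context: Context) -> ARView {
        // Created with a zero frame here; SwiftUI assigns the real frame during layout,
        // and this initializer only ever runs once per representable.
        let arView = ARView(frame: .zero, cameraMode: .ar, automaticallyConfigureSession: false)

        let config = ARWorldTrackingConfiguration()
        config.planeDetection = [.horizontal, .vertical]
        config.environmentTexturing = .automatic
        arView.session.run(config)
        arView.session.delegate = context.coordinator

        return arView
    }

    func updateUIView(_ uiView: ARView, context: Context) {}

    func makeCoordinator() -> Coordinator { Coordinator(self) }

    class Coordinator: NSObject, ARSessionDelegate {
        var parent: RealityKitViewAlternative
        init(_ parent: RealityKitViewAlternative) { self.parent = parent }
    }
}

With that structure, no extra ARView instances are ever created by SwiftUI rebuilding the struct, so the zero-size drawable never comes up.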
For anyone attempting this in the future, here is the currently working UIViewRepresentable:
import SwiftUI
import ARKit
import RealityKit
import Vision

struct RealityKitView: UIViewRepresentable {
    let arView = ARView(frame: .init(x: 1, y: 1, width: 1, height: 1), cameraMode: .ar, automaticallyConfigureSession: false)

    // Making this implicitly unwrapped. If this fails, the app should crash anyway...
    let model: VNCoreMLModel! = RealityKitView.returnMLModel()

    @Binding var isDetecting: Bool // This turns Vision on and off
    @Binding var success: Bool     // This is the state of Vision's finding the object
    @Binding var message: String   // This allows different messages to be communicated to the user

    @State var boundingBox: CGRect?

    func makeUIView(context: Context) -> some UIView {
        // Start AR Session
        let session = configureSession()

        // Add coaching overlay
        addCoachingOverlay(session: session)

        // Handle ARSession events via delegate
        session.delegate = context.coordinator

        return arView
    }

    func addCoachingOverlay(session: ARSession) {
        let coachingOverlay = ARCoachingOverlayView()
        coachingOverlay.autoresizingMask = [.flexibleWidth, .flexibleHeight]
        coachingOverlay.session = session
        coachingOverlay.goal = .horizontalPlane
        arView.addSubview(coachingOverlay)
    }

    func configureSession() -> ARSession {
        let session = arView.session
        let config = ARWorldTrackingConfiguration()
        config.planeDetection = [.horizontal, .vertical]
        config.environmentTexturing = .automatic
        session.run(config)
        return session
    }

    static func returnMLModel() -> VNCoreMLModel? {
        do {
            let config = MLModelConfiguration()
            config.computeUnits = .all
            // Pass the configuration to the generated model class so it is actually applied.
            let detector = try AppleRemoteDetector(configuration: config)
            let model = try VNCoreMLModel(for: detector.model)
            return model
        } catch {
            print("RealityKitView:returnMLModel failed with error: \(error)")
        }
        return nil
    }

    func updateUIView(_ uiView: UIViewType, context: Context) {
    }

    func makeCoordinator() -> Coordinator {
        Coordinator(self)
    }
    class Coordinator: NSObject, ARSessionDelegate {
        var parent: RealityKitView

        init(_ parent: RealityKitView) {
            self.parent = parent
        }

        func session(_ session: ARSession, didUpdate frame: ARFrame) {
            if parent.isDetecting {
                // Do not enqueue other buffers for processing while another Vision task is still running.
                // The camera stream has only a finite amount of buffers available; holding too many buffers for analysis would starve the camera.
                guard currentBuffer == nil, case .normal = frame.camera.trackingState else {
                    return
                }
                // Retain the image buffer for Vision processing.
                self.currentBuffer = frame.capturedImage
                classifyCurrentImage()
            }
        }

        // MARK: - Vision classification

        // Vision classification request and model
        /// - Tag: ClassificationRequest
        private lazy var classificationRequest: VNCoreMLRequest = {
            // Instantiate the model from its generated Swift class.
            let request = VNCoreMLRequest(model: parent.model, completionHandler: { [weak self] request, error in
                self?.processClassifications(for: request, error: error)
            })
            // Crop input images to a square area at the center, matching the way the ML model was trained.
            request.imageCropAndScaleOption = .scaleFill
            // Use CPU for Vision processing to ensure that there are adequate GPU resources for rendering.
            request.usesCPUOnly = true
            return request
        }()

        // The pixel buffer being held for analysis; used to serialize Vision requests.
        private var currentBuffer: CVPixelBuffer?

        // Queue for dispatching vision classification requests
        private let visionQueue = DispatchQueue(label: "com.alelin.Find-My-Apple-Remote.ARKitVision.serialVisionQueue")

        // Run the Vision+ML classifier on the current image buffer.
        /// - Tag: ClassifyCurrentImage
        private func classifyCurrentImage() {
            guard let currentBuffer = currentBuffer else {
                return
            }
            // Most computer vision tasks are not rotation-agnostic, so it is important to pass in the orientation of the image with respect to the device.
            // This uses an extension on CGImagePropertyOrientation (shown after the listing).
            let orientation = CGImagePropertyOrientation(UIDevice.current.orientation)
            let input = AppleRemoteDetectorInput(image: currentBuffer)
            parent.model.featureProvider = input
            let requestHandler = VNImageRequestHandler(cvPixelBuffer: currentBuffer, orientation: orientation, options: [:])
            visionQueue.async {
                do {
                    // Release the pixel buffer when done, allowing the next buffer to be processed.
                    defer { self.currentBuffer = nil }
                    try requestHandler.perform([self.classificationRequest])
                } catch {
                    print("Error: Vision request failed with error \"\(error)\"")
                }
            }
        }

        // Handle completion of the Vision request and choose results to display.
        /// - Tag: ProcessClassifications
        func processClassifications(for request: VNRequest, error: Error?) {
            guard
                let results = request.results,
                !results.isEmpty,
                let recognizedObjectObservations = results as? [VNRecognizedObjectObservation],
                let recognizedObjectObservation = recognizedObjectObservations.first,
                let bestResult = recognizedObjectObservation.labels.first(where: { result in result.confidence > 0.5 }),
                let label = bestResult.identifier.split(separator: ",").first
            else {
                self.parent.boundingBox = nil
                self.parent.success = false
                if let error = error {
                    print("Unable to classify image.\n\(error.localizedDescription)")
                }
                return
            }
            self.parent.success = true
            print("\(recognizedObjectObservation.boundingBox)")
            self.parent.boundingBox = recognizedObjectObservation.boundingBox

            // Show a label for the highest-confidence result (but only above a minimum confidence threshold).
            let confidence = String(format: "%.0f", bestResult.confidence * 100)
            let labelString = String(label)
            parent.message = "\(labelString) at \(confidence)"
        }
        func session(_ session: ARSession, didFailWithError error: Error) {
            guard error is ARError else { return }

            let errorWithInfo = error as NSError
            let messages = [
                errorWithInfo.localizedDescription,
                errorWithInfo.localizedFailureReason,
                errorWithInfo.localizedRecoverySuggestion
            ]
            // Filter out optional error messages.
            let errorMessage = messages.compactMap({ $0 }).joined(separator: "\n")
            DispatchQueue.main.async {
                self.parent.message = "The AR session failed with error: \(errorMessage)"
            }
        }
    }
}
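Two supporting pieces are worth spelling out. First, the CGImagePropertyOrientation(UIDevice.current.orientation) call above relies on an initializer that is not part of the SDK, so you have to supply it yourself. Here is a minimal sketch of one common mapping, assuming the rear camera, whose pixel buffers arrive in landscape-right orientation:

import UIKit
import ImageIO

extension CGImagePropertyOrientation {
    /// Maps the device orientation to the orientation of the captured pixel buffer.
    /// This mapping assumes the rear camera; adjust it if your setup differs.
    init(_ deviceOrientation: UIDeviceOrientation) {
        switch deviceOrientation {
        case .portraitUpsideDown: self = .left
        case .landscapeLeft:      self = .up
        case .landscapeRight:     self = .down
        case .portrait:           self = .right
        default:                  self = .right
        }
    }
}

Second, on the original rotation/stretch question: Vision's boundingBox is normalized to [0, 1] with the origin in the lower-left corner, while SwiftUI and UIKit work in points with the origin in the upper-left. A small conversion helper (the name viewRect(for:in:) is illustrative, not part of the project above) is usually enough:

import SwiftUI
import Vision

/// Converts a Vision bounding box (normalized, origin at the lower left)
/// into view coordinates (points, origin at the upper left).
func viewRect(for normalizedBox: CGRect, in viewSize: CGSize) -> CGRect {
    CGRect(
        x: normalizedBox.minX * viewSize.width,
        // Flip the y-axis: Vision's maxY becomes the distance from the top edge.
        y: (1 - normalizedBox.maxY) * viewSize.height,
        width: normalizedBox.width * viewSize.width,
        height: normalizedBox.height * viewSize.height
    )
}

Note that this maps the box into the view's own coordinate space; because the ARView aspect-fills the camera image, a pixel-accurate overlay also has to account for that crop and for the request's imageCropAndScaleOption.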