Using Vision and RealityKit Rotates Counterclockwise and Distorts (Stretches?) Video
I am trying to learn object detection in iOS, and then to mark the position of the detected object. I have trained a model and installed it in the project. My next step was to display an AR view on screen; that works. When I turn on my Vision processing code with the button, though, the image on screen ends up rotated and distorted (most likely just stretched because of an inverted axis).
I found a partial tutorial that helped guide me. Its authors apparently ran into this problem and solved it, but they did not show the solution, and I have not been able to reach them. Their comment was: "one slightly tricky aspect to this was that the coordinate system returned from Vision was different than SwiftUI's coordinate system (normalized and the y-axis was flipped), but some simple transformations did the trick."
I have no idea which simple transformations those are, but I suspect they are simd-related. If anyone has insight into this, I would appreciate help solving the rotation and distortion.
Once Vision starts, I do get errors in the console, with messages similar to these:
2022-05-12 21:14:39.142550-0400 Find My Apple Remote[66143:9990936] [Assets] Resolving material name 'engine:BuiltinRenderGraphResources/AR/arInPlacePostProcessCombinedPermute7.rematerial' as an asset path -- this usage is deprecated; instead provide a valid bundle
2022-05-12 21:14:39.270684-0400 Find My Apple Remote[66143:9991089] [Session] ARSession <0x111743970>: ARSessionDelegate is retaining 11 ARFrames. This can lead to future camera frames being dropped.
2022-05-12 21:14:40.121810-0400 Find My Apple Remote[66143:9991117] [CAMetalLayer nextDrawable] returning nil because allocation failed.
The last one concerns me the most.
My code so far is:
import SwiftUI

struct ContentView: View {
    @State private var isDetecting = false
    @State private var success = false

    var body: some View {
        VStack {
            RealityKitView(isDetecting: $isDetecting, success: $success)
                .overlay(alignment: .top) {
                    Image(systemName: (success ? "checkmark.circle" : "slash.circle"))
                        .foregroundColor(success ? .green : .red)
                }
            Button {
                isDetecting.toggle()
            } label: {
                Text(isDetecting ? "Stop Detecting" : "Start Detecting")
                    .frame(width: 150, height: 50)
                    .background(
                        Capsule()
                            .fill(isDetecting ? Color.red.opacity(0.5) : Color.green.opacity(0.5))
                    )
            }
        }
    }
}
import SwiftUI
import ARKit
import RealityKit
import Vision

struct RealityKitView: UIViewRepresentable {
    let arView = ARView()
    let scale = SIMD3<Float>(repeating: 0.1)
    let model: VNCoreMLModel? = RealityKitView.returnMLModel()

    @Binding var isDetecting: Bool
    @Binding var success: Bool

    @State var boundingBox: CGRect?

    func makeUIView(context: Context) -> some UIView {
        // Start AR Session
        let session = configureSession()

        // Handle ARSession events via delegate
        session.delegate = context.coordinator

        return arView
    }

    func configureSession() -> ARSession {
        let session = arView.session
        let config = ARWorldTrackingConfiguration()
        config.planeDetection = [.horizontal, .vertical]
        config.environmentTexturing = .automatic
        session.run(config)
        return session
    }

    static func returnMLModel() -> VNCoreMLModel? {
        do {
            let detector = try AppleRemoteDetector()
            let model = try VNCoreMLModel(for: detector.model)
            return model
        } catch {
            print("RealityKitView:returnMLModel failed with error: \(error)")
        }
        return nil
    }

    func updateUIView(_ uiView: UIViewType, context: Context) {}

    func makeCoordinator() -> Coordinator {
        Coordinator(self)
    }

    class Coordinator: NSObject, ARSessionDelegate {
        var parent: RealityKitView

        init(_ parent: RealityKitView) {
            self.parent = parent
        }

        func session(_ session: ARSession, didUpdate frame: ARFrame) {
            // Start vision processing
            if parent.isDetecting {
                guard let model = parent.model else {
                    return
                }
                // I suspect the problem is here, where the image is captured in a buffer and then
                // turned into an input for the CoreML model.
                let pixelBuffer = frame.capturedImage
                let input = AppleRemoteDetectorInput(image: pixelBuffer)
                do {
                    let request = VNCoreMLRequest(model: model) { (request, error) in
                        guard
                            let results = request.results,
                            !results.isEmpty,
                            let recognizedObjectObservation = results as? [VNRecognizedObjectObservation],
                            let first = recognizedObjectObservation.first
                        else {
                            self.parent.boundingBox = nil
                            self.parent.success = false
                            return
                        }
                        self.parent.success = true
                        print("\(first.boundingBox)")
                        self.parent.boundingBox = first.boundingBox
                    }
                    model.featureProvider = input
                    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: CGImagePropertyOrientation.right, options: [:])
                    try handler.perform([request])
                } catch {
                    print(error)
                }
            }
        }
    }
}
After several days of trying, and research upon research, I found the answers that provided the solution. Note that both of those answers are valid; which one applies just depends on how your app is structured.
The crux of the problem is that a state change touching RealityKitView causes the ARView to be re-instantiated. This time, however, it is instantiated with a size of zero, and that is what produces the error [CAMetalLayer nextDrawable] returning nil because allocation failed, since the zero-sized layer makes nextDrawable return nil. Initializing it with a non-zero frame instead, like this:
let arView = ARView(frame: .init(x: 1, y: 1, width: 1, height: 1), cameraMode: .ar, automaticallyConfigureSession: false)
solves the problem.
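The other valid approach mentioned above depends on how the app is structured. As a minimal sketch of that alternative (this is an illustration under my own assumptions, not the code from the linked answers), you can build the ARView once inside makeUIView(_:) instead of storing it as a property, so SwiftUI state changes that re-create the value-type struct never construct another ARView:

import SwiftUI
import ARKit
import RealityKit

// Hypothetical restructuring: the ARView is created exactly once, inside makeUIView(_:).
struct RealityKitViewAlternative: UIViewRepresentable {
    @Binding var isDetecting: Bool

    func makeUIView(context: Context) -> ARView {
        // Created with a zero frame here; SwiftUI assigns the real frame during layout,
        // and this initializer only ever runs once per representable.
        let arView = ARView(frame: .zero, cameraMode: .ar, automaticallyConfigureSession: false)

        let config = ARWorldTrackingConfiguration()
        config.planeDetection = [.horizontal, .vertical]
        config.environmentTexturing = .automatic
        arView.session.run(config)
        arView.session.delegate = context.coordinator

        return arView
    }

    func updateUIView(_ uiView: ARView, context: Context) {}

    func makeCoordinator() -> Coordinator { Coordinator(self) }

    class Coordinator: NSObject, ARSessionDelegate {
        var parent: RealityKitViewAlternative
        init(_ parent: RealityKitViewAlternative) { self.parent = parent }
    }
}

With that structure, no extra ARView instances are ever created by SwiftUI rebuilding the struct, so the zero-size drawable never comes up.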
For anyone attempting this in the future, here is the currently working UIViewRepresentable:
import SwiftUI
import ARKit
import RealityKit
import Vision

struct RealityKitView: UIViewRepresentable {
    let arView = ARView(frame: .init(x: 1, y: 1, width: 1, height: 1), cameraMode: .ar, automaticallyConfigureSession: false)

    // Making this implicitly unwrapped. If this fails, the app should crash anyway...
    let model: VNCoreMLModel! = RealityKitView.returnMLModel()

    @Binding var isDetecting: Bool // This turns Vision on and off
    @Binding var success: Bool     // This is the state of Vision's finding the object
    @Binding var message: String   // This allows different messages to be communicated to the user

    @State var boundingBox: CGRect?

    func makeUIView(context: Context) -> some UIView {
        // Start AR Session
        let session = configureSession()

        // Add coaching overlay
        addCoachingOverlay(session: session)

        // Handle ARSession events via delegate
        session.delegate = context.coordinator

        return arView
    }

    func addCoachingOverlay(session: ARSession) {
        let coachingOverlay = ARCoachingOverlayView()
        coachingOverlay.autoresizingMask = [.flexibleWidth, .flexibleHeight]
        coachingOverlay.session = session
        coachingOverlay.goal = .horizontalPlane
        arView.addSubview(coachingOverlay)
    }

    func configureSession() -> ARSession {
        let session = arView.session
        let config = ARWorldTrackingConfiguration()
        config.planeDetection = [.horizontal, .vertical]
        config.environmentTexturing = .automatic
        session.run(config)
        return session
    }

    static func returnMLModel() -> VNCoreMLModel? {
        do {
            let config = MLModelConfiguration()
            config.computeUnits = .all
            // Pass the configuration to the generated model class so it is actually applied.
            let detector = try AppleRemoteDetector(configuration: config)
            let model = try VNCoreMLModel(for: detector.model)
            return model
        } catch {
            print("RealityKitView:returnMLModel failed with error: \(error)")
        }
        return nil
    }

    func updateUIView(_ uiView: UIViewType, context: Context) {
    }

    func makeCoordinator() -> Coordinator {
        Coordinator(self)
    }
    class Coordinator: NSObject, ARSessionDelegate {
        var parent: RealityKitView

        init(_ parent: RealityKitView) {
            self.parent = parent
        }

        func session(_ session: ARSession, didUpdate frame: ARFrame) {
            if parent.isDetecting {
                // Do not enqueue other buffers for processing while another Vision task is still running.
                // The camera stream has only a finite amount of buffers available; holding too many buffers for analysis would starve the camera.
                guard currentBuffer == nil, case .normal = frame.camera.trackingState else {
                    return
                }
                // Retain the image buffer for Vision processing.
                self.currentBuffer = frame.capturedImage
                classifyCurrentImage()
            }
        }

        // MARK: - Vision classification

        // Vision classification request and model
        /// - Tag: ClassificationRequest
        private lazy var classificationRequest: VNCoreMLRequest = {
            // Instantiate the model from its generated Swift class.
            let request = VNCoreMLRequest(model: parent.model, completionHandler: { [weak self] request, error in
                self?.processClassifications(for: request, error: error)
            })
            // Crop input images to a square area at the center, matching the way the ML model was trained.
            request.imageCropAndScaleOption = .scaleFill
            // Use CPU for Vision processing to ensure that there are adequate GPU resources for rendering.
            request.usesCPUOnly = true
            return request
        }()

        // The pixel buffer being held for analysis; used to serialize Vision requests.
        private var currentBuffer: CVPixelBuffer?

        // Queue for dispatching vision classification requests
        private let visionQueue = DispatchQueue(label: "com.alelin.Find-My-Apple-Remote.ARKitVision.serialVisionQueue")

        // Run the Vision+ML classifier on the current image buffer.
        /// - Tag: ClassifyCurrentImage
        private func classifyCurrentImage() {
            guard let currentBuffer = currentBuffer else {
                return
            }
            // Most computer vision tasks are not rotation-agnostic, so it is important to pass in the orientation of the image with respect to the device.
            // This uses an extension on CGImagePropertyOrientation (shown after the listing).
            let orientation = CGImagePropertyOrientation(UIDevice.current.orientation)
            let input = AppleRemoteDetectorInput(image: currentBuffer)
            parent.model.featureProvider = input
            let requestHandler = VNImageRequestHandler(cvPixelBuffer: currentBuffer, orientation: orientation, options: [:])
            visionQueue.async {
                do {
                    // Release the pixel buffer when done, allowing the next buffer to be processed.
                    defer { self.currentBuffer = nil }
                    try requestHandler.perform([self.classificationRequest])
                } catch {
                    print("Error: Vision request failed with error \"\(error)\"")
                }
            }
        }

        // Handle completion of the Vision request and choose results to display.
        /// - Tag: ProcessClassifications
        func processClassifications(for request: VNRequest, error: Error?) {
            guard
                let results = request.results,
                !results.isEmpty,
                let recognizedObjectObservations = results as? [VNRecognizedObjectObservation],
                let recognizedObjectObservation = recognizedObjectObservations.first,
                let bestResult = recognizedObjectObservation.labels.first(where: { result in result.confidence > 0.5 }),
                let label = bestResult.identifier.split(separator: ",").first
            else {
                self.parent.boundingBox = nil
                self.parent.success = false
                if let error = error {
                    print("Unable to classify image.\n\(error.localizedDescription)")
                }
                return
            }
            self.parent.success = true
            print("\(recognizedObjectObservation.boundingBox)")
            self.parent.boundingBox = recognizedObjectObservation.boundingBox

            // Show a label for the highest-confidence result (but only above a minimum confidence threshold).
            let confidence = String(format: "%.0f", bestResult.confidence * 100)
            let labelString = String(label)
            parent.message = "\(labelString) at \(confidence)"
        }
        func session(_ session: ARSession, didFailWithError error: Error) {
            guard error is ARError else { return }

            let errorWithInfo = error as NSError
            let messages = [
                errorWithInfo.localizedDescription,
                errorWithInfo.localizedFailureReason,
                errorWithInfo.localizedRecoverySuggestion
            ]
            // Filter out optional error messages.
            let errorMessage = messages.compactMap({ $0 }).joined(separator: "\n")
            DispatchQueue.main.async {
                self.parent.message = "The AR session failed with error: \(errorMessage)"
            }
        }
    }
}
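Two supporting pieces are worth spelling out. First, the CGImagePropertyOrientation(UIDevice.current.orientation) call above relies on an initializer that is not part of the SDK, so you have to supply it yourself. Here is a minimal sketch of one common mapping, assuming the rear camera, whose pixel buffers arrive in landscape-right orientation:

import UIKit
import ImageIO

extension CGImagePropertyOrientation {
    /// Maps the device orientation to the orientation of the captured pixel buffer.
    /// This mapping assumes the rear camera; adjust it if your setup differs.
    init(_ deviceOrientation: UIDeviceOrientation) {
        switch deviceOrientation {
        case .portraitUpsideDown: self = .left
        case .landscapeLeft:      self = .up
        case .landscapeRight:     self = .down
        case .portrait:           self = .right
        default:                  self = .right
        }
    }
}

Second, on the original rotation/stretch question: Vision's boundingBox is normalized to [0, 1] with the origin in the lower-left corner, while SwiftUI and UIKit work in points with the origin in the upper-left. A small conversion helper (the name viewRect(for:in:) is illustrative, not part of the project above) is usually enough:

import SwiftUI
import Vision

/// Converts a Vision bounding box (normalized, origin at the lower left)
/// into view coordinates (points, origin at the upper left).
func viewRect(for normalizedBox: CGRect, in viewSize: CGSize) -> CGRect {
    CGRect(
        x: normalizedBox.minX * viewSize.width,
        // Flip the y-axis: Vision's maxY becomes the distance from the top edge.
        y: (1 - normalizedBox.maxY) * viewSize.height,
        width: normalizedBox.width * viewSize.width,
        height: normalizedBox.height * viewSize.height
    )
}

Note that this maps the box into the view's own coordinate space; because the ARView aspect-fills the camera image, a pixel-accurate overlay also has to account for that crop and for the request's imageCropAndScaleOption.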