How to correctly set up AVAudioSession and AVAudioEngine when using both SFSpeechRecognizer and AVSpeechSynthesizer
I am trying to create an app that uses both STT (speech to text) and TTS (text to speech). However, I am running into some obscure problems and would greatly appreciate your expertise.
The app has a button centered on the screen which, when tapped, starts the desired speech recognition with the code below.
// MARK: - Constant Properties

let audioEngine = AVAudioEngine()

// MARK: - Optional Properties

var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
var recognitionTask: SFSpeechRecognitionTask?
var speechRecognizer: SFSpeechRecognizer?

// MARK: - Functions

internal func startSpeechRecognition() {

    // Instantiate the recognitionRequest property.
    self.recognitionRequest = SFSpeechAudioBufferRecognitionRequest()

    // Set up the audio session.
    let audioSession = AVAudioSession.sharedInstance()
    do {
        try audioSession.setCategory(.record, mode: .measurement, options: [.defaultToSpeaker, .duckOthers])
        try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
    } catch {
        print("An error has occurred while setting the AVAudioSession.")
    }

    // Set up the audio input tap.
    let inputNode = self.audioEngine.inputNode
    let inputNodeFormat = inputNode.outputFormat(forBus: 0)

    self.audioEngine.inputNode.installTap(onBus: 0, bufferSize: 512, format: inputNodeFormat, block: { [unowned self] buffer, time in
        self.recognitionRequest?.append(buffer)
    })

    // Start the recognition task.
    guard
        let speechRecognizer = self.speechRecognizer,
        let recognitionRequest = self.recognitionRequest else {
        fatalError("One or more properties could not be instantiated.")
    }

    self.recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest, resultHandler: { [unowned self] result, error in
        if error != nil {
            // Stop the audio engine and recognition task.
            self.stopSpeechRecognition()
        } else if let result = result {
            let bestTranscriptionString = result.bestTranscription.formattedString
            self.command = bestTranscriptionString
            print(bestTranscriptionString)
        }
    })

    // Start the audioEngine.
    do {
        try self.audioEngine.start()
    } catch {
        print("Could not start the audioEngine property.")
    }
}

internal func stopSpeechRecognition() {

    // Stop the audio engine.
    self.audioEngine.stop()
    self.audioEngine.inputNode.removeTap(onBus: 0)

    // End and deallocate the recognition request.
    self.recognitionRequest?.endAudio()
    self.recognitionRequest = nil

    // Cancel and deallocate the recognition task.
    self.recognitionTask?.cancel()
    self.recognitionTask = nil
}
Used on its own, this code works like a charm. However, when I want the transcribed text to be read back using an AVSpeechSynthesizer object, nothing comes through clearly.
I went through the suggestions of several Stack Overflow posts, which recommend changing
audioSession.setCategory(.record, mode: .measurement, options: [.defaultToSpeaker, .duckOthers])
to the following
audioSession.setCategory(.playAndRecord, mode: .default, options: [.defaultToSpeaker, .duckOthers])
but to no avail. The application still crashed after running STT and then TTS.
What solved it for me was using this instead of the ones mentioned above:
audioSession.setCategory(.multiRoute, mode: .default, options: [.defaultToSpeaker, .duckOthers])
This leaves me completely baffled, as I really don't know what is going on under the hood. I would greatly appreciate any relevant explanation!
I am also developing an app that uses both SFSpeechRecognizer and AVSpeechSynthesizer, and for me .setCategory(.playAndRecord, mode: .default)
works fine, and it is the best category for our needs, according to Apple. I am even able to .speak()
every transcription of the SFSpeechRecognitionTask while the audio engine is running, without any problem. My opinion is that something in your program's logic is causing the crash. It would be great if you could update your question with the corresponding error.
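As a minimal sketch of what that looks like on my side (the speechSynthesizer property is my own; everything else reuses the names from your snippet):

// Assumed property, not part of the question's snippet:
let speechSynthesizer = AVSpeechSynthesizer()

self.recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest) { [unowned self] result, error in
    guard error == nil, let result = result else { return }
    let transcription = result.bestTranscription.formattedString
    print(transcription)

    // Speaking while the audio engine keeps running has worked fine for me
    // with the .playAndRecord category.
    if result.isFinal {
        self.speechSynthesizer.speak(AVSpeechUtterance(string: transcription))
    }
}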
About why the .multiRoute
category works: I guess there is some problem with the AVAudioInputNode.
If you see an error like this in your console
Terminating app due to uncaught exception 'com.apple.coreaudio.avfaudio', reason: 'required condition is false: IsFormatSampleRateAndChannelCountValid(hwFormat)'
or one like this
Terminating app due to uncaught exception 'com.apple.coreaudio.avfaudio', reason: 'required condition is false: nullptr == Tap()'
you just need to reorder some parts of your code, for example moving the audio session setup to a place where it is called only once, or making sure the input node's tap is always removed before installing a new one (see the sketch below), regardless of whether the recognition task finished successfully or not. Maybe (I have never used it) .multiRoute
is able to reuse the same input node by its nature of working with different audio streams and routes.
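As a rough sketch of that reordering, reusing the names from the question's code (the defensive removeTap call before installing the new tap is the key part; this is only an outline, not a complete implementation):

internal func startSpeechRecognition() {

    self.recognitionRequest = SFSpeechAudioBufferRecognitionRequest()

    // The audio session category is configured once elsewhere (e.g. in viewDidLoad),
    // not every time recognition starts.

    let inputNode = self.audioEngine.inputNode
    let inputNodeFormat = inputNode.outputFormat(forBus: 0)

    // Remove any stale tap before installing a new one, regardless of how
    // the previous recognition task ended.
    inputNode.removeTap(onBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 512, format: inputNodeFormat) { [unowned self] buffer, _ in
        self.recognitionRequest?.append(buffer)
    }

    // ... start the recognition task and the audio engine as before ...
}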
I followed this WWDC session from Apple:
The logic I use in my program is below.
Setting the category
override func viewDidLoad() { // or init(), or wherever necessary
    super.viewDidLoad()
    try? AVAudioSession.sharedInstance().setCategory(.playAndRecord, mode: .default)
}
Validations/permissions
func shouldProcessSpeechRecognition() {
    guard AVAudioSession.sharedInstance().recordPermission == .granted,
        speechRecognizerAuthorizationStatus == .authorized,
        let speechRecognizer = speechRecognizer, speechRecognizer.isAvailable
        else { return }
    //Continue only if we have authorization and recognizer is available
    startSpeechRecognition()
}
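In case it helps, this is roughly how the two permissions could be requested up front (my own sketch, not part of the original answer; it assumes speechRecognizerAuthorizationStatus is a stored property on the same class and that Speech and AVFoundation are imported):

func requestSpeechPermissions() {
    SFSpeechRecognizer.requestAuthorization { status in
        AVAudioSession.sharedInstance().requestRecordPermission { _ in
            DispatchQueue.main.async {
                self.speechRecognizerAuthorizationStatus = status
                // shouldProcessSpeechRecognition() re-checks both permissions before starting.
                self.shouldProcessSpeechRecognition()
            }
        }
    }
}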
Starting STT
func startSpeechRecognition() {
    let format = audioEngine.inputNode.outputFormat(forBus: 0)
    audioEngine.inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { [unowned self] (buffer, _) in
        self.recognitionRequest.append(buffer)
    }
    audioEngine.prepare()
    do {
        try audioEngine.start()
        recognitionTask = speechRecognizer!.recognitionTask(with: recognitionRequest, resultHandler: {...})
    } catch {...}
}
Ending STT
func endSpeechRecognition() {
    recognitionTask?.finish()
    stopAudioEngine()
}
Canceling STT
func cancelSpeechRecognition() {
    recognitionTask?.cancel()
    stopAudioEngine()
}
Stopping the audio engine
func stopAudioEngine() {
    audioEngine.stop()
    audioEngine.inputNode.removeTap(onBus: 0)
    recognitionRequest.endAudio()
}
This way, I can call an AVSpeechSynthesizer
instance from anywhere in my code and speak an utterance.
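For completeness, the speaking part itself can then be as simple as something like this (the speechSynthesizer property and the speak(_:) helper are my own names, not from the code above):

let speechSynthesizer = AVSpeechSynthesizer()

func speak(_ text: String) {
    let utterance = AVSpeechUtterance(string: text)
    // Optional: pick an explicit voice; otherwise the system default is used.
    utterance.voice = AVSpeechSynthesisVoice(language: "en-US")
    speechSynthesizer.speak(utterance)
}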