Azure Speech to Text Translations with multiple languages
I'm fairly new to Azure's Speech SDK, so it's entirely possible I'm missing something obvious; apologies if so.
I've been working on a project where I want to translate an audio file/stream from one language to another. It works well when the entire conversation is in a single language (all Spanish), but it falls apart when I feed it a real conversation containing both English and Spanish. It tries to recognize the English words as Spanish ones, so something like "I'm sorry" gets transcribed as garbled Spanish.
As far as I can tell, you can set multiple target languages (languages to translate into), but only one speechRecognitionLanguage. That seems to mean it can't handle conversations in which more than one language is spoken (for example, a phone call through an interpreter) or in which a speaker switches between languages. Is there a way to make this work with multiple input languages, or is this simply something Microsoft hasn't solved yet?
Here's the code I have right now (it's only a lightly modified version of the sample on their GitHub):
// pull in the required packages.
var sdk = require("microsoft-cognitiveservices-speech-sdk");

(function() {
    "use strict";

    module.exports = {
        main: function(settings, audioStream) {
            // now create the audio-config pointing to our stream and
            // the speech config specifying the language.
            var audioConfig = sdk.AudioConfig.fromStreamInput(audioStream);
            var translationConfig = sdk.SpeechTranslationConfig.fromSubscription(settings.subscriptionKey, settings.serviceRegion);

            // setting the recognition language.
            translationConfig.speechRecognitionLanguage = settings.language;
            // target language (to be translated to).
            translationConfig.addTargetLanguage("en");

            // create the translation recognizer.
            var recognizer = new sdk.TranslationRecognizer(translationConfig, audioConfig);

            recognizer.recognized = function(s, e) {
                if (e.result.reason === sdk.ResultReason.NoMatch) {
                    var noMatchDetail = sdk.NoMatchDetails.fromResult(e.result);
                    console.log("\r\nDidn't find a match: " + sdk.NoMatchReason[noMatchDetail.reason]);
                } else {
                    var str = "\r\nNext Line: " + e.result.text + "\nTranslations:";
                    var language = "en";
                    str += " [" + language + "] " + e.result.translations.get(language);
                    str += "\r\n";
                    console.log(str);
                }
            };

            // two possible states: Error or EndOfStream.
            recognizer.canceled = function(s, e) {
                var str = "(cancel) Reason: " + sdk.CancellationReason[e.reason];
                // if it was because of an error
                if (e.reason === sdk.CancellationReason.Error) {
                    str += ": " + e.errorDetails;
                    console.log(str);
                } else {
                    // we've reached the end of the file; stop the recognizer.
                    recognizer.stopContinuousRecognitionAsync(
                        function() {
                            console.log("End of file.");
                            recognizer.close();
                            recognizer = undefined;
                        },
                        function(err) {
                            console.trace("err - " + err);
                            recognizer.close();
                            recognizer = undefined;
                        });
                }
            };

            // start the recognizer and wait for a result.
            recognizer.startContinuousRecognitionAsync(
                function() {
                    console.log("Starting speech recognition");
                },
                function(err) {
                    console.trace("err - " + err);
                    recognizer.close();
                    recognizer = undefined;
                });
        }
    };
}());
According to the Speech translation section of the official document Language and region support for the Speech Services, quoted below, I think you can use Speech translation instead of Speech-To-Text to meet your needs.
Speech translation
The Speech Translation API supports different languages for speech-to-speech and speech-to-text translation. The source language must always be from the Speech-to-Text language table. The available target languages depend on whether the translation target is speech or text. You may translate incoming speech into more than 60 languages. A subset of these languages are available for speech synthesis.
There is also official sample code for Speech translation: Azure-Samples/cognitive-services-speech-sdk/samples/js/node/translation.js.
I don't speak Spanish, so I can't test audio that mixes English and Spanish for you.
Hope this helps.
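To make the multiple-target-language point concrete: after calling addTargetLanguage once per target, the recognized handler should loop over every requested language instead of hard-coding "en". Here is a minimal, SDK-free sketch of that formatting step. The translations Map below is only a stand-in for the SDK's e.result.translations object (which exposes a similar get(language) accessor); the function name and sample strings are illustrative, not part of the SDK.

```javascript
// Illustrative only: build the same kind of output string the sample's
// `recognized` handler prints, but for every target language at once.
// `translations` is a plain Map standing in for e.result.translations.
function formatTranslations(recognizedText, translations) {
    var str = "Next Line: " + recognizedText + "\nTranslations:";
    // Map.forEach passes (value, key), i.e. (translated text, language code).
    translations.forEach(function(text, language) {
        str += "\n  [" + language + "] " + text;
    });
    return str;
}

// Example: a Spanish utterance translated into two target languages,
// as if addTargetLanguage("en") and addTargetLanguage("fr") were both set.
var result = formatTranslations(
    "¿Dónde está la estación?",
    new Map([
        ["en", "Where is the station?"],
        ["fr", "Où est la gare ?"]
    ])
);
console.log(result);
```

Note this only broadens the output side; the recognition side still uses the single speechRecognitionLanguage, which is exactly the limitation the question is about.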
As of now (August), Speech SDK translation supports translating from one input language into multiple output languages.
Services supporting spoken-language recognition are in development. These will enable translation from multiple input languages (the two languages you would specify in the config) into multiple output languages. No ETA on availability yet ...
Wolfgang