Azure speech to text REST API V3 binary data
I am trying to use the Azure speech-to-text service. In the documentation, I came across examples that use the V1 API version:
https://$region.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1
Yet basically every link to the proper documentation is for the V3 API:
https://{endpoint}/speechtotext/v3.0
In this V1 example, you can easily send the file as binary:
curl --location --request POST \
"https://$region.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=en-US" \
--header "Ocp-Apim-Subscription-Key: $key" \
--header "Content-Type: audio/wav" \
--data-binary $audio_file
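But I cannot figure out how to supply the wordLevelTimestampsEnabled=true parameter to get word-level timestamps.
On the other hand, I tried the V3 API, where I can easily supply the wordLevelTimestampsEnabled=true parameter, but I cannot figure out how to send binary file data: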
curl -L -X POST 'https://northeurope.api.cognitive.microsoft.com/speechtotext/v3.0/transcriptions' -H 'Content-Type: application/json' -H 'Accept: application/json' -H "Ocp-Apim-Subscription-Key: $key" --data-raw '{
"contentUrls": [
"https://url-to-file.dev/test-file.wav"
],
"properties": {
"diarizationEnabled": false,
"wordLevelTimestampsEnabled": true,
"punctuationMode": "DictatedAndAutomatic",
"profanityFilterMode": "Masked"
},
"locale": "pl-PL",
"displayName": "Transcription using default model for pl-PL"
}'
Is there a way to pass a binary file and also get word-level timestamps with the wordLevelTimestampsEnabled=true parameter?
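Converting the comment to a community wiki answer, as suggested by Code Different, to help community members who may face a similar issue.
According to the documentation, it is not possible to upload a binary file directly. You should instead provide a URL to the audio through the contentUrls property.
For example: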
{
"contentUrls": [
"<URL to an audio file to transcribe>"
],
"properties": {
"diarizationEnabled": false,
"wordLevelTimestampsEnabled": true,
"punctuationMode": "DictatedAndAutomatic",
"profanityFilterMode": "Masked"
},
"locale": "en-US",
"displayName": "Transcription of file using default model for en-US"
}
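A body like the one above could be submitted with curl roughly as follows. This is only a minimal sketch: the function names build_transcription_body and create_transcription are hypothetical, $region and $key are the same placeholders used in the question, and the host pattern is assumed from the northeurope URL in the question. It also assumes the audio URL is reachable by the service and contains no characters that need JSON escaping.

```shell
#!/usr/bin/env bash

# Print the JSON body for a v3.0 transcription request.
#   $1: URL of the audio file (assumed to be reachable by the service)
#   $2: locale, for example en-US
# Note: the values are inserted as-is, so they must not contain quotes
# or backslashes (no JSON escaping is done here).
build_transcription_body() {
  local content_url="$1" locale="$2"
  cat <<EOF
{
  "contentUrls": ["$content_url"],
  "properties": {
    "wordLevelTimestampsEnabled": true
  },
  "locale": "$locale",
  "displayName": "Transcription with word-level timestamps"
}
EOF
}

# Send the request. Relies on $region and $key being set by the caller.
# Double quotes are used so that the shell expands both variables.
create_transcription() {
  curl --silent --request POST \
    "https://$region.api.cognitive.microsoft.com/speechtotext/v3.0/transcriptions" \
    --header "Content-Type: application/json" \
    --header "Ocp-Apim-Subscription-Key: $key" \
    --data "$(build_transcription_body "$1" "$2")"
}
```

With region and key set, a call such as create_transcription "https://url-to-file.dev/test-file.wav" "pl-PL" would send the request. Keeping the body builder separate makes it possible to inspect the JSON before anything goes over the network.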
You can refer to Speech-to-text REST API v3.0, cognitive-services-speech-sdk, and Azure Speech Recognition - use binary / hexadecimal data instead of WAV file path.