Microsoft Speech to Text Python SDK SPXERR_INVALID_HEADER issue
I get the following error when running the Microsoft Python Speech-to-Text Quickstart ("Quickstart: Recognize speech from an audio file") with version 1.8.0 of the azure-cognitiveservices-speech SDK.
RuntimeError: Exception with an error code: 0xa (SPXERR_INVALID_HEADER)
- Quickstart code: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/quickstarts/speech-to-text-from-file?tabs=linux&pivots=programming-language-python#sample-code
- SDK: https://pypi.org/project/azure-cognitiveservices-speech/
The script takes only 3 inputs:
- Azure subscription key
- Azure service region
- File name
I am using the following test MP3 file:
Here is the full output:
Traceback (most recent call last):
  File "main.py", line 16, in <module>
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_input)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/azure/cognitiveservices/speech/speech.py", line 761, in __init__
    self._impl = self._get_impl(impl.SpeechRecognizer, speech_config, audio_config)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/azure/cognitiveservices/speech/speech.py", line 547, in _get_impl
    _impl = reco_type._from_config(speech_config._impl, audio_config._impl)
RuntimeError: Exception with an error code: 0xa (SPXERR_INVALID_HEADER)
[CALL STACK BEGIN]
3 libMicrosoft.CognitiveServices.Speech.core.dylib 0x0000000106ad88d2 CreateModuleObject + 1136482
4 libMicrosoft.CognitiveServices.Speech.core.dylib 0x0000000106ad7f4f CreateModuleObject + 1134047
5 libMicrosoft.CognitiveServices.Speech.core.dylib 0x00000001069d1803 CreateModuleObject + 59027
6 libMicrosoft.CognitiveServices.Speech.core.dylib 0x00000001069d1503 CreateModuleObject + 58259
7 libMicrosoft.CognitiveServices.Speech.core.dylib 0x0000000106a11c64 CreateModuleObject + 322292
8 libMicrosoft.CognitiveServices.Speech.core.dylib 0x0000000106a10be5 CreateModuleObject + 318069
9 libMicrosoft.CognitiveServices.Speech.core.dylib 0x0000000106a0e5a2 CreateModuleObject + 308274
10 libMicrosoft.CognitiveServices.Speech.core.dylib 0x0000000106a0e7c3 CreateModuleObject + 308819
11 libMicrosoft.CognitiveServices.Speech.core.dylib 0x0000000106960bc7 recognizer_create_speech_recognizer_from_config + 3863
12 libMicrosoft.CognitiveServices.Speech.core.dylib 0x000000010695fd74 recognizer_create_speech_recognizer_from_config + 196
13 _speech_py_impl.so 0x00000001067ff35b PyInit__speech_py_impl + 814939
14 _speech_py_impl.so 0x000000010679b530 PyInit__speech_py_impl + 405808
15 Python 0x00000001060f65dc _PyMethodDef_RawFastCallKeywords + 668
16 Python 0x00000001060f5a5a _PyCFunction_FastCallKeywords + 42
17 Python 0x00000001061b45a4 call_function + 724
18 Python 0x00000001061b1576 _PyEval_EvalFrameDefault + 25190
19 Python 0x00000001060f5e90 function_code_fastcall + 128
20 Python 0x00000001061b45b2 call_function + 738
21 Python 0x00000001061b1576 _PyEval_EvalFrameDefault + 25190
22 Python 0x00000001061b50d6 _PyEval_EvalCodeWithName + 2422
23 Python 0x00000001060f55fb _PyFunction_FastCallDict + 523
24 Python 0x00000001060f68cf _PyObject_Call_Prepend + 143
25 Python 0x0000000106144d51 slot_tp_init + 145
26 Python 0x00000001061406a9 type_call + 297
27 Python 0x00000001060f5871 _PyObject_FastCallKeywords + 433
28 Python 0x00000001061b4474 call_function + 420
29 Python 0x00000001061b16bd _PyEval_EvalFrameDefault + 25517
30 Python 0x00000001061b50d6 _PyEval_EvalCodeWithName + 2422
31 Python 0x00000001061ab234 PyEval_EvalCode + 100
32 Python 0x00000001061e88f1 PyRun_FileExFlags + 209
33 Python 0x00000001061e816a PyRun_SimpleFileExFlags + 890
34 Python 0x00000001062079db pymain_main + 6875
35 Python 0x0000000106207f2a _Py_UnixMain + 58
36 libdyld.dylib 0x00007fff5d8aaed9 start + 1
37 ??? 0x0000000000000002 0x0 + 2
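Can anyone offer some advice on what the header refers to here and how to resolve this?
MP3-encoded audio is not supported as an input format. Please use a WAV (PCM) file with 16-bit samples, a 16 kHz sample rate, and a single channel (mono).
The default audio stream format is WAV (16 kHz or 8 kHz, 16-bit, mono PCM). Outside of WAV/PCM, the compressed input formats listed in the documentation are also supported.
However, if you are using C#/Java/C++/Objective-C and want to use a compressed audio format such as .mp3, you can handle it by using GStreamer.
For more information, see this Microsoft documentation.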
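To make the "header" part concrete: the error presumably refers to the RIFF/WAVE header that every WAV file begins with, which an MP3 file does not have. Below is a minimal sketch, using only the Python standard library, that checks a file against the format described above before handing it to the SDK. The function name check_wav_for_speech_sdk is hypothetical and not part of the Azure SDK.

```python
import wave


def check_wav_for_speech_sdk(path):
    """Return a list of problems; an empty list means the file looks usable.

    Hypothetical helper, not part of azure-cognitiveservices-speech.
    """
    # A WAV file starts with a 12-byte RIFF header: b"RIFF", a size, b"WAVE".
    with open(path, "rb") as f:
        head = f.read(12)
    if len(head) < 12 or head[:4] != b"RIFF" or head[8:12] != b"WAVE":
        return ["not a RIFF/WAVE file (for example, an MP3 renamed or passed as-is)"]

    problems = []
    try:
        with wave.open(path, "rb") as w:
            if w.getnchannels() != 1:
                problems.append(f"expected mono, found {w.getnchannels()} channels")
            if w.getsampwidth() != 2:
                problems.append(f"expected 16-bit samples, found {8 * w.getsampwidth()}-bit")
            if w.getframerate() not in (8000, 16000):
                problems.append(f"expected 8 or 16 kHz, found {w.getframerate()} Hz")
    except wave.Error as exc:
        # The wave module rejects WAV containers it cannot parse as PCM.
        return [f"unreadable WAV header: {exc}"]
    return problems
```

Running this on the test MP3 from the question should report that it is not a RIFF/WAVE file, which would be consistent with the SPXERR_INVALID_HEADER error.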
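I don't think there is an official way to use the SDK with other formats (MP3 or a different frame rate). I wanted an approach that lets Azure accept any kind of audio file as input.
So far I have handled this with a method of my own: first convert the file to a suitable format, then delete the converted file once I'm done. The original file is kept:
For Python: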
fname_buf = fname
fname = self.AudioFileAdjust(fname, 'test-it')
# Do something with fname here
if fname_buf != fname:
    self.AudioFileAdjust(fname, 'remove')
The helper method AudioFileAdjust (I am using pydub and pyaudio):
# Needs `import os` at module level; `au` is the author's own audio helper module (not shown).
def AudioFileAdjust(self, fname, states=''):
    '''
    Check the audio file format and, if it is not suitable, create a new
    temporary audio file to use instead.
    '''
    if states == 'remove':
        os.remove(fname)
    else:
        # If the file is not usable by Azure, convert it first: the frame rate must be 16000
        audio_file = au.ReadAudioFile(fname)
        if audio_file.frame_rate != 16000:
            #print('[Comment] changing the frame rate')
            audio_file_e = au.SetFramerate(audio_file, 16000)
            # Build the new file name: drop the old extension, add a suffix and ".wav"
            fname2 = os.path.splitext(fname)[0] + "_Conv_2" + ".wav"
            au.ExportAudioFile(audio_file_e, fname2)
            #print('new file name: ', fname2)
            fname = fname2
    return fname
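The same convert, use, then delete pattern can also be sketched as a context manager, so the temporary file is removed even if recognition raises. This is not the author's code: it swaps pydub for a call to the ffmpeg command-line tool (assumed to be installed and on PATH), and the names ffmpeg_args, run_ffmpeg and as_speech_wav are hypothetical.

```python
import os
import subprocess
import tempfile
from contextlib import contextmanager


def ffmpeg_args(src, dst):
    """Build an ffmpeg command for 16 kHz, mono, 16-bit PCM WAV output."""
    return ["ffmpeg", "-y", "-i", src,
            "-ac", "1",              # one channel (mono)
            "-ar", "16000",          # 16 kHz sample rate
            "-acodec", "pcm_s16le",  # 16-bit little-endian PCM
            dst]


def run_ffmpeg(src, dst):
    # Assumption: an `ffmpeg` executable is available on PATH.
    # Raises subprocess.CalledProcessError if the conversion fails.
    subprocess.run(ffmpeg_args(src, dst), check=True, capture_output=True)


@contextmanager
def as_speech_wav(src, convert=run_ffmpeg):
    """Yield the path of a temporary converted copy; the original is untouched."""
    fd, dst = tempfile.mkstemp(suffix=".wav")
    os.close(fd)  # only the path is needed; the converter overwrites the file
    try:
        convert(src, dst)
        yield dst
    finally:
        # Clean up the temporary file whether or not the caller's code failed.
        if os.path.exists(dst):
            os.remove(dst)
```

It would be used as `with as_speech_wav("input.mp3") as wav_path:`, passing wav_path as the file name to the recognizer inside the block. Unlike the frame-rate check above, this always converts, which costs some time but also covers MP3 files that already happen to be 16 kHz.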