Split an audio file into parts for use in speech recognition
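I ran into a problem with long audio files in Google speech recognition, so I decided to split my audio file into 15-second pieces: I send the first 15 seconds to speech recognition, then the second 15 seconds, and so on.
But when I split the audio file with the pydub library, the result of the split is a pydub AudioSegment object rather than a file, and the API needs a file name or file-like object as its argument (I marked the line that raises the error).
It says "Given audio file must be a filename string or a file-like object".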
import speech_recognition as sr
import numpy
from os import path
AUDIO_FILE = "OAF_back_happy.wav"
from pydub import AudioSegment
sound = AudioSegment.from_wav("OAF_back_happy.wav")
halfway_point = len(sound) // 2
split = []
split.append(sound[:halfway_point])
split.append(sound[halfway_point:])
r = sr.Recognizer()
words=1
for x in split:
    with sr.AudioFile(x) as source: #<-----
        audio = r.record(source) # read the entire audio file
    try:
        # for testing purposes, we're just using the default API key
        # to use another API key, use `r.recognize_google(audio, key="GOOGLE_SPEECH_RECOGNITION_API_KEY")`
        # instead of `r.recognize_google(audio)`
        ans = r.recognize_google(audio)
        print("Google Speech Recognition thinks you said " +ans)
        for x in ans:
            if (x.isspace()) == True:
                words+=1
        print(words)
    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Google Speech Recognition service; {0}".format(e))
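Edit: as mentioned in the comments, I don't want to export the files, because I'm running on a server and I don't want to store the same file twice.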
Untested, because I can't be bothered to install packages I don't use, but this is what I meant.
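import io

for x in split:
    b = io.BytesIO()
    # pydub's export() defaults to MP3, which sr.AudioFile cannot read, so request WAV
    x.export(b, format="wav")
    b.seek(0)  # rewind so AudioFile reads from the start of the buffer
    with sr.AudioFile(b) as source:
        audio = r.record(source)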
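The question asks for 15-second pieces, while the code above only cuts the file in half. Here is a rough sketch of fixed-length chunking that never touches the disk. Note that it swaps pydub for the standard library `wave` module, so it only applies to uncompressed WAV input. The name `split_wav` and the `chunk_seconds` parameter are made up for this illustration and are not part of pydub or SpeechRecognition.

```python
import io
import wave


def split_wav(source, chunk_seconds=15):
    """Yield in-memory WAV files, each holding up to chunk_seconds of audio.

    `source` may be a file name or a file-like object opened for binary reading.
    (Hypothetical helper, for illustration only.)
    """
    with wave.open(source, "rb") as reader:
        params = reader.getparams()
        frames_per_chunk = int(reader.getframerate() * chunk_seconds)
        while True:
            frames = reader.readframes(frames_per_chunk)
            if not frames:
                break  # no audio left
            buffer = io.BytesIO()
            with wave.open(buffer, "wb") as writer:
                # Copy channels, sample width and rate; the frame count in the
                # header is corrected when the writer is closed.
                writer.setparams(params)
                writer.writeframes(frames)
            buffer.seek(0)  # rewind so the consumer reads from the start
            yield buffer
```

Each yielded buffer is a file-like object, so it should be usable in place of `b` in the loop above, for example `for b in split_wav("OAF_back_happy.wav"): with sr.AudioFile(b) as source: ...`. Like the answer itself, that last step is untested against the recognition service.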