Using a Twilio media stream (speech to text) together with Twilio Say

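I am developing a voice bot that uses a Twilio media stream (with Google STT) to transcribe the caller's speech, processes the text, and returns the response to the user with the TwiML Say object. The endpoint I use is triggered as soon as the user starts the call (the call status is "ringing"):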

@app.route("/twilio_call", methods=['POST'])
def voice(request):
    """Respond to incoming phone calls with a greeting message."""
    call_status = request.form.get("CallStatus")

    if call_status == "ringing":
        voice_response = VoiceResponse()
        voice_response.say("hello! how can I help!", voice='woman')

        return response.text(str(voice_response), content_type="text/xml")

After this message has been delivered to the user, I want to trigger the websocket server directly with the media stream.

@app.websockets('/media')
def transcribe_media(request, ws):
    while True:
        message = ws.recv()
        if message is None:
            continue

        # Parse each frame inside the loop so every message is handled.
        data = json.loads(message)
        if data['event'] == "media":
            ....
            # here I'm sending data to Google and getting the transcription back

I can't modify the call in progress as described in the docs here: https://www.twilio.com/docs/voice/tutorials/how-to-modify-calls-in-progress-python

I have already tried:

client = Client(ACCOUNT_SID, AUTH_TOKEN)
call = client.calls(conversation_id).update(
    twiml='<Response><Say voice="woman" language="de-DE">' + msg_text + '</Say></Response>')

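But I get an error saying that the call is not in progress (its status is "ringing").

I have also tried using the TwiML Stream object, but when it is used together with the TwiML Say object it does not start the server (when I pass only Stream, it does trigger the server):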

voice_response = VoiceResponse()
start = Start()
start.stream(url='wss://<NGROK_IP>.ngrok.io/webhooks/twilio_voice/media')
voice_response.say("Hello, how can I help?", language="en-GB")
voice_response.append(start)
return response.text(str(voice_response), content_type="text/xml")

Does anyone know how I can solve this? How can I trigger the websocket server after the TwiML Say object has been delivered to the user?

Twilio developer evangelist here.

The correct way to achieve this is with the <Stream> TwiML element. I would recommend placing the stream at the start of the TwiML response so that the connection is established in time for you to start receiving the user's speech. Also, once the TwiML is complete, Twilio will hang up the call, even if there is a live stream. So you should add a <Pause> to wait for the user's spoken response.

So, I would change your webhook endpoint to:

from twilio.twiml.voice_response import Start, VoiceResponse

@app.route("/twilio_call", methods=['POST'])
def voice(request):
    """Respond to incoming phone calls with a greeting message."""
    call_status = request.form.get("CallStatus")

    voice_response = VoiceResponse()

    # Start the stream first so it is connected before the caller speaks.
    start = Start()
    start.stream(url='wss://<NGROK_IP>.ngrok.io/webhooks/twilio_voice/media')
    voice_response.append(start)

    voice_response.say("hello! how can I help!", voice='woman')
    # Keep the call open while waiting for the caller's reply.
    voice_response.pause(length=60)

    return response.text(str(voice_response), content_type="text/xml")
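If you want to check the element order without the Twilio helper library, here is a minimal sketch that builds the same TwiML shape with the standard library only. The function name `greeting_twiml` and the `wss://example.invalid/media` URL are placeholders made up for illustration; the point is just that <Start><Stream> comes first, then <Say>, then <Pause>.

```python
import xml.etree.ElementTree as ET


def greeting_twiml(stream_url, greeting, pause_seconds=60):
    """Build TwiML with the stream first, then the greeting, then a pause."""
    response = ET.Element("Response")

    # <Start><Stream url="..."/></Start> goes first so the websocket
    # connection can be set up before the caller starts talking.
    start = ET.SubElement(response, "Start")
    ET.SubElement(start, "Stream", url=stream_url)

    # The greeting the caller hears.
    say = ET.SubElement(response, "Say", voice="woman")
    say.text = greeting

    # The pause keeps the call open after the greeting has finished.
    ET.SubElement(response, "Pause", length=str(pause_seconds))

    return ET.tostring(response, encoding="unicode")


# Hypothetical URL, only for printing the structure.
print(greeting_twiml("wss://example.invalid/media", "hello! how can I help!"))
```

Printing the result is a quick way to compare it against what your own endpoint returns.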

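Now your stream should connect to your websocket endpoint and your user will hear the greeting. The call won't hang up because of the 60-second pause. When the user speaks, your websocket endpoint can send the audio to the STT service, and when you get a response, you can redirect the call and use <Say> to speak the result.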
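As a rough sketch of the websocket side, the stream sends JSON text frames: a "start" event that includes the call SID, "media" events with a base64-encoded audio payload, and a "stop" event at the end. The helper names `handle_stream_message` and `build_reply_twiml` and the `state` dict below are my own, not part of any Twilio library, and the second pause in the reply is an assumption that you want the call to stay open for another turn.

```python
import base64
import json
from xml.sax.saxutils import escape, quoteattr


def handle_stream_message(raw, state):
    """Process one JSON text frame from the media stream.

    Returns the decoded audio bytes for "media" events, otherwise None.
    `state` is a plain dict that remembers the call SID for the redirect.
    """
    data = json.loads(raw)
    event = data.get("event")
    if event == "start":
        # The "start" event carries call metadata, including the call SID.
        state["call_sid"] = data["start"]["callSid"]
    elif event == "media":
        # The audio chunk arrives base64-encoded; decode it before
        # forwarding it to the STT service.
        return base64.b64decode(data["media"]["payload"])
    elif event == "stop":
        state["stopped"] = True
    return None


def build_reply_twiml(text, language="en-GB", pause_seconds=60):
    """Build TwiML that speaks `text`, then pauses so the call stays open."""
    # Escape the text so characters like "&" or "<" do not break the XML.
    return (
        "<Response>"
        f"<Say language={quoteattr(language)}>{escape(text)}</Say>"
        f'<Pause length="{int(pause_seconds)}"/>'
        "</Response>"
    )
```

Once the transcription has been processed, the redirect can use the same update call as in the question, something like `client.calls(state["call_sid"]).update(twiml=build_reply_twiml(msg_text))`. By that point the call has been answered, so it should be in progress rather than ringing.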