Save a Twilio <Stream> of 8 kHz `mulaw` audio to a file

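There are a few posts that address this problem, but I have not been able to play back the saved file correctly. It usually plays at half speed.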

Using the accepted answer from the question above, I saved the base64-decoded raw audio in Go:

// Media event
type Media struct {
    Track     string `json:"track"`
    Chunk     string `json:"chunk"`
    Timestamp string `json:"timestamp"`
    Payload   string `json:"payload"`
}

// SaveAudio will upgrade connection to websocket and save the audio to file
func SaveAudio(w http.ResponseWriter, r *http.Request) {
    utility.DebugLogf("SaveAudio")
    c, err := upgrader.Upgrade(w, r, nil)
    if err != nil {
        log.Print("upgrade:", err)
        return
    }

    defer utility.SafeClose(c)
    inBuf := bytes.Buffer{}

    loop := true
    for loop {
        _, message, err := c.ReadMessage()
        utility.PanicIfErr(err)
        decMessage := TwilioWSSMessage{}
        err = json.Unmarshal(message, &decMessage)
        utility.PanicIfErr(err)

        switch decMessage.Event {
        case "connected":
            utility.DebugLogf("Connected a %s protocol version:%s", decMessage.Protocol, decMessage.Version)
        case "start":
            utility.DebugLogf("Starting audio stream: %#v", decMessage.Start)
        case "media":
            chunk, err := base64.StdEncoding.DecodeString(decMessage.Media.Payload)
            utility.PanicIfErr(err)
            _, err = inBuf.Write(chunk)
            utility.PanicIfErr(err)
        case "stop":
            utility.DebugLogf("Ending audio stream: %#v", decMessage.Stop)
            loop = false
        default:
            utility.LogWarningf("Unrecognized event type: %s", decMessage.Event)
            loop = false
        }
    }

    saveRaw(&inBuf)
}

// saveRaw writes the decoded, headerless mu-law bytes to out.ulaw
func saveRaw(buf *bytes.Buffer) {
    rawOut, err := os.Create("out.ulaw")
    utility.PanicIfErr(err)
    defer rawOut.Close()

    _, err = rawOut.Write(buf.Bytes())
    utility.PanicIfErr(err)
}

Then I used ffmpeg to convert the mulaw to the default pcm_s16le:

ffmpeg -f mulaw -ar 8000 -ac 1 -i out.ulaw mulaw_decoded.wav 
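What `-f mulaw -ar 8000 -ac 1` asks ffmpeg to do is expand each 8-bit μ-law byte into one signed 16-bit sample, so 8000 input bytes are one second of mono audio. A minimal Go sketch of that step, using the standard G.711 μ-law expansion; `mulawToLinear` and `decodeMulaw` are illustrative names, not part of any library:

```go
package main

import "fmt"

// mulawToLinear expands one G.711 μ-law byte to a 16-bit linear PCM sample.
func mulawToLinear(u byte) int16 {
	u = ^u // μ-law bytes are stored bit-inverted
	sign := u & 0x80
	exponent := (u >> 4) & 0x07
	mantissa := u & 0x0F
	// Rebuild the biased magnitude, then remove the bias (0x84 = 132).
	magnitude := ((int(mantissa) << 3) + 0x84) << exponent
	magnitude -= 0x84
	if sign != 0 {
		return int16(-magnitude)
	}
	return int16(magnitude)
}

// decodeMulaw converts a whole buffer: one input byte becomes one sample.
func decodeMulaw(in []byte) []int16 {
	out := make([]int16, len(in))
	for i, b := range in {
		out[i] = mulawToLinear(b)
	}
	return out
}

func main() {
	fmt.Println(decodeMulaw([]byte{0xFF, 0x80, 0x00})) // [0 32124 -32124]
}
```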

Then I upsampled the audio from 8 kHz to 16 kHz and played it with vlc:

ffmpeg -i mulaw_decoded.wav -ar 16000 upsampled.wav && vlc upsampled.wav
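Upsampling multiplies the sample rate and the sample count by the same factor, so on its own it should not change playback speed. The sketch below makes that concrete with naive 2x linear interpolation, a deliberately simpler technique than ffmpeg's default resampler or soxr; `upsample2x` is a hypothetical name:

```go
package main

import "fmt"

// upsample2x doubles the sample rate (e.g. 8 kHz -> 16 kHz) by inserting the
// midpoint between each pair of neighbouring samples. Twice the samples at
// twice the rate gives the same duration.
func upsample2x(in []int16) []int16 {
	if len(in) == 0 {
		return nil
	}
	out := make([]int16, 0, 2*len(in))
	for i := 0; i < len(in)-1; i++ {
		a, b := int32(in[i]), int32(in[i+1]) // widen to avoid overflow
		out = append(out, in[i], int16((a+b)/2))
	}
	// Repeat the last sample so the output is exactly twice as long.
	last := in[len(in)-1]
	return append(out, last, last)
}

func main() {
	fmt.Println(upsample2x([]int16{0, 100, -100})) // [0 50 100 0 -100 -100]
}
```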

But it plays at half speed.

Ultimately I want to do all of this in Rust or Go, but I can't even get it to work locally with just ffmpeg.

Thanks in advance.


Here is the output of the two ffmpeg operations above combined into one command, using the suggested sox resampler:

Command:

ffmpeg -y -loglevel verbose -f mulaw -ar 8000 -ac 1 -bits_per_raw_sample 8 -i testsamples/raw_mulaw_bytes -af aresample=resampler=soxr -ar 16000 upsampled.wav

Output:

[mulaw @ 0x7fecc0814000] Estimating duration from bitrate, this may be inaccurate
Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, mulaw, from 'testsamples/raw_mulaw_bytes':
  Duration: 00:00:20.74, bitrate: 64 kb/s
    Stream #0:0: Audio: pcm_mulaw, 8000 Hz, mono, s16, 64 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (pcm_mulaw (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
[graph_0_in_0_0 @ 0x7fecc0505600] tb:1/8000 samplefmt:s16 samplerate:8000 chlayout:0x4
[Parsed_aresample_0 @ 0x7fecc0505280] ch:1 chl:mono fmt:s16 r:8000Hz -> ch:1 chl:mono fmt:s16 r:16000Hz
Output #0, wav, to 'upsampled.wav':
  Metadata:
    ISFT            : Lavf58.29.100
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
    Metadata:
      encoder         : Lavc58.54.100 pcm_s16le
No more output streams to write to, finishing.
size=     648kB time=00:00:20.74 bitrate= 256.0kbits/s speed=1.55e+03x
video:0kB audio:648kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.011753%
Input file #0 (testsamples/raw_mulaw_bytes):
  Input stream #0:0 (audio): 519 packets read (165920 bytes); 519 frames decoded (165920 samples);
  Total: 519 packets (165920 bytes) demuxed
Output file #0 (upsampled.wav):
  Output stream #0:0 (audio): 200 frames encoded (331840 samples); 200 packets muxed (663680 bytes);
  Total: 200 packets (663680 bytes) muxed
[AVIOContext @ 0x7fecc0433cc0] Statistics: 4 seeks, 6 writeouts
[AVIOContext @ 0x7fecc042a6c0] Statistics: 165920 bytes read, 0 seeks
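The durations in this log can be checked by hand. Each μ-law byte is one sample, and the resampler doubles both the rate and the sample count:

$$
t_{\text{in}} = \frac{165920\ \text{samples}}{8000\ \text{samples/s}} = 20.74\ \text{s},
\qquad
t_{\text{out}} = \frac{331840\ \text{samples}}{16000\ \text{samples/s}} = 20.74\ \text{s}
$$

So resampling preserves duration, and any slowdown must already be in the raw bytes. If the buffer interleaves two tracks that contributed roughly equally (an assumption the log alone cannot confirm), each track holds about $165920 / 2 / 8000 \approx 10.37$ s of audio, and the file lasts about twice as long as the call, which matches half-speed playback.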

The audio sounds the same as before.

Thanks to this answer, I finally figured it out:

Where he mentions:

Another possible reason of "slow motion" is more than one stream decoded by the same decoder. But in this case you also get distorted slow audio.

It turned out this call had two tracks, inbound and outbound, so in `case "media":` I now save only one track:

if decMessage.Media.Track == "outbound" {
  chunk, err := base64.StdEncoding.DecodeString(decMessage.Media.Payload)
  utility.PanicIfErr(err)
  _, err = outboundBuf.Write(chunk)
  utility.PanicIfErr(err)
}

With that change, the ffmpeg commands work as expected.