Fragmented MP4 - problem playing in browser

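I am trying to create fragmented MP4 from raw H264 video data so that I can play it in a web browser's player. My goal is a live streaming system in which a media server sends fragmented MP4 pieces to the browser. The server would buffer input data from a RaspberryPi camera, which sends video as H264 frames. It would then mux that video data and make it available to the client. The browser would play the media data (muxed by the server and sent over a websocket) using Media Source Extensions.

For test purposes I wrote the following pieces of code (using many examples I found on the internet):

A C++ application using avcodec which muxes raw H264 video to fragmented MP4 and saves it to a file: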

#define READBUFSIZE 4096
#define IOBUFSIZE 4096
#define ERRMSGSIZE 128

#include <cstdint>
#include <iostream>
#include <fstream>
#include <string>
#include <vector>

extern "C"
{
    #include <libavformat/avformat.h>
    #include <libavutil/error.h>
    #include <libavutil/opt.h>
}

enum NalType : uint8_t
{
    //NALs containing stream metadata
    SEQ_PARAM_SET = 0x7,
    PIC_PARAM_SET = 0x8
};

std::vector<uint8_t> outputData;

int mediaMuxCallback(void *opaque, uint8_t *buf, int bufSize)
{
    outputData.insert(outputData.end(), buf, buf + bufSize);
    return bufSize;
}

std::string getAvErrorString(int errNr)
{
    char errMsg[ERRMSGSIZE];
    av_strerror(errNr, errMsg, ERRMSGSIZE);
    return std::string(errMsg);
}

int main(int argc, char **argv)
{
    if(argc < 2)
    {
        std::cout << "Missing file name" << std::endl;
        return 1;
    }

    std::fstream file(argv[1], std::ios::in | std::ios::binary);
    if(!file.is_open())
    {
        std::cout << "Couldn't open file " << argv[1] << std::endl;
        return 2;
    }

    std::vector<uint8_t> inputMediaData;
    do
    {
        char buf[READBUFSIZE];
        file.read(buf, READBUFSIZE);

        int size = file.gcount();
        if(size > 0)
            inputMediaData.insert(inputMediaData.end(), buf, buf + size);
    } while(!file.eof());
    file.close();

    //Initialize avcodec
    av_register_all();
    uint8_t *ioBuffer;
    AVCodec *codec = avcodec_find_decoder(AV_CODEC_ID_H264);
    AVCodecContext *codecCtxt = avcodec_alloc_context3(codec);
    AVCodecParserContext *parserCtxt = av_parser_init(AV_CODEC_ID_H264);
    AVOutputFormat *outputFormat = av_guess_format("mp4", nullptr, nullptr);
    AVFormatContext *formatCtxt;
    AVIOContext *ioCtxt;
    AVStream *videoStream;

    int res = avformat_alloc_output_context2(&formatCtxt, outputFormat, nullptr, nullptr);
    if(res < 0)
    {
        std::cout << "Couldn't initialize format context; the error was: " << getAvErrorString(res) << std::endl;
        return 3;
    }

    if((videoStream = avformat_new_stream( formatCtxt, avcodec_find_encoder(formatCtxt->oformat->video_codec) )) == nullptr)
    {
        std::cout << "Couldn't initialize video stream" << std::endl;
        return 4;
    }
    else if(!codec)
    {
        std::cout << "Couldn't initialize codec" << std::endl;
        return 5;
    }
    else if(codecCtxt == nullptr)
    {
        std::cout << "Couldn't initialize codec context" << std::endl;
        return 6;
    }
    else if(parserCtxt == nullptr)
    {
        std::cout << "Couldn't initialize parser context" << std::endl;
        return 7;
    }
    else if((ioBuffer = (uint8_t*)av_malloc(IOBUFSIZE)) == nullptr)
    {
        std::cout << "Couldn't allocate I/O buffer" << std::endl;
        return 8;
    }
    else if((ioCtxt = avio_alloc_context(ioBuffer, IOBUFSIZE, 1, nullptr, nullptr, mediaMuxCallback, nullptr)) == nullptr)
    {
        std::cout << "Couldn't initialize I/O context" << std::endl;
        return 9;
    }

    //Set video stream data
    videoStream->id = formatCtxt->nb_streams - 1;
    videoStream->codec->width = 1280;
    videoStream->codec->height = 720;
    videoStream->time_base.den = 60; //FPS
    videoStream->time_base.num = 1;
    videoStream->codec->flags |= CODEC_FLAG_GLOBAL_HEADER;
    formatCtxt->pb = ioCtxt;

    //Retrieve SPS and PPS for codec extdata
    const uint32_t synchMarker = 0x01000000;
    unsigned int i = 0;
    int spsStart = -1, ppsStart = -1;
    uint16_t spsSize = 0, ppsSize = 0;
    while(spsSize == 0 || ppsSize == 0)
    {
        uint32_t *curr =  (uint32_t*)(inputMediaData.data() + i);
        if(*curr == synchMarker)
        {
            unsigned int currentNalStart = i;
            i += sizeof(uint32_t);
            uint8_t nalType = inputMediaData.data()[i] & 0x1F;
            if(nalType == SEQ_PARAM_SET)
                spsStart = currentNalStart;
            else if(nalType == PIC_PARAM_SET)
                ppsStart = currentNalStart;

            if(spsStart >= 0 && spsSize == 0 && spsStart != i)
                spsSize = currentNalStart - spsStart;
            else if(ppsStart >= 0 && ppsSize == 0 && ppsStart != i)
                ppsSize = currentNalStart - ppsStart;
        }
        ++i;
    }

    videoStream->codec->extradata = inputMediaData.data() + spsStart;
    videoStream->codec->extradata_size = ppsStart + ppsSize;

    //Write main header
    AVDictionary *options = nullptr;
    av_dict_set(&options, "movflags", "frag_custom+empty_moov", 0);
    res = avformat_write_header(formatCtxt, &options);
    if(res < 0)
    {
        std::cout << "Couldn't write container main header; the error was: " << getAvErrorString(res) << std::endl;
        return 10;
    }

    //Retrieve frames from input video and wrap them in container
    int currentInputIndex = 0;
    int framesInSecond = 0;
    while(currentInputIndex < inputMediaData.size())
    {
        uint8_t *frameBuffer;
        int frameSize;
        res = av_parser_parse2(parserCtxt, codecCtxt, &frameBuffer, &frameSize, inputMediaData.data() + currentInputIndex,
            inputMediaData.size() - currentInputIndex, AV_NOPTS_VALUE, AV_NOPTS_VALUE, 0);
        if(frameSize == 0) //No more frames while some data still remains (is that even possible?)
        {
            std::cout << "Some data left unparsed: " << std::to_string(inputMediaData.size() - currentInputIndex) << std::endl;
            break;
        }

        //Prepare packet with video frame to be dumped into container
        AVPacket packet;
        av_init_packet(&packet);
        packet.data = frameBuffer;
        packet.size = frameSize;
        packet.stream_index = videoStream->index;
        currentInputIndex += frameSize;

        //Write packet to the video stream
        res = av_write_frame(formatCtxt, &packet);
        if(res < 0)
        {
            std::cout << "Couldn't write packet with video frame; the error was: " << getAvErrorString(res) << std::endl;
            return 11;
        }

        if(++framesInSecond == 60) //We want 1 segment per second
        {
            framesInSecond = 0;
            res = av_write_frame(formatCtxt, nullptr); //Flush segment
        }
    }
    res = av_write_frame(formatCtxt, nullptr); //Flush if something has been left

    //Write media data in container to file
    file.open("my_mp4.mp4", std::ios::out | std::ios::binary);
    if(!file.is_open())
    {
        std::cout << "Couldn't open output file " << std::endl;
        return 12;
    }

    file.write((char*)outputData.data(), outputData.size());
    if(file.fail())
    {
        std::cout << "Couldn't write to file" << std::endl;
        return 13;
    }

    std::cout << "Media file muxed successfully" << std::endl;
    return 0;
}

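(I hardcoded a few values, such as the video dimensions and the framerate, but as I said this is just test code.)


A simple HTML web page using MSE to play my fragmented MP4: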

<!DOCTYPE html>
<html>
<head>
    <title>Stream test</title>
</head>
<body>
    <video width="1280" height="720" controls>
    </video>
</body>
<script>
var vidElement = document.querySelector('video');

if (window.MediaSource) {
  var mediaSource = new MediaSource();
  vidElement.src = URL.createObjectURL(mediaSource);
  mediaSource.addEventListener('sourceopen', sourceOpen);
} else {
  console.log("The Media Source Extensions API is not supported.")
}

function sourceOpen(e) {
  URL.revokeObjectURL(vidElement.src);
  var mime = 'video/mp4; codecs="avc1.640028"';
  var mediaSource = e.target;
  var sourceBuffer = mediaSource.addSourceBuffer(mime);
  var videoUrl = 'my_mp4.mp4';
  fetch(videoUrl)
    .then(function(response) {
      return response.arrayBuffer();
    })
    .then(function(arrayBuffer) {
      sourceBuffer.addEventListener('updateend', function(e) {
        if (!sourceBuffer.updating && mediaSource.readyState === 'open') {
          mediaSource.endOfStream();
        }
      });
      sourceBuffer.appendBuffer(arrayBuffer);
    });
}
</script>
</html>

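The output MP4 file generated by my C++ application can be played in MPC, but it does not play in any web browser I tested it with. It also does not have any duration (MPC keeps showing 00:00).

To have something to compare against the output MP4 file from the C++ application above, I also used FFMPEG to create a fragmented MP4 file from the same source file with the raw H264 stream. I used the following command: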

ffmpeg -r 60 -i input.h264 -c:v copy -f mp4 -movflags empty_moov+default_base_moof+frag_keyframe test.mp4

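This file generated by FFMPEG plays correctly in every web browser I used for tests. It also has the correct duration (but it also has a trailing atom, which would not be present in my live stream anyway, and since I need a live stream, it will not have any fixed duration in the first place).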

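The MP4 atoms of both files look very similar (they certainly have an identical avcc section). What is interesting (though I am not sure whether it matters) is that both files use a different NAL format than the input file: the RPI camera produces the video stream in Annex-B format, while the output MP4 files contain NALs in AVCC format, or at least that is how it looks when I compare the mdat atoms with the input H264 data.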
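
For comparing the atom layout of two files without a hex editor, a small box walker can help. The following is only a sketch: `BoxInfo`, `readBigEndian` and `listTopLevelBoxes` are hypothetical names, not part of libavformat, and it looks only at top-level boxes. It relies on the ISO BMFF box header layout: a 4-byte big-endian size, a 4-byte type, and an optional 64-bit size when the 32-bit size field equals 1.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One top-level box ("atom"): its four-character type and total size in bytes.
struct BoxInfo
{
    std::string type;
    uint64_t size;
};

// Reads a big-endian unsigned integer of `bytes` bytes starting at `p`.
static uint64_t readBigEndian(const uint8_t *p, int bytes)
{
    uint64_t value = 0;
    for(int i = 0; i < bytes; ++i)
        value = (value << 8) | p[i];
    return value;
}

// Hypothetical helper: lists the top-level boxes of an ISO BMFF buffer.
// Stops quietly at the first truncated or malformed box.
std::vector<BoxInfo> listTopLevelBoxes(const std::vector<uint8_t> &data)
{
    std::vector<BoxInfo> boxes;
    size_t pos = 0;
    while(data.size() - pos >= 8)
    {
        uint64_t size = readBigEndian(&data[pos], 4);
        std::string type(reinterpret_cast<const char*>(&data[pos + 4]), 4);
        uint64_t headerSize = 8;
        if(size == 1) // a 64-bit "largesize" follows the type
        {
            if(data.size() - pos < 16)
                break;
            size = readBigEndian(&data[pos + 8], 8);
            headerSize = 16;
        }
        else if(size == 0) // the box extends to the end of the data
            size = data.size() - pos;

        if(size < headerSize || size > data.size() - pos)
            break;
        boxes.push_back({type, size});
        pos += static_cast<size_t>(size);
    }
    return boxes;
}
```

Run on both files, a fragmented MP4 would be expected to show ftyp and moov followed by repeating moof/mdat pairs, so any difference in the sequence of types or sizes is a starting point for a closer look.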

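I assume there is some field (or a few fields) I need to set for avcodec to make it produce a video stream that browser players can decode and play properly. But which field(s) do I need to set? Or does the problem lie somewhere else? I have run out of ideas.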


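EDIT 1: As suggested, I examined the binary content of both MP4 files (generated by my app and by the FFMPEG tool) with a hex editor. What I can confirm:

So I guess there is nothing wrong with the extradata creation in my code: avcodec handles it properly, even if I just feed it the SPS and PPS NALs. It converts them by itself, so I do not need to do it by hand. Still, my original problem remains.

EDIT 2: I achieved partial success: the MP4 generated by my app now plays in Firefox. I added this line to the code (along with the rest of the stream initialization):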

videoStream->codec->time_base = videoStream->time_base;

So now this section of my code looks like this:

//Set video stream data
videoStream->id = formatCtxt->nb_streams - 1;
videoStream->codec->width = 1280;
videoStream->codec->height = 720;
videoStream->time_base.den = 60; //FPS
videoStream->time_base.num = 1;
videoStream->codec->time_base = videoStream->time_base;
videoStream->codec->flags |= CODEC_FLAG_GLOBAL_HEADER;
formatCtxt->pb = ioCtxt;

MP4 atoms for both files look very similar (they have identical avcc section for sure)

Double check that. The code supplied suggests otherwise to me.

What's interesting (but not sure if it's of any importance), both files have different NALs format than input file (RPI camera produces video stream in Annex-B format, while output MP4 files contain NALs in AVCC format... or at least it looks like it's the case when I compare mdat atoms with input H264 data).

Very important: MP4 will not work with Annex B.
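
To make the difference between the two formats concrete, here is a minimal sketch of rewriting Annex B data (NAL units separated by 00 00 01 or 00 00 00 01 start codes) as length-prefixed NAL units, which is how they are stored in an MP4 mdat. `splitAnnexB` and `annexBToLengthPrefixed` are hypothetical helper names, and a 4-byte length prefix is assumed. Per the question's EDIT 1, the libavformat muxer appeared to do this conversion by itself, so this only illustrates what changes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Splits an Annex B byte stream into NAL units with the start codes removed.
std::vector<std::vector<uint8_t>> splitAnnexB(const std::vector<uint8_t> &in)
{
    std::vector<std::vector<uint8_t>> nals;
    const size_t n = in.size();
    bool open = false;   // true once the first start code has been seen
    size_t nalStart = 0;

    // Stores in[nalStart, end) without trailing zero bytes; skips empty units.
    // A NAL unit never ends in 0x00, so trailing zeros belong to the next
    // start code (4-byte form) or are padding.
    auto emit = [&](size_t end)
    {
        while(end > nalStart && in[end - 1] == 0)
            --end;
        if(end > nalStart)
            nals.emplace_back(in.begin() + nalStart, in.begin() + end);
    };

    size_t i = 0;
    while(i + 3 <= n)
    {
        if(in[i] == 0 && in[i + 1] == 0 && in[i + 2] == 1)
        {
            if(open)
                emit(i);
            open = true;
            nalStart = i + 3;
            i += 3;
        }
        else
            ++i;
    }
    if(open)
        emit(n);
    return nals;
}

// Rewrites Annex B data as NAL units prefixed with a 4-byte big-endian length.
std::vector<uint8_t> annexBToLengthPrefixed(const std::vector<uint8_t> &in)
{
    std::vector<uint8_t> out;
    for(const auto &nal : splitAnnexB(in))
    {
        const uint32_t len = static_cast<uint32_t>(nal.size());
        out.push_back(static_cast<uint8_t>(len >> 24));
        out.push_back(static_cast<uint8_t>(len >> 16));
        out.push_back(static_cast<uint8_t>(len >> 8));
        out.push_back(static_cast<uint8_t>(len));
        out.insert(out.end(), nal.begin(), nal.end());
    }
    return out;
}
```

The NAL payload itself, including any emulation prevention bytes, is copied unchanged; only the framing differs.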

You need to fill in extradata with the AVC Decoder Configuration Record, not just SPS/PPS.

Here is how the record should look: AVCDCR
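
As a rough illustration of that record, the sketch below assembles one from a single SPS and a single PPS. The function name `buildAvcDecoderConfigurationRecord` is hypothetical. It assumes both inputs are complete NAL units including the one-byte NAL header but without any start code, assumes 4-byte NAL length prefixes, and leaves out the optional trailing fields that the specification defines for High and related profiles.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: builds an AVCDecoderConfigurationRecord (the payload of
// the avcC box) from one SPS and one PPS NAL unit.
std::vector<uint8_t> buildAvcDecoderConfigurationRecord(
    const std::vector<uint8_t> &sps, const std::vector<uint8_t> &pps)
{
    if(sps.size() < 4 || sps.size() > 0xFFFF || pps.empty() || pps.size() > 0xFFFF)
        throw std::invalid_argument("SPS/PPS missing or too large");

    // Appends a 16-bit big-endian length followed by the NAL unit itself.
    auto appendWithLength = [](std::vector<uint8_t> &dst, const std::vector<uint8_t> &nal)
    {
        dst.push_back(static_cast<uint8_t>(nal.size() >> 8));
        dst.push_back(static_cast<uint8_t>(nal.size() & 0xFF));
        dst.insert(dst.end(), nal.begin(), nal.end());
    };

    std::vector<uint8_t> rec;
    rec.push_back(1);      // configurationVersion
    rec.push_back(sps[1]); // AVCProfileIndication (profile_idc)
    rec.push_back(sps[2]); // profile_compatibility (constraint flags)
    rec.push_back(sps[3]); // AVCLevelIndication (level_idc)
    rec.push_back(0xFF);   // 6 reserved bits + lengthSizeMinusOne = 3 (4-byte lengths)
    rec.push_back(0xE1);   // 3 reserved bits + numOfSequenceParameterSets = 1
    appendWithLength(rec, sps);
    rec.push_back(1);      // numOfPictureParameterSets
    appendWithLength(rec, pps);
    return rec;
}
```

The three bytes copied from the SPS are the same ones that appear in hex in an MSE codec string such as avc1.640028 (profile 0x64, constraint flags 0x00, level 0x28).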

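I finally found the solution. My MP4 now plays in Chrome (while still playing in the other tested browsers).

In Chrome, chrome://media-internals/ shows MSE logs (of a sort). When I looked there, I found several of the following warnings for my test player: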

ISO-BMFF container metadata for video frame indicates that the frame is not a keyframe, but the video frame contents indicate the opposite.

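That got me thinking and encouraged me to set AV_PKT_FLAG_KEY for packets that contain keyframes. I added the following code to the section that fills the AVPacket structure: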

    //Check if keyframe field needs to be set
    int allowedNalsCount = 3; //In one packet there would be at most three NALs: SPS, PPS and video frame
    packet.flags = 0;
    for(int i = 0; i + (int)sizeof(uint32_t) < frameSize && allowedNalsCount > 0; ++i) //Stay inside the buffer when reading the marker and the NAL header
    {
        uint32_t *curr =  (uint32_t*)(frameBuffer + i);
        if(*curr == synchMarker)
        {
            uint8_t nalType = frameBuffer[i + sizeof(uint32_t)] & 0x1F;
            if(nalType == KEYFRAME)
            {
                std::cout << "Keyframe detected at frame nr " << framesTotal << std::endl;
                packet.flags = AV_PKT_FLAG_KEY;
                break;
            }
            else
                i += sizeof(uint32_t) + 1; //We parsed this already, no point in doing it again

            --allowedNalsCount;
        }
    }

The KEYFRAME constant is, in my case, 0x5 (Slice IDR).

We can find the explanation for this in the Chrome source (https://chromium.googlesource.com/chromium/src/+/refs/heads/master/media/formats/mp4/mp4_stream_parser.cc#799):
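
The same check can also be written without casting the buffer to uint32_t*, which avoids unaligned reads and also matches the 3-byte start code 00 00 01. This is only a sketch of an alternative, not the code used above: `containsIdrSlice` and `NAL_SLICE_IDR` are hypothetical names, the input is assumed to be Annex B data, and it scans the whole buffer rather than stopping after three NAL units.

```cpp
#include <cstddef>
#include <cstdint>

constexpr uint8_t NAL_SLICE_IDR = 0x5; // nal_unit_type of an IDR slice

// Returns true if the Annex B buffer contains at least one IDR slice NAL unit.
bool containsIdrSlice(const uint8_t *data, size_t size)
{
    // i + 3 < size guarantees that the NAL header byte at data[i + 3] exists.
    for(size_t i = 0; i + 3 < size; ++i)
    {
        // 00 00 01 also matches the tail of the 4-byte start code 00 00 00 01.
        if(data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1)
        {
            if((data[i + 3] & 0x1F) == NAL_SLICE_IDR)
                return true;
            i += 3; // skip the start code and the NAL header just inspected
        }
    }
    return false;
}
```

In the packet-filling code it could be used along the lines of packet.flags = containsIdrSlice(frameBuffer, frameSize) ? AV_PKT_FLAG_KEY : 0;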

// Copyright 2014 The Chromium Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.


  // Use |analysis.is_keyframe|, if it was actually determined, for logging
  // if the analysis mismatches the container's keyframe metadata for
  // |frame_buf|.
  if (analysis.is_keyframe.has_value() &&
      is_keyframe != analysis.is_keyframe.value()) {
    LIMITED_MEDIA_LOG(DEBUG, media_log_, num_video_keyframe_mismatches_,
                      kMaxVideoKeyframeMismatchLogs)
        << "ISO-BMFF container metadata for video frame indicates that the "
           "frame is "
        << (is_keyframe ? "" : "not ")
        << "a keyframe, but the video frame contents indicate the "
           "opposite.";
    // As of September 2018, it appears that all of Edge, Firefox, Safari
    // work with content that marks non-avc-keyframes as a keyframe in the
    // container. Encoders/muxers/old streams still exist that produce
    // all-keyframe mp4 video tracks, though many of the coded frames are
    // not keyframes (likely workaround due to the impact on low-latency
    // live streams until https://crbug.com/229412 was fixed).  We'll trust
    // the AVC frame's keyframe-ness over the mp4 container's metadata if
    // they mismatch. If other out-of-order codecs in mp4 (e.g. HEVC, DV)
    // implement keyframe analysis in their frame_bitstream_converter, we'll
    // similarly trust that analysis instead of the mp4.
    is_keyframe = analysis.is_keyframe.value();
  }

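As the code comment shows, Chrome trusts the AVC frame's keyframe-ness over the MP4 container's metadata. So the NALU type in the H264/HEVC stream matters more than the description in the MP4 container's sdtp and trun boxes.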