How do I properly unwrap FLV video into raw and valid h264 segments for gstreamer buffers?

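I have written an RTMP server in Rust that successfully allows RTMP publishers to connect and push a video stream, and RTMP clients can successfully connect and watch those video streams.

When a video RTMP packet comes in, I attempt to unwrap the video from the FLV container via: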

    // TODO: The FLV spec has the AVCPacketType and CompositionTime as the first parts of the
    // AVCVIDEOPACKET. It's unclear if these two fields are part of h264 or FLV specific.
    let flv_tag = data.split_to(1);
    let is_sequence_header;
    // The low 4 bits of the tag byte hold the codec ID (7 = AVC/h264).
    let codec = if flv_tag[0] & 0x0F == 0x07 {
        // AVCPacketType 0 marks the sequence header.
        is_sequence_header = data[0] == 0x00;
        VideoCodec::H264
    } else {
        is_sequence_header = false;
        VideoCodec::Unknown
    };

    // The high 4 bits hold the frame type (1 = keyframe).
    let is_keyframe = flv_tag[0] >> 4 == 0x01;

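After this runs, `data` contains the AVCVIDEOPACKET with the FLV tag removed. When I send this video to other RTMP clients, I just prepend the correct FLV tag and send it off.

Now I'm trying to pass the video packets to gstreamer in order to do in-process transcoding. To do this I set up an `appsrc ! avdec_h264` pipeline and gave the appsrc element the following caps: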

        video_source.set_caps(Some(
            &Caps::builder("video/x-h264")
                .field("alignment", "nal")
                .field("stream-format", "byte-stream")
                .build()
        ));

Now, when an RTMP publisher sends a video packet, I take the (attempted) unwrapped video packet and pass it to my appsrc via

    pub fn push_video(&self, data: Bytes, timestamp: RtmpTimestamp) {
        let mut buffer = Buffer::with_size(data.len()).unwrap();
        {
            let buffer = buffer.get_mut().unwrap();
            buffer.set_pts(ClockTime::MSECOND * timestamp.value as u64);

            // Copy the packet bytes into the buffer in one call.
            let mut samples = buffer.map_writable().unwrap();
            samples.as_mut_slice().copy_from_slice(&data);
        }

        self.video_source.push_buffer(buffer).unwrap();
    }

When this happens, the following gstreamer debug output appears:

    [2022-02-09T18:25:15Z INFO  gstreamer_mmids_scratchpad] Pushing packet #0 (is_sequence_header:true, is_keyframe=true)
    [2022-02-09T18:25:15Z INFO  gstreamer_mmids_scratchpad] Connection 63397d56-16fb-4b54-a622-d991b5ad2d8e sent audio data
    0:00:05.531722000  7516 000001C0C04011C0 INFO               GST_EVENT gstevent.c:973:gst_event_new_segment: creating segment event bytes segment start=0, offset=0, stop=-1, rate=1.000000, applied_rate=1.000000, flags=0x00, time=0, base=0, position 0, duration -1
    0:00:05.533525000  7516 000001C0C04011C0 INFO                 basesrc gstbasesrc.c:3018:gst_base_src_loop:<video_source> marking pending DISCONT
    0:00:05.535385000  7516 000001C0C04011C0 WARN            videodecoder gstvideodecoder.c:2818:gst_video_decoder_chain:<video_decode> Received buffer without a new-segment. Assuming timestamps start from 0.
    0:00:05.537381000  7516 000001C0C04011C0 INFO               GST_EVENT gstevent.c:973:gst_event_new_segment: creating segment event time segment start=0:00:00.000000000, offset=0:00:00.000000000, stop=99:99:99.999999999, rate=1.000000, applied_rate=1.000000, flags=0x00, time=0:00:00.000000000, base=0:00:00.000000000, position 0:00:00.000000000, duration 99:99:99.999999999
    [2022-02-09T18:25:15Z INFO  gstreamer_mmids_scratchpad] Pushing packet #1 (is_sequence_header:false, is_keyframe=true)
    0:00:05.563445000  7516 000001C0C04011C0 INFO                   libav :0:: Invalid NAL unit 0, skipping.
    [2022-02-09T18:25:15Z INFO  gstreamer_mmids_scratchpad] Pushing packet #2 (is_sequence_header:false, is_keyframe=false)
    0:00:05.579274000  7516 000001C0C04011C0 ERROR                  libav :0:: No start code is found.
    0:00:05.581338000  7516 000001C0C04011C0 ERROR                  libav :0:: Error splitting the input into NAL units.
    0:00:05.583337000  7516 000001C0C04011C0 WARN                   libav gstavviddec.c:2068:gst_ffmpegviddec_handle_frame:<video_decode> Failed to send data for decoding
    [2022-02-09T18:25:15Z INFO  gstreamer_mmids_scratchpad] Pushing packet #3 (is_sequence_header:false, is_keyframe=false)
    0:00:05.595253000  7516 000001C0C04011C0 ERROR                  libav :0:: No start code is found.
    0:00:05.597204000  7516 000001C0C04011C0 ERROR                  libav :0:: Error splitting the input into NAL units.
    0:00:05.599262000  7516 000001C0C04011C0 WARN                   libav gstavviddec.c:2068:gst_ffmpegviddec_handle_frame:<video_decode> Failed to send data for decoding

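Based on this, I figured it might be caused by the non-data portions of the AVCVIDEOPACKET not being part of the h264 stream but being FLV specific. So I tried ignoring the first 4 bytes (the AVCPacketType and CompositionTime fields) of each packet I wrote to the buffer: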

    pub fn push_video(&self, data: Bytes, timestamp: RtmpTimestamp) {
        let mut buffer = Buffer::with_size(data.len() - 4).unwrap();
        {
            let buffer = buffer.get_mut().unwrap();
            buffer.set_pts(ClockTime::MSECOND * timestamp.value as u64);

            // Copy everything after the 4-byte AVCVIDEOPACKET header in one call.
            let mut samples = buffer.map_writable().unwrap();
            samples.as_mut_slice().copy_from_slice(&data[4..]);
        }

        self.video_source.push_buffer(buffer).unwrap();
    }

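This gave me essentially the same log output and errors. It is reproducible with the h264parse plugin as well.

What am I missing in the unwrapping process to pass raw h264 video to gstreamer?

Edit:

Realizing I had misread the pad template, I tried the following caps instead: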

        video_source.set_caps(Some(
            &Caps::builder("video/x-h264")
                .field("alignment", "au")
                .field("stream-format", "avc")
                .build()
        ));

This also failed with very similar output.

I think I finally figured this out.

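The first thing is that I needed to remove the AVCVIDEOPACKET headers (the packet type and composition time fields). These are not part of the h264 format and therefore cause parsing errors.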
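
As a rough illustration of that first step, the sketch below splits the 4-byte header off an AVCVIDEOPACKET and returns the remaining payload. The names `AvcPacketHeader` and `split_avc_header` are hypothetical and not part of my server; the layout (one byte of AVCPacketType followed by a signed 24-bit big-endian CompositionTime in milliseconds) is taken from the FLV spec.

```rust
/// Header fields that precede the h264 payload inside an AVCVIDEOPACKET.
#[derive(Debug, PartialEq)]
struct AvcPacketHeader {
    /// 0 = sequence header, 1 = NAL units, 2 = end of sequence.
    packet_type: u8,
    /// Signed 24-bit composition time offset, in milliseconds.
    composition_time_ms: i32,
}

/// Splits the 4-byte header from the payload. Returns `None` when the
/// packet is too short to contain a full header.
fn split_avc_header(data: &[u8]) -> Option<(AvcPacketHeader, &[u8])> {
    if data.len() < 4 {
        return None;
    }

    // Place the three big-endian bytes in the top of an i32, then use an
    // arithmetic shift right to sign-extend the 24-bit value.
    let composition_time_ms = i32::from_be_bytes([data[1], data[2], data[3], 0]) >> 8;

    let header = AvcPacketHeader {
        packet_type: data[0],
        composition_time_ms,
    };
    Some((header, &data[4..]))
}

fn main() {
    // Made-up packet: NAL unit type, composition time 40 ms, 2 payload bytes.
    let packet = [0x01u8, 0x00, 0x00, 0x28, 0xAA, 0xBB];
    if let Some((header, payload)) = split_avc_header(&packet) {
        println!("{:?}, payload length {}", header, payload.len());
    }
}
```

Only the returned payload slice would be copied into the gstreamer buffer; the header fields are kept around for deciding what to do with the packet and for timestamping.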

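The second thing I needed to do was to not pass the sequence header as a buffer to the source. Instead, the sequence header bytes need to be set as the `codec_data` field of the appsrc's caps. This now allows for no parsing errors when passing the video data to h264parse, and even gives me a correctly sized window.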
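
To show why the sequence header is configuration rather than frame data, here is a minimal sketch that peeks at the fixed fields at the start of it, assuming the payload (after the 4-byte AVCVIDEOPACKET header) is an AVCDecoderConfigurationRecord as defined in ISO/IEC 14496-15. The names `AvcConfigSummary` and `summarize_avc_config` are made up for this example, and the sample bytes are invented.

```rust
/// A few fixed fields from the start of an AVCDecoderConfigurationRecord.
#[derive(Debug, PartialEq)]
struct AvcConfigSummary {
    profile: u8,
    level: u8,
    /// Size in bytes of the length prefix in front of each NAL unit.
    nal_length_size: u8,
    /// Number of sequence parameter sets stored in the record.
    sps_count: u8,
}

/// Reads the fixed fields. Returns `None` if the record is too short or
/// does not start with configurationVersion 1.
fn summarize_avc_config(record: &[u8]) -> Option<AvcConfigSummary> {
    if record.len() < 6 || record[0] != 1 {
        return None;
    }

    Some(AvcConfigSummary {
        profile: record[1],
        level: record[3],
        // The low 2 bits hold lengthSizeMinusOne; the upper 6 are reserved.
        nal_length_size: (record[4] & 0x03) + 1,
        // The low 5 bits hold the SPS count; the upper 3 are reserved.
        sps_count: record[5] & 0x1F,
    })
}

fn main() {
    // Invented record start: version 1, profile 100, level 31,
    // 4-byte NAL length prefixes, one SPS.
    let record = [0x01u8, 0x64, 0x00, 0x1F, 0xFF, 0xE1];
    println!("{:?}", summarize_avc_config(&record));
}
```

The length-prefix field is also a hint at the earlier "No start code is found" errors: frames in this format carry length prefixes rather than byte-stream start codes. With gstreamer-rs, I believe the whole record can be wrapped with `Buffer::from_slice` and passed as `.field("codec_data", ...)` on the caps builder alongside `stream-format` set to `avc`, but treat that as an untested assumption.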

The third thing I was missing was the correct dts and pts values. It turns out the RTMP timestamp I'm given is the dts, and pts = AVCVIDEOPACKET.CompositionTime + dts.
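
The timestamp arithmetic can be sketched as follows. `FrameTimestamps` and `frame_timestamps` are hypothetical names, both values are in milliseconds, and clamping a negative pts to zero is my own assumption rather than something the FLV spec requires.

```rust
/// Decode and presentation timestamps for one video frame, in milliseconds.
#[derive(Debug, PartialEq)]
struct FrameTimestamps {
    dts_ms: u64,
    pts_ms: u64,
}

/// The RTMP timestamp is used as the dts; the pts is the dts plus the
/// signed composition time offset from the AVCVIDEOPACKET header.
fn frame_timestamps(rtmp_timestamp_ms: u32, composition_time_ms: i32) -> FrameTimestamps {
    let dts = i64::from(rtmp_timestamp_ms);
    // Assumption: a pts that would fall before zero is clamped to zero.
    let pts = (dts + i64::from(composition_time_ms)).max(0);

    FrameTimestamps {
        dts_ms: dts as u64,
        pts_ms: pts as u64,
    }
}

fn main() {
    // A frame decoded at 1000 ms that should be shown 40 ms later.
    println!("{:?}", frame_timestamps(1000, 40));
}
```

In `push_video` these two values would then be applied to the buffer separately, with the dts going to `set_dts` and the pts going to `set_pts`, instead of using the RTMP timestamp as the pts.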