How to use VideoToolbox to decompress an H.264 video stream

I had a lot of trouble figuring out how to use Apple's hardware accelerated video framework to decompress an H.264 video stream. After a few weeks I figured it out and wanted to share an extensive example, since I couldn't find one.

My goal is to give a thorough, instructive example of Video Toolbox, introduced in WWDC '14 session 513. My code will not compile or run as-is, because it needs to be integrated with an elementary H.264 stream (like a video read from a file or streamed from online, etc.) and needs to be tweaked depending on the specific case.

I should mention that I have very little experience with video en/decoding beyond what I learned while googling the subject. I don't know all the details about video formats, parameter structure, etc., so I've only included what I think you need to know.

I am using Xcode 6.2 and have deployed to iOS devices running iOS 8.1 and 8.2.

Concepts:

NALUs: NALUs are simply chunks of data of varying length that have a NALU start code header 0x00 00 00 01 YY, where the first 5 bits of YY tell you what type of NALU this is and therefore what type of data follows the header. (Since you only need the first 5 bits, I use YY & 0x1F to get just the relevant bits.) I list what all these types are in the array NSString * const naluTypesStrings[], but you don't need to know what they all are. (A short sketch of this masking follows these concept notes.)

Parameters: Your decoder needs parameters so it knows how the H.264 video data is stored. The 2 you need to set are the Sequence Parameter Set (SPS) and Picture Parameter Set (PPS), and they each have their own NALU type number. You don't need to know what the parameters mean; the decoder knows what to do with them.

H.264 Stream Format: In most H.264 streams, you will receive an initial set of PPS and SPS parameters followed by an i-frame (aka IDR frame or flush frame) NALU. Then you will receive several P-frame NALUs (maybe a few dozen or so), then another set of parameters (which may be the same as the initial parameters) and an i-frame, more P-frames, etc. i-frames are much bigger than P-frames. Conceptually you can think of the i-frame as an entire image of the video, and the P-frames are just the changes made to that i-frame, until you receive the next i-frame.
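
To make the masking in the NALU description above concrete, here is a minimal sketch (assuming a 4-byte start code, which is what my code below expects); naluTypeOf is just an illustrative helper, not something used further down:

// the NALU type is the lower 5 bits of the first byte after the 0x00 00 00 01 start code
static int naluTypeOf(const uint8_t *nalu)
{
    return nalu[4] & 0x1F;    // e.g. 7 means this NALU is an SPS, 5 means an IDR frame
}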

Procedure:

  1. Generate individual NALUs from your H.264 stream. I cannot show code for this step since it depends a lot on what video source you're using. I made this graphic to show what I was working with ("data" in the graphic is "frame" in my following code), but your case may and probably will differ. My method receivedRawVideoFrame: is called every time I receive a frame (uint8_t *frame), which is one of 2 types. In the graphic, those 2 frame types are the 2 big purple boxes.

  2. Create a CMVideoFormatDescriptionRef from your SPS and PPS NALUs with CMVideoFormatDescriptionCreateFromH264ParameterSets( ). You cannot display any frames without doing this first. The SPS and PPS may look like a jumble of numbers, but VTD knows what to do with them. All you need to know is that CMVideoFormatDescriptionRef is a description of your video data, like its width/height, format type (kCMPixelFormat_32BGRA, kCMVideoCodecType_H264, etc.), aspect ratio, color space, etc. Your decoder will hold onto those parameters until a new set arrives (sometimes parameters are resent regularly even when they haven't changed).

  3. Re-package your IDR and non-IDR frame NALUs according to the "AVCC" format. This means removing the NALU start codes and replacing them with a 4-byte header that states the length of the NALU. You don't need to do this for the SPS and PPS NALUs. (Note that the 4-byte NALU length header is in big-endian, so if you have a UInt32 value it must be byte-swapped before copying it into the CMBlockBuffer, using CFSwapInt32. I do this in my code with the htonl function call.)

  4. Package the IDR and non-IDR NALU frames into CMBlockBuffers. Do not do this with the SPS and PPS parameter NALUs. All you need to know about CMBlockBuffers is that they are a way to wrap arbitrary blocks of data in Core Media. (Any compressed video data in a video pipeline is wrapped in these.)

  5. Package the CMBlockBuffer into a CMSampleBuffer. All you need to know about CMSampleBuffers is that they wrap up our CMBlockBuffers with other information (here it would be the CMVideoFormatDescription and CMTime, if CMTime is used).

  6. Create a VTDecompressionSessionRef and feed the sample buffers into VTDecompressionSessionDecodeFrame( ). Alternatively, you can use AVSampleBufferDisplayLayer and its enqueueSampleBuffer: method, and then you won't need to use a VTDecompSession at all. It's simpler to set up, but it will not throw errors if something goes wrong the way the VTD will.

  7. In the VTDecompSession callback, use the resulting CVImageBufferRef to display the video frame. If you need to convert your CVImageBuffer to a UIImage, see my Stack Overflow answer linked here; a minimal sketch of that conversion also follows this list.
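
Since that linked answer isn't reproduced in this post, here is a minimal Objective-C sketch of the CVImageBuffer-to-UIImage conversion (the decoded image buffer handed to the callback is in practice a CVPixelBuffer, so it can go straight into Core Image); imageFromImageBuffer: is just an illustrative helper name, and it needs CoreImage and UIKit imported:

// render the decoded pixel buffer through Core Image to get a UIImage
-(UIImage *) imageFromImageBuffer:(CVImageBufferRef)imageBuffer
{
    CIImage *ciImage = [CIImage imageWithCVPixelBuffer:imageBuffer];
    CIContext *context = [CIContext contextWithOptions:nil];
    CGRect rect = CGRectMake(0, 0,
                             CVPixelBufferGetWidth(imageBuffer),
                             CVPixelBufferGetHeight(imageBuffer));
    CGImageRef cgImage = [context createCGImage:ciImage fromRect:rect];
    UIImage *image = [UIImage imageWithCGImage:cgImage];
    CGImageRelease(cgImage);   // createCGImage:fromRect: follows the Create rule, so release it
    return image;
}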

Other notes:

  • H.264 streams can vary a lot. From what I've learned, NALU start code headers are sometimes 3 bytes (0x00 00 01) and sometimes 4 (0x00 00 00 01). My code works with 4 bytes; you will need to change a few things if you're working with a 3-byte start code. (A short sketch of handling both forms follows these notes.)

  • If you want to know more about NALUs, I found this answer to be very helpful. In my case, I found that I didn't need to ignore the "emulation prevention" bytes as described, so I personally skipped that step, but you may need to know about that.

  • If your VTDecompressionSession outputs an error number (like -12909), look up the error code in your Xcode project. Find the VideoToolbox framework in the project navigator, open it, and find the header VTErrors.h. If you can't find it, I've also included all the error codes below in another answer.
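
As promised in the note about start codes above, here is a minimal sketch of telling a 3-byte start code apart from a 4-byte one, so you know how many bytes to skip (or replace) before the NALU type byte; startCodeLengthAt is just an illustrative helper, not something my code below uses:

// returns 4 or 3 if a start code begins at 'offset', or 0 if there is no start code there
static int startCodeLengthAt(const uint8_t *buf, long offset, long size)
{
    if (offset + 4 <= size &&
        buf[offset] == 0x00 && buf[offset+1] == 0x00 &&
        buf[offset+2] == 0x00 && buf[offset+3] == 0x01)
        return 4;

    if (offset + 3 <= size &&
        buf[offset] == 0x00 && buf[offset+1] == 0x00 && buf[offset+2] == 0x01)
        return 3;

    return 0;
}

// usage: the NALU type byte sits right after the start code, whatever its length
// int codeLength = startCodeLengthAt(frame, 0, frameSize);
// int nalu_type  = frame[codeLength] & 0x1F;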

Code example:

So let's start by declaring some global variables and including the VT framework (VT = Video Toolbox).

#import <VideoToolbox/VideoToolbox.h>

@property (nonatomic, assign) CMVideoFormatDescriptionRef formatDesc;
@property (nonatomic, assign) VTDecompressionSessionRef decompressionSession;
@property (nonatomic, retain) AVSampleBufferDisplayLayer *videoLayer;
@property (nonatomic, assign) int spsSize;
@property (nonatomic, assign) int ppsSize;

The following array is only used so that you can print out what type of NALU frame you're receiving. If you know what all these types mean, good for you, you know more about H.264 than I do :) My code only handles types 1, 5, 7 and 8.

NSString * const naluTypesStrings[] =
{
    @"0: Unspecified (non-VCL)",
    @"1: Coded slice of a non-IDR picture (VCL)",    // P frame
    @"2: Coded slice data partition A (VCL)",
    @"3: Coded slice data partition B (VCL)",
    @"4: Coded slice data partition C (VCL)",
    @"5: Coded slice of an IDR picture (VCL)",      // I frame
    @"6: Supplemental enhancement information (SEI) (non-VCL)",
    @"7: Sequence parameter set (non-VCL)",         // SPS parameter
    @"8: Picture parameter set (non-VCL)",          // PPS parameter
    @"9: Access unit delimiter (non-VCL)",
    @"10: End of sequence (non-VCL)",
    @"11: End of stream (non-VCL)",
    @"12: Filler data (non-VCL)",
    @"13: Sequence parameter set extension (non-VCL)",
    @"14: Prefix NAL unit (non-VCL)",
    @"15: Subset sequence parameter set (non-VCL)",
    @"16: Reserved (non-VCL)",
    @"17: Reserved (non-VCL)",
    @"18: Reserved (non-VCL)",
    @"19: Coded slice of an auxiliary coded picture without partitioning (non-VCL)",
    @"20: Coded slice extension (non-VCL)",
    @"21: Coded slice extension for depth view components (non-VCL)",
    @"22: Reserved (non-VCL)",
    @"23: Reserved (non-VCL)",
    @"24: STAP-A Single-time aggregation packet (non-VCL)",
    @"25: STAP-B Single-time aggregation packet (non-VCL)",
    @"26: MTAP16 Multi-time aggregation packet (non-VCL)",
    @"27: MTAP24 Multi-time aggregation packet (non-VCL)",
    @"28: FU-A Fragmentation unit (non-VCL)",
    @"29: FU-B Fragmentation unit (non-VCL)",
    @"30: Unspecified (non-VCL)",
    @"31: Unspecified (non-VCL)",
};

Now this is where all the magic happens.

-(void) receivedRawVideoFrame:(uint8_t *)frame withSize:(uint32_t)frameSize isIFrame:(int)isIFrame
{
    OSStatus status = noErr;

    uint8_t *data = NULL;
    uint8_t *pps = NULL;
    uint8_t *sps = NULL;

    // I know what my H.264 data source's NALUs look like so I know start code index is always 0.
    // if you don't know where it starts, you can use a for loop similar to how I find the 2nd and 3rd start codes
    int startCodeIndex = 0;
    int secondStartCodeIndex = 0;
    int thirdStartCodeIndex = 0;

    long blockLength = 0;

    CMSampleBufferRef sampleBuffer = NULL;
    CMBlockBufferRef blockBuffer = NULL;

    int nalu_type = (frame[startCodeIndex + 4] & 0x1F);
    NSLog(@"~~~~~~~ Received NALU Type \"%@\" ~~~~~~~~", naluTypesStrings[nalu_type]);

    // if we haven't already set up our format description with our SPS PPS parameters, we
    // can't process any frames except type 7 that has our parameters
    if (nalu_type != 7 && _formatDesc == NULL)
    {
        NSLog(@"Video error: Frame is not an I Frame and format description is null");
        return;
    }

    // NALU type 7 is the SPS parameter NALU
    if (nalu_type == 7)
    {
        // find where the second PPS start code begins, (the 0x00 00 00 01 code)
        // from which we also get the length of the first SPS code
        for (int i = startCodeIndex + 4; i < startCodeIndex + 40; i++)
        {
            if (frame[i] == 0x00 && frame[i+1] == 0x00 && frame[i+2] == 0x00 && frame[i+3] == 0x01)
            {
                secondStartCodeIndex = i;
                _spsSize = secondStartCodeIndex;   // includes the header in the size
                break;
            }
        }

        // find what the second NALU type is
        nalu_type = (frame[secondStartCodeIndex + 4] & 0x1F);
        NSLog(@"~~~~~~~ Received NALU Type \"%@\" ~~~~~~~~", naluTypesStrings[nalu_type]);
    }

    // type 8 is the PPS parameter NALU
    if(nalu_type == 8)
    {
        // find where the NALU after this one starts so we know how long the PPS parameter is
        for (int i = _spsSize + 4; i < _spsSize + 30; i++)
        {
            if (frame[i] == 0x00 && frame[i+1] == 0x00 && frame[i+2] == 0x00 && frame[i+3] == 0x01)
            {
                thirdStartCodeIndex = i;
                _ppsSize = thirdStartCodeIndex - _spsSize;
                break;
            }
        }

        // allocate enough data to fit the SPS and PPS parameters into our data objects.
        // VTD doesn't want you to include the start code header (4 bytes long) so we add the - 4 here
        sps = malloc(_spsSize - 4);
        pps = malloc(_ppsSize - 4);

        // copy in the actual sps and pps values, again ignoring the 4 byte header
        memcpy (sps, &frame[4], _spsSize-4);
        memcpy (pps, &frame[_spsSize+4], _ppsSize-4);

        // now we set our H264 parameters
        uint8_t*  parameterSetPointers[2] = {sps, pps};
        size_t parameterSetSizes[2] = {_spsSize-4, _ppsSize-4};

        // suggestion from @Kris Dude's answer below
        if (_formatDesc) 
        {
            CFRelease(_formatDesc);
            _formatDesc = NULL;
        }

        status = CMVideoFormatDescriptionCreateFromH264ParameterSets(kCFAllocatorDefault, 2, 
                                                (const uint8_t *const*)parameterSetPointers, 
                                                parameterSetSizes, 4, 
                                                &_formatDesc);

        NSLog(@"\t\t Creation of CMVideoFormatDescription: %@", (status == noErr) ? @"successful!" : @"failed...");
        if(status != noErr) NSLog(@"\t\t Format Description ERROR type: %d", (int)status);

        // See if decomp session can convert from previous format description 
        // to the new one, if not we need to remake the decomp session.
        // This snippet was not necessary for my applications but it could be for yours
        /*BOOL needNewDecompSession = (VTDecompressionSessionCanAcceptFormatDescription(_decompressionSession, _formatDesc) == NO);
         if(needNewDecompSession)
         {
             [self createDecompSession];
         }*/

        // now lets handle the IDR frame that (should) come after the parameter sets
        // I say "should" because that's how I expect my H264 stream to work, YMMV
        nalu_type = (frame[thirdStartCodeIndex + 4] & 0x1F);
        NSLog(@"~~~~~~~ Received NALU Type \"%@\" ~~~~~~~~", naluTypesStrings[nalu_type]);
    }

    // create our VTDecompressionSession.  This isn't necessary if you choose to use AVSampleBufferDisplayLayer
    if((status == noErr) && (_decompressionSession == NULL))
    {
        [self createDecompSession];
    }

    // type 5 is an IDR frame NALU.  The SPS and PPS NALUs should always be followed by an IDR (or IFrame) NALU, as far as I know
    if(nalu_type == 5)
    {
        // find the offset, or where the SPS and PPS NALUs end and the IDR frame NALU begins
        int offset = _spsSize + _ppsSize;
        blockLength = frameSize - offset;
        data = malloc(blockLength);
        data = memcpy(data, &frame[offset], blockLength);

        // replace the start code header on this NALU with its size.
        // AVCC format requires that you do this.  
        // htonl converts the unsigned int from host to network byte order
        uint32_t dataLength32 = htonl (blockLength - 4);
        memcpy (data, &dataLength32, sizeof (uint32_t));

        // create a block buffer from the IDR NALU
        status = CMBlockBufferCreateWithMemoryBlock(NULL, data,  // memoryBlock to hold buffered data
                                                    blockLength,  // block length of the mem block in bytes.
                                                    kCFAllocatorNull, NULL,
                                                    0, // offsetToData
                                                    blockLength,   // dataLength of relevant bytes, starting at offsetToData
                                                    0, &blockBuffer);

        NSLog(@"\t\t BlockBufferCreation: \t %@", (status == kCMBlockBufferNoErr) ? @"successful!" : @"failed...");
    }

    // NALU type 1 is non-IDR (or PFrame) picture
    if (nalu_type == 1)
    {
        // non-IDR frames do not have an offset due to SPS and PPS, so the approach
        // is similar to the IDR frames just without the offset
        blockLength = frameSize;
        data = malloc(blockLength);
        data = memcpy(data, &frame[0], blockLength);

        // again, replace the start header with the size of the NALU
        uint32_t dataLength32 = htonl (blockLength - 4);
        memcpy (data, &dataLength32, sizeof (uint32_t));

        status = CMBlockBufferCreateWithMemoryBlock(NULL, data,  // memoryBlock to hold data. If NULL, block will be alloc when needed
                                                    blockLength,  // overall length of the mem block in bytes
                                                    kCFAllocatorNull, NULL,
                                                    0,     // offsetToData
                                                    blockLength,  // dataLength of relevant data bytes, starting at offsetToData
                                                    0, &blockBuffer);

        NSLog(@"\t\t BlockBufferCreation: \t %@", (status == kCMBlockBufferNoErr) ? @"successful!" : @"failed...");
    }

    // now create our sample buffer from the block buffer,
    if(status == noErr)
    {
        // here I'm not bothering with any timing specifics since in my case we displayed all frames immediately
        const size_t sampleSize = blockLength;
        status = CMSampleBufferCreate(kCFAllocatorDefault,
                                      blockBuffer, true, NULL, NULL,
                                      _formatDesc, 1, 0, NULL, 1,
                                      &sampleSize, &sampleBuffer);

        NSLog(@"\t\t SampleBufferCreate: \t %@", (status == noErr) ? @"successful!" : @"failed...");
    }

    if(status == noErr)
    {
        // set some values of the sample buffer's attachments
        CFArrayRef attachments = CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, YES);
        CFMutableDictionaryRef dict = (CFMutableDictionaryRef)CFArrayGetValueAtIndex(attachments, 0);
        CFDictionarySetValue(dict, kCMSampleAttachmentKey_DisplayImmediately, kCFBooleanTrue);

        // either send the samplebuffer to a VTDecompressionSession or to an AVSampleBufferDisplayLayer
        [self render:sampleBuffer];
    }

    // free memory to avoid a memory leak; do the same for sps, pps and blockBuffer
    if (NULL != data)
    {
        free (data);
        data = NULL;
    }
    if (NULL != sps)  { free(sps);  sps = NULL; }
    if (NULL != pps)  { free(pps);  pps = NULL; }
    if (NULL != blockBuffer) { CFRelease(blockBuffer); blockBuffer = NULL; }
}

The following method creates your VTD session. Recreate it whenever you receive new parameters. (You don't have to recreate it every single time you receive parameters, pretty sure.)

If you want to set attributes for the destination CVPixelBuffer, read up on the CoreVideo PixelBufferAttributes values and put them in NSDictionary *destinationImageBufferAttributes.

-(void) createDecompSession
{
    // make sure to tear down the old VTD session if one exists
    if (_decompressionSession != NULL)
    {
        VTDecompressionSessionInvalidate(_decompressionSession);
        CFRelease(_decompressionSession);
    }
    _decompressionSession = NULL;
    VTDecompressionOutputCallbackRecord callBackRecord;
    callBackRecord.decompressionOutputCallback = decompressionSessionDecodeFrameCallback;

    // this is necessary if you need to make calls to Objective C "self" from within the callback method.
    callBackRecord.decompressionOutputRefCon = (__bridge void *)self;

    // you can set some desired attributes for the destination pixel buffer.  I didn't use this but you may
    // if you need to set some attributes, be sure to uncomment the dictionary in VTDecompressionSessionCreate
    NSDictionary *destinationImageBufferAttributes = [NSDictionary dictionaryWithObjectsAndKeys:
                                                      [NSNumber numberWithBool:YES],
                                                      (id)kCVPixelBufferOpenGLESCompatibilityKey,
                                                      nil];

    OSStatus status =  VTDecompressionSessionCreate(NULL, _formatDesc, NULL,
                                                    NULL, // (__bridge CFDictionaryRef)(destinationImageBufferAttributes)
                                                    &callBackRecord, &_decompressionSession);
    NSLog(@"Video Decompression Session Create: \t %@", (status == noErr) ? @"successful!" : @"failed...");
    if(status != noErr) NSLog(@"\t\t VTD ERROR type: %d", (int)status);
}

Now this method gets called every time the VTD is done decompressing any frame you sent to it. It gets called even if there's an error or the frame is dropped.

void decompressionSessionDecodeFrameCallback(void *decompressionOutputRefCon,
                                             void *sourceFrameRefCon,
                                             OSStatus status,
                                             VTDecodeInfoFlags infoFlags,
                                             CVImageBufferRef imageBuffer,
                                             CMTime presentationTimeStamp,
                                             CMTime presentationDuration)
{
    THISCLASSNAME *streamManager = (__bridge THISCLASSNAME *)decompressionOutputRefCon;

    if (status != noErr)
    {
        NSError *error = [NSError errorWithDomain:NSOSStatusErrorDomain code:status userInfo:nil];
        NSLog(@"Decompressed error: %@", error);
    }
    else
    {
        NSLog(@"Decompressed sucessfully");

        // do something with your resulting CVImageBufferRef that is your decompressed frame
        [streamManager displayDecodedFrame:imageBuffer];
    }
}

This is where we actually send the sampleBuffer off to the VTD to be decoded.

- (void) render:(CMSampleBufferRef)sampleBuffer
{
    VTDecodeFrameFlags flags = kVTDecodeFrame_EnableAsynchronousDecompression;
    VTDecodeInfoFlags flagOut;
    NSDate* currentTime = [NSDate date];
    VTDecompressionSessionDecodeFrame(_decompressionSession, sampleBuffer, flags,
                                      (void*)CFBridgingRetain(currentTime), &flagOut);

    CFRelease(sampleBuffer);

    // if you're using AVSampleBufferDisplayLayer, you only need to use this line of code
    // [videoLayer enqueueSampleBuffer:sampleBuffer];
}

If you're using AVSampleBufferDisplayLayer, be sure to init the layer like this, in viewDidLoad or inside some other init method.

-(void) viewDidLoad
{
    // create our AVSampleBufferDisplayLayer and add it to the view
    videoLayer = [[AVSampleBufferDisplayLayer alloc] init];
    videoLayer.frame = self.view.frame;
    videoLayer.bounds = self.view.bounds;
    videoLayer.videoGravity = AVLayerVideoGravityResizeAspect;

    // set Timebase, you may need this if you need to display frames at specific times
    // I didn't need it so I haven't verified that the timebase is working
    CMTimebaseRef controlTimebase;
    CMTimebaseCreateWithMasterClock(CFAllocatorGetDefault(), CMClockGetHostTimeClock(), &controlTimebase);

    //videoLayer.controlTimebase = controlTimebase;
    CMTimebaseSetTime(self.videoLayer.controlTimebase, kCMTimeZero);
    CMTimebaseSetRate(self.videoLayer.controlTimebase, 1.0);

    [[self.view layer] addSublayer:videoLayer];
}

In case you can't find the VTD error codes in the framework, I decided to include them here. (Again, all these errors and more can be found inside VTErrors.h in the VideoToolbox.framework itself, in the project navigator.)

You will get one of these error codes in the VTD decode frame callback, or when you create your VTD session, if you did something incorrectly.

kVTPropertyNotSupportedErr              = -12900,
kVTPropertyReadOnlyErr                  = -12901,
kVTParameterErr                         = -12902,
kVTInvalidSessionErr                    = -12903,
kVTAllocationFailedErr                  = -12904,
kVTPixelTransferNotSupportedErr         = -12905, // c.f. -8961
kVTCouldNotFindVideoDecoderErr          = -12906,
kVTCouldNotCreateInstanceErr            = -12907,
kVTCouldNotFindVideoEncoderErr          = -12908,
kVTVideoDecoderBadDataErr               = -12909, // c.f. -8969
kVTVideoDecoderUnsupportedDataFormatErr = -12910, // c.f. -8970
kVTVideoDecoderMalfunctionErr           = -12911, // c.f. -8960
kVTVideoEncoderMalfunctionErr           = -12912,
kVTVideoDecoderNotAvailableNowErr       = -12913,
kVTImageRotationNotSupportedErr         = -12914,
kVTVideoEncoderNotAvailableNowErr       = -12915,
kVTFormatDescriptionChangeNotSupportedErr   = -12916,
kVTInsufficientSourceColorDataErr       = -12917,
kVTCouldNotCreateColorCorrectionDataErr = -12918,
kVTColorSyncTransformConvertFailedErr   = -12919,
kVTVideoDecoderAuthorizationErr         = -12210,
kVTVideoEncoderAuthorizationErr         = -12211,
kVTColorCorrectionPixelTransferFailedErr    = -12212,
kVTMultiPassStorageIdentifierMismatchErr    = -12213,
kVTMultiPassStorageInvalidErr           = -12214,
kVTFrameSiloInvalidTimeStampErr         = -12215,
kVTFrameSiloInvalidTimeRangeErr         = -12216,
kVTCouldNotFindTemporalFilterErr        = -12217,
kVTPixelTransferNotPermittedErr         = -12218,

In addition to the VTErrors above, I thought it would be worth adding the CMFormatDescription, CMBlockBuffer, and CMSampleBuffer errors that you may encounter while trying Livy's example.

kCMFormatDescriptionError_InvalidParameter  = -12710,
kCMFormatDescriptionError_AllocationFailed  = -12711,
kCMFormatDescriptionError_ValueNotAvailable = -12718,

kCMBlockBufferNoErr                             = 0,
kCMBlockBufferStructureAllocationFailedErr      = -12700,
kCMBlockBufferBlockAllocationFailedErr          = -12701,
kCMBlockBufferBadCustomBlockSourceErr           = -12702,
kCMBlockBufferBadOffsetParameterErr             = -12703,
kCMBlockBufferBadLengthParameterErr             = -12704,
kCMBlockBufferBadPointerParameterErr            = -12705,
kCMBlockBufferEmptyBBufErr                      = -12706,
kCMBlockBufferUnallocatedBlockErr               = -12707,
kCMBlockBufferInsufficientSpaceErr              = -12708,

kCMSampleBufferError_AllocationFailed             = -12730,
kCMSampleBufferError_RequiredParameterMissing     = -12731,
kCMSampleBufferError_AlreadyHasDataBuffer         = -12732,
kCMSampleBufferError_BufferNotReady               = -12733,
kCMSampleBufferError_SampleIndexOutOfRange        = -12734,
kCMSampleBufferError_BufferHasNoSampleSizes       = -12735,
kCMSampleBufferError_BufferHasNoSampleTimingInfo  = -12736,
kCMSampleBufferError_ArrayTooSmall                = -12737,
kCMSampleBufferError_InvalidEntryCount            = -12738,
kCMSampleBufferError_CannotSubdivide              = -12739,
kCMSampleBufferError_SampleTimingInfoInvalid      = -12740,
kCMSampleBufferError_InvalidMediaTypeForOperation = -12741,
kCMSampleBufferError_InvalidSampleData            = -12742,
kCMSampleBufferError_InvalidMediaFormat           = -12743,
kCMSampleBufferError_Invalidated                  = -12744,
kCMSampleBufferError_DataFailed                   = -16750,
kCMSampleBufferError_DataCanceled                 = -16751,

A good Swift example of much of this can be found in Josh Baker's Avios library: https://github.com/tidwall/Avios

Note that Avios currently expects the user to handle chunking data at NAL start codes, but it does handle decoding the data from that point forward.

Also worth a look is the Swift-based RTMP library HaishinKit (formerly "LF"), which has its own decoding implementation, including more robust NALU parsing: https://github.com/shogo4405/lf.swift

@Livy, to remove the memory leak before CMVideoFormatDescriptionCreateFromH264ParameterSets you should add the following:

if (_formatDesc) {
    CFRelease(_formatDesc);
    _formatDesc = NULL;
}

Thanks to Olivia for this great and detailed post! I recently started programming a streaming app on an iPad Pro with Xamarin Forms, and this article helped a lot; I found many references to it all over the web.

I suppose many people have already rewritten Olivia's example in Xamarin, and I don't claim to be the best programmer in the world. But since nobody has posted a C#/Xamarin version here yet, I would like to give something back to the community for the great post above, so here is my C#/Xamarin version. Maybe it helps someone speed up her or his project.

I stuck closely to Olivia's example, and I even kept most of her comments.

First, since I prefer dealing with enums rather than numbers, I declared this NALU enum. For the sake of completeness, I also added some "exotic" NALU types I found on the internet:

public enum NALUnitType : byte
{
    NALU_TYPE_UNKNOWN = 0,
    NALU_TYPE_SLICE = 1,
    NALU_TYPE_DPA = 2,
    NALU_TYPE_DPB = 3,
    NALU_TYPE_DPC = 4,
    NALU_TYPE_IDR = 5,
    NALU_TYPE_SEI = 6,
    NALU_TYPE_SPS = 7,
    NALU_TYPE_PPS = 8,
    NALU_TYPE_AUD = 9,
    NALU_TYPE_EOSEQ = 10,
    NALU_TYPE_EOSTREAM = 11,
    NALU_TYPE_FILL = 12,

    NALU_TYPE_13 = 13,
    NALU_TYPE_14 = 14,
    NALU_TYPE_15 = 15,
    NALU_TYPE_16 = 16,
    NALU_TYPE_17 = 17,
    NALU_TYPE_18 = 18,
    NALU_TYPE_19 = 19,
    NALU_TYPE_20 = 20,
    NALU_TYPE_21 = 21,
    NALU_TYPE_22 = 22,
    NALU_TYPE_23 = 23,

    NALU_TYPE_STAP_A = 24,
    NALU_TYPE_STAP_B = 25,
    NALU_TYPE_MTAP16 = 26,
    NALU_TYPE_MTAP24 = 27,
    NALU_TYPE_FU_A = 28,
    NALU_TYPE_FU_B = 29,
}

More or less for convenience reasons, I also defined an additional dictionary for NALU descriptions:

public static Dictionary<NALUnitType, string> GetDescription { get; } =
new Dictionary<NALUnitType, string>()
{
    { NALUnitType.NALU_TYPE_UNKNOWN, "Unspecified (non-VCL)" },
    { NALUnitType.NALU_TYPE_SLICE, "Coded slice of a non-IDR picture (VCL) [P-frame]" },
    { NALUnitType.NALU_TYPE_DPA, "Coded slice data partition A (VCL)" },
    { NALUnitType.NALU_TYPE_DPB, "Coded slice data partition B (VCL)" },
    { NALUnitType.NALU_TYPE_DPC, "Coded slice data partition C (VCL)" },
    { NALUnitType.NALU_TYPE_IDR, "Coded slice of an IDR picture (VCL) [I-frame]" },
    { NALUnitType.NALU_TYPE_SEI, "Supplemental Enhancement Information [SEI] (non-VCL)" },
    { NALUnitType.NALU_TYPE_SPS, "Sequence Parameter Set [SPS] (non-VCL)" },
    { NALUnitType.NALU_TYPE_PPS, "Picture Parameter Set [PPS] (non-VCL)" },
    { NALUnitType.NALU_TYPE_AUD, "Access Unit Delimiter [AUD] (non-VCL)" },
    { NALUnitType.NALU_TYPE_EOSEQ, "End of Sequence (non-VCL)" },
    { NALUnitType.NALU_TYPE_EOSTREAM, "End of Stream (non-VCL)" },
    { NALUnitType.NALU_TYPE_FILL, "Filler data (non-VCL)" },
    { NALUnitType.NALU_TYPE_13, "Sequence Parameter Set Extension (non-VCL)" },
    { NALUnitType.NALU_TYPE_14, "Prefix NAL Unit (non-VCL)" },
    { NALUnitType.NALU_TYPE_15, "Subset Sequence Parameter Set (non-VCL)" },
    { NALUnitType.NALU_TYPE_16, "Reserved (non-VCL)" },
    { NALUnitType.NALU_TYPE_17, "Reserved (non-VCL)" },
    { NALUnitType.NALU_TYPE_18, "Reserved (non-VCL)" },
    { NALUnitType.NALU_TYPE_19, "Coded slice of an auxiliary coded picture without partitioning (non-VCL)" },
    { NALUnitType.NALU_TYPE_20, "Coded Slice Extension (non-VCL)" },
    { NALUnitType.NALU_TYPE_21, "Coded Slice Extension for Depth View Components (non-VCL)" },
    { NALUnitType.NALU_TYPE_22, "Reserved (non-VCL)" },
    { NALUnitType.NALU_TYPE_23, "Reserved (non-VCL)" },
    { NALUnitType.NALU_TYPE_STAP_A, "STAP-A Single-time Aggregation Packet (non-VCL)" },
    { NALUnitType.NALU_TYPE_STAP_B, "STAP-B Single-time Aggregation Packet (non-VCL)" },
    { NALUnitType.NALU_TYPE_MTAP16, "MTAP16 Multi-time Aggregation Packet (non-VCL)" },
    { NALUnitType.NALU_TYPE_MTAP24, "MTAP24 Multi-time Aggregation Packet (non-VCL)" },
    { NALUnitType.NALU_TYPE_FU_A, "FU-A Fragmentation Unit (non-VCL)" },
    { NALUnitType.NALU_TYPE_FU_B, "FU-B Fragmentation Unit (non-VCL)" }
};

Here comes my main decoding routine. I assume the received frame is a raw byte array:

    public void Decode(byte[] frame)
    {
        uint frameSize = (uint)frame.Length;
        SendDebugMessage($"Received frame of {frameSize} bytes.");

        // I know how my H.264 data source's NALUs looks like so I know start code index is always 0.
        // if you don't know where it starts, you can use a for loop similar to how I find the 2nd and 3rd start codes
        uint firstStartCodeIndex = 0;
        uint secondStartCodeIndex = 0;
        uint thirdStartCodeIndex = 0;

        // length of NALU start code in bytes.
        // for h.264 the start code is 4 bytes and looks like this: 0 x 00 00 00 01
        const uint naluHeaderLength = 4;

        // check the first 8bits after the NALU start code, mask out bits 0-2, the NALU type ID is in bits 3-7
        uint startNaluIndex = firstStartCodeIndex + naluHeaderLength;
        byte startByte = frame[startNaluIndex];
        int naluTypeId = startByte & 0x1F; // 0001 1111
        NALUnitType naluType = (NALUnitType)naluTypeId;
        SendDebugMessage($"1st Start Code Index: {firstStartCodeIndex}");
        SendDebugMessage($"1st NALU Type: '{NALUnit.GetDescription[naluType]}' ({(int)naluType})");

        // bits 1 and 2 are the NRI
        int nalRefIdc = startByte & 0x60; // 0110 0000
        SendDebugMessage($"1st NRI (NAL Ref Idc): {nalRefIdc}");

        // IF the very first NALU type is an IDR -> handle it like a slice frame (-> re-cast it to type 1 [Slice])
        if (naluType == NALUnitType.NALU_TYPE_IDR)
        {
            naluType = NALUnitType.NALU_TYPE_SLICE;
        }

        // if we haven't already set up our format description with our SPS PPS parameters,
        // we can't process any frames except type 7 that has our parameters
        if (naluType != NALUnitType.NALU_TYPE_SPS && this.FormatDescription == null)
        {
            SendDebugMessage("Video Error: Frame is not an I-Frame and format description is null.");
            return;
        }
        
        // NALU type 7 is the SPS parameter NALU
        if (naluType == NALUnitType.NALU_TYPE_SPS)
        {
            // find where the second PPS 4byte start code begins (0x00 00 00 01)
            // from which we also get the length of the first SPS code
            for (uint i = firstStartCodeIndex + naluHeaderLength; i < firstStartCodeIndex + 40; i++)
            {
                if (frame[i] == 0x00 && frame[i + 1] == 0x00 && frame[i + 2] == 0x00 && frame[i + 3] == 0x01)
                {
                    secondStartCodeIndex = i;
                    this.SpsSize = secondStartCodeIndex;   // includes the header in the size
                    SendDebugMessage($"2nd Start Code Index: {secondStartCodeIndex} -> SPS Size: {this.SpsSize}");
                    break;
                }
            }

            // find what the second NALU type is
            startByte = frame[secondStartCodeIndex + naluHeaderLength];
            naluType = (NALUnitType)(startByte & 0x1F);
            SendDebugMessage($"2nd NALU Type: '{NALUnit.GetDescription[naluType]}' ({(int)naluType})");
            
            // bits 1 and 2 are the NRI
            nalRefIdc = startByte & 0x60; // 0110 0000
            SendDebugMessage($"2nd NRI (NAL Ref Idc): {nalRefIdc}");
        }

        // type 8 is the PPS parameter NALU
        if (naluType == NALUnitType.NALU_TYPE_PPS)
        {
            // find where the NALU after this one starts so we know how long the PPS parameter is
            for (uint i = this.SpsSize + naluHeaderLength; i < this.SpsSize + 30; i++)
            {
                if (frame[i] == 0x00 && frame[i + 1] == 0x00 && frame[i + 2] == 0x00 && frame[i + 3] == 0x01)
                {
                    thirdStartCodeIndex = i;
                    this.PpsSize = thirdStartCodeIndex - this.SpsSize;
                    SendDebugMessage($"3rd Start Code Index: {thirdStartCodeIndex} -> PPS Size: {this.PpsSize}");
                    break;
                }
            }

            // allocate enough data to fit the SPS and PPS parameters into our data objects.
            // VTD doesn't want you to include the start code header (4 bytes long) so we subtract 4 here
            byte[] sps = new byte[this.SpsSize - naluHeaderLength];
            byte[] pps = new byte[this.PpsSize - naluHeaderLength];

            // copy in the actual sps and pps values, again ignoring the 4 byte header
            Array.Copy(frame, naluHeaderLength, sps, 0, sps.Length);
            Array.Copy(frame, this.SpsSize + naluHeaderLength, pps,0, pps.Length);
            
            // create video format description
            List<byte[]> parameterSets = new List<byte[]> { sps, pps };
            this.FormatDescription = CMVideoFormatDescription.FromH264ParameterSets(parameterSets, (int)naluHeaderLength, out CMFormatDescriptionError formatDescriptionError);
            SendDebugMessage($"Creation of CMVideoFormatDescription: {((formatDescriptionError == CMFormatDescriptionError.None)? $"Successful! (Video Codec = {this.FormatDescription.VideoCodecType}, Dimension = {this.FormatDescription.Dimensions.Height} x {this.FormatDescription.Dimensions.Width}px, Type = {this.FormatDescription.MediaType})" : $"Failed ({formatDescriptionError})")}");

            // re-create the decompression session whenever new PPS data was received
            this.DecompressionSession = this.CreateDecompressionSession(this.FormatDescription);

            // now lets handle the IDR frame that (should) come after the parameter sets
            // I say "should" because that's how I expect my H264 stream to work, YMMV
            startByte = frame[thirdStartCodeIndex + naluHeaderLength];
            naluType = (NALUnitType)(startByte & 0x1F);
            SendDebugMessage($"3rd NALU Type: '{NALUnit.GetDescription[naluType]}' ({(int)naluType})");

            // bits 1 and 2 are the NRI
            nalRefIdc = startByte & 0x60; // 0110 0000
            SendDebugMessage($"3rd NRI (NAL Ref Idc): {nalRefIdc}");
        }

        // type 5 is an IDR frame NALU.
        // The SPS and PPS NALUs should always be followed by an IDR (or IFrame) NALU, as far as I know.
        if (naluType == NALUnitType.NALU_TYPE_IDR || naluType == NALUnitType.NALU_TYPE_SLICE)
        {
            // find the offset or where IDR frame NALU begins (after the SPS and PPS NALUs end) 
            uint offset = (naluType == NALUnitType.NALU_TYPE_SLICE)? 0 : this.SpsSize + this.PpsSize;
            uint blockLength = frameSize - offset;
            SendDebugMessage($"Block Length (NALU type '{naluType}'): {blockLength}");

            var blockData = new byte[blockLength];
            Array.Copy(frame, offset, blockData, 0, blockLength);

            // write the size of the block length (IDR picture data) at the beginning of the IDR block.
            // this means we replace the start code header (0 x 00 00 00 01) of the IDR NALU with the block size.
            // AVCC format requires that you do this.

            // This next block is very specific to my application and wasn't in Olivia's example:
            // Since my stream is encoded by NVIDIA NVENC, I had to deal with additional 3-byte start codes within my IDR/SLICE frame.
            // These start codes must be replaced by 4 byte start codes adding the block length as big endian.
            // ======================================================================================================================================================

            // find all 3 byte start code indices (0x00 00 01) within the block data (including the first 4 bytes of NALU header)
            uint startCodeLength = 3;
            List<uint> foundStartCodeIndices = new List<uint>();
            for (uint i = 0; i < blockData.Length - startCodeLength; i++) // stop early so the 3-byte window and the type byte stay in range
            {
                if (blockData[i] == 0x00 && blockData[i + 1] == 0x00 && blockData[i + 2] == 0x01)
                {
                    foundStartCodeIndices.Add(i);
                    byte naluByte = blockData[i + startCodeLength];
                    var tmpNaluType = (NALUnitType)(naluByte & 0x1F);
                    SendDebugMessage($"3-Byte Start Code (0x000001) found at index: {i} (NALU type {(int)tmpNaluType} '{NALUnit.GetDescription[tmpNaluType]}'");
                }
            }

            // determine the byte length of each slice
            uint totalLength = 0;
            List<uint> sliceLengths = new List<uint>();
            for (int i = 0; i < foundStartCodeIndices.Count; i++)
            {
                // for convenience only
                bool isLastValue = (i == foundStartCodeIndices.Count-1);

                // start-index to bit right after the start code
                uint startIndex = foundStartCodeIndices[i] + startCodeLength;
                
                // set end-index to bit right before beginning of next start code or end of frame
                uint endIndex = isLastValue ? (uint) blockData.Length : foundStartCodeIndices[i + 1];
                
                // now determine slice length including NALU header
                uint sliceLength = (endIndex - startIndex) + naluHeaderLength;

                // add length to list
                sliceLengths.Add(sliceLength);

                // sum up total length of all slices (including NALU header)
                totalLength += sliceLength;
            }

            // Arrange slices like this: 
            // [4byte slice1 size][slice1 data][4byte slice2 size][slice2 data]...[4byte slice4 size][slice4 data]
            // Replace 3-Byte Start Code with 4-Byte start code, then replace the 4-Byte start codes with the length of the following data block (big endian).
            // 

            byte[] finalBuffer = new byte[totalLength];
            uint destinationIndex = 0;
            
            // create a buffer for each slice and append it to the final block buffer
            for (int i = 0; i < sliceLengths.Count; i++)
            {
                // create byte vector of size of current slice, add additional bytes for NALU start code length
                byte[] sliceData = new byte[sliceLengths[i]];

                // now copy the data of current slice into the byte vector,
                // start reading data after the 3-byte start code
                // start writing data after NALU start code,
                uint sourceIndex = foundStartCodeIndices[i] + startCodeLength;
                long dataLength = sliceLengths[i] - naluHeaderLength;
                Array.Copy(blockData, sourceIndex, sliceData, naluHeaderLength, dataLength);

                // replace the NALU start code with data length as big endian
                byte[] sliceLengthInBytes = BitConverter.GetBytes(sliceLengths[i] - naluHeaderLength);
                Array.Reverse(sliceLengthInBytes);
                Array.Copy(sliceLengthInBytes, 0, sliceData, 0, naluHeaderLength);

                // add the slice data to final buffer
                Array.Copy(sliceData, 0, finalBuffer, destinationIndex, sliceData.Length);
                destinationIndex += sliceLengths[i];
            }
            
            // ======================================================================================================================================================

            // from here we are back on track with Olivia's code:

            // now create block buffer from final byte[] buffer
            CMBlockBufferFlags flags = CMBlockBufferFlags.AssureMemoryNow | CMBlockBufferFlags.AlwaysCopyData;
            var finalBlockBuffer = CMBlockBuffer.FromMemoryBlock(finalBuffer, 0, flags, out CMBlockBufferError blockBufferError);
            SendDebugMessage($"Creation of Final Block Buffer: {(blockBufferError == CMBlockBufferError.None ? "Successful!" : $"Failed ({blockBufferError})")}");
            if (blockBufferError != CMBlockBufferError.None) return;

            // now create the sample buffer
            nuint[] sampleSizeArray = new nuint[] { totalLength };
            CMSampleBuffer sampleBuffer = CMSampleBuffer.CreateReady(finalBlockBuffer, this.FormatDescription, 1, null, sampleSizeArray, out CMSampleBufferError sampleBufferError);
            SendDebugMessage($"Creation of Final Sample Buffer: {(sampleBufferError == CMSampleBufferError.None ? "Successful!" : $"Failed ({sampleBufferError})")}");
            if (sampleBufferError != CMSampleBufferError.None) return;

            // if sample buffer was successfully created -> pass sample to decoder

            // set sample attachments
            CMSampleBufferAttachmentSettings[] attachments = sampleBuffer.GetSampleAttachments(true);
            var attachmentSetting = attachments[0];
            attachmentSetting.DisplayImmediately = true;

            // enable async decoding
            VTDecodeFrameFlags decodeFrameFlags = VTDecodeFrameFlags.EnableAsynchronousDecompression;

            // add time stamp
            var currentTime = DateTime.Now;
            var currentTimePtr = new IntPtr(currentTime.Ticks);

            // send the sample buffer to a VTDecompressionSession
            var result = DecompressionSession.DecodeFrame(sampleBuffer, decodeFrameFlags, currentTimePtr, out VTDecodeInfoFlags decodeInfoFlags);

            if (result == VTStatus.Ok)
            {
                SendDebugMessage($"Executing DecodeFrame(..): Successful! (Info: {decodeInfoFlags})");
            }
            else
            {
                NSError error = new NSError(CFErrorDomain.OSStatus, (int)result);
                SendDebugMessage($"Executing DecodeFrame(..): Failed ({(VtStatusEx)result} [0x{(int)result:X8}] - {error}) -  Info: {decodeInfoFlags}");
            }
        }
    }

My function to create the decompression session looks like this:

    private VTDecompressionSession CreateDecompressionSession(CMVideoFormatDescription formatDescription)
    {
        VTDecompressionSession.VTDecompressionOutputCallback callBackRecord = this.DecompressionSessionDecodeFrameCallback;

        VTVideoDecoderSpecification decoderSpecification = new VTVideoDecoderSpecification
        {
            EnableHardwareAcceleratedVideoDecoder = true
        };

        CVPixelBufferAttributes destinationImageBufferAttributes = new CVPixelBufferAttributes();

        try
        {
            var decompressionSession = VTDecompressionSession.Create(callBackRecord, formatDescription, decoderSpecification, destinationImageBufferAttributes);
            SendDebugMessage("Video Decompression Session Creation: Successful!");
            return decompressionSession;
        }
        catch (Exception e)
        {
            SendDebugMessage($"Video Decompression Session Creation: Failed ({e.Message})");
            return null;
        }
    }

The decompression session callback routine:

    private void DecompressionSessionDecodeFrameCallback(
        IntPtr sourceFrame,
        VTStatus status,
        VTDecodeInfoFlags infoFlags,
        CVImageBuffer imageBuffer,
        CMTime presentationTimeStamp,
        CMTime presentationDuration)
    {
        
        if (status != VTStatus.Ok)
        {
            NSError error = new NSError(CFErrorDomain.OSStatus, (int)status);
            SendDebugMessage($"Decompression: Failed ({(VtStatusEx)status} [0x{(int)status:X8}] - {error})");
        }
        else
        {
            SendDebugMessage("Decompression: Successful!");

            try
            {
                var image = GetImageFromImageBuffer(imageBuffer);

                // In my application I do not use a display layer but send the decoded image directly by an event:
                
                ImageSource imgSource = ImageSource.FromStream(() => image.AsPNG().AsStream());
                OnImageFrameReady?.Invoke(imgSource);
            }
            catch (Exception e)
            {
                SendDebugMessage(e.ToString());
            }

        }
    }

I use this function to convert the CVImageBuffer into a UIImage. It is also based on one of Olivia's posts mentioned above (how to convert a CVImageBufferRef to UIImage):

    private UIImage GetImageFromImageBuffer(CVImageBuffer imageBuffer)
    {
        if (!(imageBuffer is CVPixelBuffer pixelBuffer)) return null;
        
        var ciImage = CIImage.FromImageBuffer(pixelBuffer);
        var temporaryContext = new CIContext();

        var rect = CGRect.FromLTRB(0, 0, pixelBuffer.Width, pixelBuffer.Height);
        CGImage cgImage = temporaryContext.CreateCGImage(ciImage, rect);
        if (cgImage == null) return null;
        
        var uiImage = UIImage.FromImage(cgImage);
        cgImage.Dispose();
        return uiImage;
    }

Last but not least, here is my little function for debug output; feel free to adapt it as needed ;-)

    private void SendDebugMessage(string msg)
    {
        Debug.WriteLine($"VideoDecoder (iOS) - {msg}");
    }

Finally, let's have a look at the namespaces used by the code above:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Net;
using AvcLibrary;
using CoreFoundation;
using CoreGraphics;
using CoreImage;
using CoreMedia;
using CoreVideo;
using Foundation;
using UIKit;
using VideoToolbox;
using Xamarin.Forms;