IMFTransform::ProcessOutput Efficiency

I notice that, as is apparently documented, the resampler's IMFTransform::ProcessOutput() can only output a single sample per call! I guess it is geared more toward large-frame video encoding. Judging by all the code I have been looking at as reference for the audio playback in question, every call to ProcessOutput allocates an IMFMediaBuffer, which seems a bit crazy and like poor architecture - unless I am missing something?

It is especially bad from a media-buffer-usage point of view. For example, a SourceReader decoding my test MP3 hands me chunks of roughly 64KB in one sample with one buffer. That is sensible. But the media buffer that GetOutputStreamInfo() requests per ProcessOutput() call is only 24 bytes.

A 64KB chunk => split into many 24-byte chunks => further processing looks like very silly overhead - roughly 64 * 1024 / 24 ≈ 2,700 allocations and ProcessOutput calls per decoded chunk (either the resampler pays a lot of per-24-byte overhead, or, if it does not, the overhead of merging the pieces back together is forced later in the pipeline).

From https://docs.microsoft.com/en-us/windows/win32/api/mftransform/nf-mftransform-imftransform-processoutput

which says:

  1. The MFT cannot return more than one sample per stream in a single call to ProcessOutput.
  2. The MFT writes the output data to the start of the buffer, overwriting any data that already exists in the buffer.

So it cannot even append to the end of a partially filled buffer attached to the sample.

I could create my own pooled objects that implement the media buffer interface, but I would guess the pointers would clash with those of a vanilla locked media buffer. The only other option seems to be to lock/copy those 24 bytes into another, larger buffer for processing. But all of this seems like overkill, and the granularity is wrong.
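To make the lock/copy option concrete, here is a minimal sketch of what I mean - append whatever each ProcessOutput call produced onto one larger, caller-owned staging buffer. The helper name AppendOutput and the staging/stagingUsed parameters are placeholders of mine, not anything from Media Foundation:

#include <windows.h>
#include <mfobjects.h>
#include <cstring>

// Sketch only: append the bytes the MFT just wrote into its small output
// buffer onto a larger, caller-owned staging buffer.
HRESULT AppendOutput(IMFMediaBuffer* outBuffer, BYTE* staging, DWORD stagingCapacity, DWORD& stagingUsed)
{
    BYTE* data = nullptr;
    DWORD length = 0;
    HRESULT hr = outBuffer->Lock(&data, nullptr, &length);
    if (FAILED(hr))
        return hr;

    if (stagingUsed + length <= stagingCapacity)
    {
        memcpy(staging + stagingUsed, data, length);
        stagingUsed += length;
    }
    else
    {
        hr = E_NOT_SUFFICIENT_BUFFER; // a real version would flush or grow the staging buffer
    }

    outBuffer->Unlock();
    return hr;
}

In the sketch below, that call would go where the second ... is, right after a successful ProcessOutput, with the output sample released straight afterwards.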

处理此问题的最佳方法是什么?

Here is a simplified sketch of what I have tested so far:

...

status = transform->ProcessInput(0, sample, 0);
sample->Release();

while (1)
{
    MFT_OUTPUT_STREAM_INFO outDetails{};
    MFT_OUTPUT_DATA_BUFFER outData{};
    IMFMediaBuffer* outBuffer = nullptr;
    IMFSample* outSample = nullptr;
    DWORD outStatus = 0;

    // cbSize reported here for the resampler is only 24 bytes
    status = transform->GetOutputStreamInfo(0, &outDetails);

    // a fresh buffer and sample are allocated for every ProcessOutput call
    status = MFCreateAlignedMemoryBuffer(outDetails.cbSize, outDetails.cbAlignment, &outBuffer);
    status = MFCreateSample(&outSample);
    status = outSample->AddBuffer(outBuffer);
    outBuffer->Release();

    outData.pSample = outSample;

    status = transform->ProcessOutput(0, 1, &outData, &outStatus);
    if (status == MF_E_TRANSFORM_NEED_MORE_INPUT)
    {
        outSample->Release();
        break;
    }

    ...

    outSample->Release(); // release once the output above has been consumed
}

I have written some code for you to show that the audio resampler is capable of processing large chunks of audio at a time. A nice, efficient way of doing things:

winrt::com_ptr<IMFTransform> Transform;
winrt::check_hresult(CoCreateInstance(CLSID_CResamplerMediaObject, nullptr, CLSCTX_ALL, IID_PPV_ARGS(Transform.put())));

WAVEFORMATEX InputWaveFormatEx { WAVE_FORMAT_PCM, 1, 44100, 44100 * 2, 2, 16 };
WAVEFORMATEX OutputWaveFormatEx { WAVE_FORMAT_PCM, 1, 48000, 48000 * 2, 2, 16 };

winrt::com_ptr<IMFMediaType> InputMediaType;
winrt::check_hresult(MFCreateMediaType(InputMediaType.put()));
winrt::check_hresult(MFInitMediaTypeFromWaveFormatEx(InputMediaType.get(), &InputWaveFormatEx, sizeof InputWaveFormatEx));
winrt::com_ptr<IMFMediaType> OutputMediaType;
winrt::check_hresult(MFCreateMediaType(OutputMediaType.put()));
winrt::check_hresult(MFInitMediaTypeFromWaveFormatEx(OutputMediaType.get(), &OutputWaveFormatEx, sizeof OutputWaveFormatEx));

winrt::check_hresult(Transform->SetInputType(0, InputMediaType.get(), 0));
winrt::check_hresult(Transform->SetOutputType(0, OutputMediaType.get(), 0));

MFT_OUTPUT_STREAM_INFO OutputStreamInfo { };
winrt::check_hresult(Transform->GetOutputStreamInfo(0, &OutputStreamInfo));
_A(!(OutputStreamInfo.dwFlags & MFT_OUTPUT_STREAM_SINGLE_SAMPLE_PER_BUFFER)); // _A is an assertion macro

DWORD const InputMediaBufferSize = InputWaveFormatEx.nAvgBytesPerSec;
winrt::com_ptr<IMFMediaBuffer> InputMediaBuffer;
winrt::check_hresult(MFCreateMemoryBuffer(InputMediaBufferSize, InputMediaBuffer.put()));
winrt::check_hresult(InputMediaBuffer->SetCurrentLength(InputMediaBufferSize));
winrt::com_ptr<IMFSample> InputSample;
winrt::check_hresult(MFCreateSample(InputSample.put()));
winrt::check_hresult(InputSample->AddBuffer(InputMediaBuffer.get()));
winrt::check_hresult(Transform->ProcessInput(0, InputSample.get(), 0));

DWORD const OutputMediaBufferCapacity = OutputWaveFormatEx.nAvgBytesPerSec;
winrt::com_ptr<IMFMediaBuffer> OutputMediaBuffer;
winrt::check_hresult(MFCreateMemoryBuffer(OutputMediaBufferCapacity, OutputMediaBuffer.put()));
winrt::check_hresult(OutputMediaBuffer->SetCurrentLength(0));
winrt::com_ptr<IMFSample> OutputSample;
winrt::check_hresult(MFCreateSample(OutputSample.put()));
winrt::check_hresult(OutputSample->AddBuffer(OutputMediaBuffer.get()));
MFT_OUTPUT_DATA_BUFFER OutputDataBuffer { 0, OutputSample.get() };
DWORD Status;
winrt::check_hresult(Transform->ProcessOutput(0, 1, &OutputDataBuffer, &Status));

DWORD OutputMediaBufferSize = 0;
winrt::check_hresult(OutputMediaBuffer->GetCurrentLength(&OutputMediaBufferSize));

You can see that after feeding one second of input, the output holds [almost] one second of data, as expected.
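The same approach extends to a streaming loop: you can keep reusing the large, caller-allocated output sample and keep calling ProcessOutput until the transform reports MF_E_TRANSFORM_NEED_MORE_INPUT. A minimal sketch of that idea, reusing Transform, OutputSample and OutputMediaBuffer from the snippet above (illustration only; the error constant comes from mferror.h):

// Sketch: pump ProcessOutput into the same large output sample until the
// resampler asks for more input.
for (;;)
{
    winrt::check_hresult(OutputMediaBuffer->SetCurrentLength(0)); // reset before reuse
    MFT_OUTPUT_DATA_BUFFER DataBuffer { 0, OutputSample.get() };
    DWORD PumpStatus = 0;
    HRESULT const Result = Transform->ProcessOutput(0, 1, &DataBuffer, &PumpStatus);
    if (Result == MF_E_TRANSFORM_NEED_MORE_INPUT)
        break; // drained; feed the next input sample with ProcessInput and pump again
    winrt::check_hresult(Result);

    DWORD Length = 0;
    winrt::check_hresult(OutputMediaBuffer->GetCurrentLength(&Length));
    // ... consume Length bytes of resampled audio from OutputMediaBuffer ...
}

MF_E_TRANSFORM_NEED_MORE_INPUT is the transform's normal way of signaling that it has drained what was fed in, not a failure.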