Intel graphics hardware H264 MFT ProcessInput call fails after feeding few input samples, the same works fine with Nvidia hardware MFT

I am capturing the desktop using the Desktop Duplication API, converting the samples from RGBA to NV12 on the GPU, and feeding them to the Media Foundation hardware H264 MFT. This works fine with Nvidia graphics and also with software encoders, but fails when only the Intel graphics hardware MFT is available. The same code works fine on the same Intel graphics machine if I fall back to the software MFT. I have also made sure that the encoding is actually done in hardware on the Nvidia graphics machines.

On Intel graphics, the MFT returns MEError ("Unspecified error") right after the very first sample is fed, and subsequent ProcessInput calls (whenever the event generator triggers METransformNeedInput) return "The callee is currently not accepting further input". Rarely, the MFT consumes a few more samples before returning these errors. This behavior is confusing: I feed a sample only when the event generator asynchronously triggers METransformNeedInput through IMFAsyncCallback, and I also properly check whether METransformHaveOutput is triggered as soon as a sample is fed. What really baffles me is that the same asynchronous logic works fine with the Nvidia hardware MFT and the Microsoft software encoder.

There is a similar unresolved question on the Intel forum itself. My code is similar to the one mentioned in the Intel thread, except for the fact that I also set the d3d device manager on the encoder, as shown below.

Also, there are three other Stack Overflow threads reporting a similar issue with no solution given ( & How to create IMFSample from D11 texture for Intel MFT encoder & Asynchronous MFT is not sending MFTransformHaveOutput Event (Intel Hardware MJPEG Decoder MFT)). I have tried every available option, but with no improvement.

The color converter code is taken from the Intel Media SDK samples. I have also uploaded my complete code here.

The method that sets the d3d manager:

void SetD3dManager() {

    HRESULT hr = S_OK;

    if (!deviceManager) {

        // Create device manager
        hr = MFCreateDXGIDeviceManager(&resetToken, &deviceManager);
    }

    if (SUCCEEDED(hr)) 
    {
        if (!pD3dDevice) {

            pD3dDevice = GetDeviceDirect3D(0);
        }
    }

    if (pD3dDevice) {

        // NOTE: Getting ready for multi-threaded operation
        const CComQIPtr<ID3D10Multithread> pMultithread = pD3dDevice;
        pMultithread->SetMultithreadProtected(TRUE);

        hr = deviceManager->ResetDevice(pD3dDevice, resetToken);
        CHECK_HR(_pTransform->ProcessMessage(MFT_MESSAGE_SET_D3D_MANAGER, reinterpret_cast<ULONG_PTR>(deviceManager.p)), "Failed to set device manager.");
    }
    else {
        cout << "Failed to get d3d device";
    }
}

Getting the d3d device:

CComPtr<ID3D11Device> GetDeviceDirect3D(UINT idxVideoAdapter)
{
    // Create DXGI factory:
    CComPtr<IDXGIFactory1> dxgiFactory;
    DXGI_ADAPTER_DESC1 dxgiAdapterDesc;

    // Direct3D feature level codes and names:

    struct KeyValPair { int code; const char* name; };

    const KeyValPair d3dFLevelNames[] =
    {
        KeyValPair{ D3D_FEATURE_LEVEL_9_1, "Direct3D 9.1" },
        KeyValPair{ D3D_FEATURE_LEVEL_9_2, "Direct3D 9.2" },
        KeyValPair{ D3D_FEATURE_LEVEL_9_3, "Direct3D 9.3" },
        KeyValPair{ D3D_FEATURE_LEVEL_10_0, "Direct3D 10.0" },
        KeyValPair{ D3D_FEATURE_LEVEL_10_1, "Direct3D 10.1" },
        KeyValPair{ D3D_FEATURE_LEVEL_11_0, "Direct3D 11.0" },
        KeyValPair{ D3D_FEATURE_LEVEL_11_1, "Direct3D 11.1" },
    };

    // Feature levels for Direct3D support
    const D3D_FEATURE_LEVEL d3dFeatureLevels[] =
    {
        D3D_FEATURE_LEVEL_11_1,
        D3D_FEATURE_LEVEL_11_0,
        D3D_FEATURE_LEVEL_10_1,
        D3D_FEATURE_LEVEL_10_0,
        D3D_FEATURE_LEVEL_9_3,
        D3D_FEATURE_LEVEL_9_2,
        D3D_FEATURE_LEVEL_9_1,
    };

    constexpr auto nFeatLevels = static_cast<UINT> ((sizeof d3dFeatureLevels) / sizeof(D3D_FEATURE_LEVEL));

    CComPtr<IDXGIAdapter1> dxgiAdapter;
    D3D_FEATURE_LEVEL featLevelCodeSuccess;
    CComPtr<ID3D11Device> d3dDx11Device;

    std::wstring_convert<std::codecvt_utf8<wchar_t>> transcoder;

    HRESULT hr = CreateDXGIFactory1(IID_PPV_ARGS(&dxgiFactory));
    CHECK_HR(hr, "Failed to create DXGI factory");

    // Get a video adapter:
    hr = dxgiFactory->EnumAdapters1(idxVideoAdapter, &dxgiAdapter);
    CHECK_HR(hr, "Failed to enumerate the DXGI video adapter");

    // Get video adapter description:
    hr = dxgiAdapter->GetDesc1(&dxgiAdapterDesc);
    CHECK_HR(hr, "Failed to retrieve DXGI video adapter description");

    std::cout << "Selected DXGI video adapter is \'"
        << transcoder.to_bytes(dxgiAdapterDesc.Description) << '\'' << std::endl;

    // Create Direct3D device:
    hr = D3D11CreateDevice(
        dxgiAdapter,
        D3D_DRIVER_TYPE_UNKNOWN,
        nullptr,
        (0 * D3D11_CREATE_DEVICE_SINGLETHREADED) | D3D11_CREATE_DEVICE_VIDEO_SUPPORT,
        d3dFeatureLevels,
        nFeatLevels,
        D3D11_SDK_VERSION,
        &d3dDx11Device,
        &featLevelCodeSuccess,
        nullptr
    );

    // Might have failed for lack of Direct3D 11.1 runtime:
    if (hr == E_INVALIDARG)
    {
        // Try again without Direct3D 11.1:
        hr = D3D11CreateDevice(
            dxgiAdapter,
            D3D_DRIVER_TYPE_UNKNOWN,
            nullptr,
            (0 * D3D11_CREATE_DEVICE_SINGLETHREADED) | D3D11_CREATE_DEVICE_VIDEO_SUPPORT,
            d3dFeatureLevels + 1,
            nFeatLevels - 1,
            D3D11_SDK_VERSION,
            &d3dDx11Device,
            &featLevelCodeSuccess,
            nullptr
        );
    }

    // Get name of Direct3D feature level that succeeded upon device creation:
    std::cout << "Hardware device supports " << std::find_if(
        d3dFLevelNames,
        d3dFLevelNames + nFeatLevels,
        [featLevelCodeSuccess](const KeyValPair& entry)
        {
            return entry.code == featLevelCodeSuccess;
        }
    )->name << std::endl;

done:

    return d3dDx11Device;
}

The asynchronous callback implementation:

struct EncoderCallbacks : IMFAsyncCallback
{
    EncoderCallbacks(IMFTransform* encoder)
    {
        TickEvent = CreateEvent(0, FALSE, FALSE, 0);
        _pEncoder = encoder;
    }

    ~EncoderCallbacks()
    {
        eventGen = nullptr;
        CloseHandle(TickEvent);
    }

    bool Initialize() {

        _pEncoder->QueryInterface(IID_PPV_ARGS(&eventGen));

        if (eventGen) {

            eventGen->BeginGetEvent(this, 0);
            return true;
        }

        return false;
    }

    // dummy IUnknown impl
    virtual HRESULT STDMETHODCALLTYPE QueryInterface(REFIID riid, void** ppvObject) override { return E_NOTIMPL; }
    virtual ULONG STDMETHODCALLTYPE AddRef(void) override { return 1; }
    virtual ULONG STDMETHODCALLTYPE Release(void) override { return 1; }

    virtual HRESULT STDMETHODCALLTYPE GetParameters(DWORD* pdwFlags, DWORD* pdwQueue) override
    {
        // we return immediately and don't do anything except signaling another thread
        *pdwFlags = MFASYNC_SIGNAL_CALLBACK;
        *pdwQueue = MFASYNC_CALLBACK_QUEUE_IO;
        return S_OK;
    }

    virtual HRESULT STDMETHODCALLTYPE Invoke(IMFAsyncResult* pAsyncResult) override
    {
        IMFMediaEvent* event = 0;
        eventGen->EndGetEvent(pAsyncResult, &event);
        if (event)
        {
            MediaEventType type;
            event->GetType(&type);
            switch (type)
            {
            case METransformNeedInput: InterlockedIncrement(&NeedsInput); break;
            case METransformHaveOutput: InterlockedIncrement(&HasOutput); break;
            }
            event->Release();
            SetEvent(TickEvent);
        }

        eventGen->BeginGetEvent(this, 0);
        return S_OK;
    }

    CComQIPtr<IMFMediaEventGenerator> eventGen = nullptr;
    HANDLE TickEvent;
    IMFTransform* _pEncoder = nullptr;

    // Interlocked* operations require volatile LONG-sized operands
    volatile LONG NeedsInput = 0;
    volatile LONG HasOutput = 0;
};

The sample generation method:

bool GenerateSampleAsync() {

    DWORD processOutputStatus = 0;
    HRESULT mftProcessOutput = S_OK;
    bool frameSent = false;

    // Create sample
    CComPtr<IMFSample> currentVideoSample = nullptr;

    MFT_OUTPUT_STREAM_INFO StreamInfo;

    // wait for any callback to come in
    WaitForSingleObject(_pEventCallback->TickEvent, INFINITE);

    while (_pEventCallback->NeedsInput) {

        if (!currentVideoSample) {

            (pDesktopDuplication)->releaseBuffer();
            (pDesktopDuplication)->cleanUpCurrentFrameObjects();

            bool bTimeout = false;

            if (pDesktopDuplication->GetCurrentFrameAsVideoSample((void**)& currentVideoSample, waitTime, bTimeout, deviceRect, deviceRect.Width(), deviceRect.Height())) {

                prevVideoSample = currentVideoSample;
            }
            // Feed the previous sample to the encoder in case of no update in display
            else {
                currentVideoSample = prevVideoSample;
            }
        }

        if (currentVideoSample)
        {
            InterlockedDecrement(&_pEventCallback->NeedsInput);
            _frameCount++;

            CHECK_HR(currentVideoSample->SetSampleTime(mTimeStamp), "Error setting the video sample time.");
            CHECK_HR(currentVideoSample->SetSampleDuration(VIDEO_FRAME_DURATION), "Error setting the video sample duration.");

            CHECK_HR(_pTransform->ProcessInput(inputStreamID, currentVideoSample, 0), "The resampler H264 ProcessInput call failed.");

            mTimeStamp += VIDEO_FRAME_DURATION;
        }
    }

    while (_pEventCallback->HasOutput) {

        CComPtr<IMFSample> mftOutSample = nullptr;
        CComPtr<IMFMediaBuffer> pOutMediaBuffer = nullptr;

        InterlockedDecrement(&_pEventCallback->HasOutput);

        CHECK_HR(_pTransform->GetOutputStreamInfo(outputStreamID, &StreamInfo), "Failed to get output stream info from H264 MFT.");

        CHECK_HR(MFCreateSample(&mftOutSample), "Failed to create MF sample.");
        CHECK_HR(MFCreateMemoryBuffer(StreamInfo.cbSize, &pOutMediaBuffer), "Failed to create memory buffer.");
        CHECK_HR(mftOutSample->AddBuffer(pOutMediaBuffer), "Failed to add sample to buffer.");

        MFT_OUTPUT_DATA_BUFFER _outputDataBuffer;
        memset(&_outputDataBuffer, 0, sizeof _outputDataBuffer);
        _outputDataBuffer.dwStreamID = outputStreamID;
        _outputDataBuffer.dwStatus = 0;
        _outputDataBuffer.pEvents = nullptr;
        _outputDataBuffer.pSample = mftOutSample;

        mftProcessOutput = _pTransform->ProcessOutput(0, 1, &_outputDataBuffer, &processOutputStatus);

        if (SUCCEEDED(mftProcessOutput)) // got an encoded frame
        {
            if (_outputDataBuffer.pSample) {

                CComPtr<IMFMediaBuffer> buf = NULL;
                DWORD bufLength;
                CHECK_HR(_outputDataBuffer.pSample->ConvertToContiguousBuffer(&buf), "ConvertToContiguousBuffer failed.");

                if (buf) {

                    CHECK_HR(buf->GetCurrentLength(&bufLength), "Get buffer length failed.");
                    BYTE* rawBuffer = NULL;

                    fFrameSize = bufLength;
                    fDurationInMicroseconds = 0;
                    gettimeofday(&fPresentationTime, NULL);

                    buf->Lock(&rawBuffer, NULL, NULL);
                    memmove(fTo, rawBuffer, fFrameSize > fMaxSize ? fMaxSize : fFrameSize);

                    bytesTransfered += bufLength;

                    FramedSource::afterGetting(this);

                    buf->Unlock();

                    frameSent = true;
                }
            }

            if (_outputDataBuffer.pEvents)
                _outputDataBuffer.pEvents->Release();
        }
        else if (MF_E_TRANSFORM_STREAM_CHANGE == mftProcessOutput) {

            // some encoders want to renegotiate the output format. 
            if (_outputDataBuffer.dwStatus & MFT_OUTPUT_DATA_BUFFER_FORMAT_CHANGE)
            {
                CComPtr<IMFMediaType> pNewOutputMediaType = nullptr;
                HRESULT res = _pTransform->GetOutputAvailableType(outputStreamID, 1, &pNewOutputMediaType);

                res = _pTransform->SetOutputType(0, pNewOutputMediaType, 0);//setting the type again
                CHECK_HR(res, "Failed to set output type during stream change");
            }
        }
        else if (mftProcessOutput != MF_E_TRANSFORM_NEED_MORE_INPUT) {
            // Need-more-input just means "keep feeding"; anything else is a real failure.
            HandleFailure();
        }
    }

    return frameSent;
}

Creating the video sample and color conversion:

bool GetCurrentFrameAsVideoSample(void **videoSample, int waitTime, bool &isTimeout, CRect &deviceRect, int surfaceWidth, int surfaceHeight)
{
    FRAME_DATA currentFrameData;

    m_LastErrorCode = m_DuplicationManager.GetFrame(&currentFrameData, waitTime, &isTimeout);

    if (!isTimeout && SUCCEEDED(m_LastErrorCode)) {

        m_CurrentFrameTexture = currentFrameData.Frame;

        if (!pDstTexture) {

            D3D11_TEXTURE2D_DESC desc;
            ZeroMemory(&desc, sizeof(D3D11_TEXTURE2D_DESC));

            desc.Format = DXGI_FORMAT_NV12;
            desc.Width = surfaceWidth;
            desc.Height = surfaceHeight;
            desc.MipLevels = 1;
            desc.ArraySize = 1;
            desc.SampleDesc.Count = 1;
            desc.CPUAccessFlags = 0;
            desc.Usage = D3D11_USAGE_DEFAULT;
            desc.BindFlags = D3D11_BIND_RENDER_TARGET;

            m_LastErrorCode = m_Id3d11Device->CreateTexture2D(&desc, NULL, &pDstTexture);
        }

        if (m_CurrentFrameTexture && pDstTexture) {

            // Copy diff area texels to new temp texture
            //m_Id3d11DeviceContext->CopySubresourceRegion(pNewTexture, D3D11CalcSubresource(0, 0, 1), 0, 0, 0, m_CurrentFrameTexture, 0, NULL);

            HRESULT hr = pColorConv->Convert(m_CurrentFrameTexture, pDstTexture);

            if (SUCCEEDED(hr)) {

                CComPtr<IMFMediaBuffer> pMediaBuffer = nullptr;

                MFCreateDXGISurfaceBuffer(__uuidof(ID3D11Texture2D), pDstTexture, 0, FALSE, &pMediaBuffer);

                if (pMediaBuffer) {

                    CComPtr<IMF2DBuffer> p2DBuffer = NULL;
                    DWORD length = 0;
                    pMediaBuffer->QueryInterface(__uuidof(IMF2DBuffer), reinterpret_cast<void**>(&p2DBuffer));
                    p2DBuffer->GetContiguousLength(&length);
                    pMediaBuffer->SetCurrentLength(length);

                    //MFCreateVideoSampleFromSurface(NULL, (IMFSample**)videoSample);
                    MFCreateSample((IMFSample**)videoSample);

                    if (videoSample) {

                        (*((IMFSample**)videoSample))->AddBuffer(pMediaBuffer);
                    }

                    return true;
                }
            }
        }
    }

    return false;
}

The Intel graphics drivers on the machine are already up to date.

Only the TransformNeedInput event keeps being triggered, yet the encoder complains that it cannot accept any more input. The TransformHaveOutput event has never been triggered.

Similar issues reported on the Intel and MSDN forums: 1) https://software.intel.com/en-us/forums/intel-media-sdk/topic/607189 2) https://social.msdn.microsoft.com/Forums/SECURITY/en-US/fe051dd5-b522-4e4b-9cbb-2c06a5450e40/imfsinkwriter-merit-validation-failed-for-mft-intel-quick-sync-video-h264-encoder-mft?forum=mediafoundationdevelopment

Update: I tried simulating only the input source (by programmatically creating an animated-rectangle NV12 sample), leaving everything else untouched. This time the Intel encoder did not complain about anything, and I even got output samples. Except that the Intel encoder's output video is distorted, whereas the Nvidia encoder works just fine.

Furthermore, with my original NV12 source I still get the ProcessInput error on the Intel encoder. I have no problems with the Nvidia MFT and software encoders.

Output of the Intel hardware MFT: (compare with the Nvidia encoder's output)

Output of the Nvidia hardware MFT:

Nvidia graphics usage statistics:

Intel graphics usage statistics (I don't understand why the GPU engine is displayed as Video Decode):

I went through your code.

Based on your post, I suspect an issue with the Intel video processor.

My OS is Win7, so I decided to test the video processor behavior with a D3D9Device on my Nvidia card, and then on Intel HD Graphics 4000.

I suppose the video processor capabilities behave the same way for a D3D9Device as for a D3D11Device. It has to be checked, of course.

So I made this program to check: https://github.com/mofo7777/DirectXVideoScreen (see the D3D9VideoProcessor sub-project)

It seems that you do not check enough about the video processor capabilities.

With IDXVAHD_Device::GetVideoProcessorDeviceCaps, here is what I check:

DXVAHD_VPDEVCAPS.MaxInputStreams > 0

DXVAHD_VPDEVCAPS.VideoProcessorCount > 0

DXVAHD_VPDEVCAPS.OutputFormatCount > 0

DXVAHD_VPDEVCAPS.InputFormatCount > 0

DXVAHD_VPDEVCAPS.InputPool == D3DPOOL_DEFAULT

I also check the supported input and output formats with IDXVAHD_Device::GetVideoProcessorOutputFormats and IDXVAHD_Device::GetVideoProcessorInputFormats, roughly as the sketch below shows.
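Here is a minimal sketch of those checks (illustrative code, not taken from the project above; it assumes an already created IDirect3DDevice9Ex pD3d9DeviceEx and a filled-in DXVAHD_CONTENT_DESC contentDesc, and trims error handling):

#include <d3d9.h>
#include <dxvahd.h> // link with dxva2.lib
#include <vector>
#include <algorithm>

CComPtr<IDXVAHD_Device> pVpDevice;
HRESULT hr = DXVAHD_CreateDevice(pD3d9DeviceEx, &contentDesc,
    DXVAHD_DEVICE_USAGE_PLAYBACK_NORMAL, nullptr, &pVpDevice);

DXVAHD_VPDEVCAPS caps = {};
hr = pVpDevice->GetVideoProcessorDeviceCaps(&caps);

// The checks listed above:
bool capsOk = caps.MaxInputStreams > 0
    && caps.VideoProcessorCount > 0
    && caps.OutputFormatCount > 0
    && caps.InputFormatCount > 0
    && caps.InputPool == D3DPOOL_DEFAULT;

// Enumerate the output formats and look for NV12 (a FOURCC format in D3D9):
std::vector<D3DFORMAT> outputFormats(caps.OutputFormatCount);
hr = pVpDevice->GetVideoProcessorOutputFormats(caps.OutputFormatCount, outputFormats.data());

const D3DFORMAT d3dFmtNV12 = static_cast<D3DFORMAT>(MAKEFOURCC('N', 'V', '1', '2'));
bool nv12OutputSupported = std::find(outputFormats.begin(), outputFormats.end(),
    d3dFmtNV12) != outputFormats.end();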

This is where I found a difference between the Nvidia GPU and the Intel GPU.

NVIDIA: 4 output formats

  • D3DFMT_A8R8G8B8
  • D3DFMT_X8R8G8B8
  • D3DFMT_YUY2
  • D3DFMT_NV12

INTEL: 3 output formats

  • D3DFMT_A8R8G8B8
  • D3DFMT_X8R8G8B8
  • D3DFMT_YUY2

Intel HD Graphics 4000 does not support the NV12 output format.

Also, for my program to work correctly, I needed to set the following stream states before using VideoProcessBltHD (see the sketch after this list):

  • DXVAHD_STREAM_STATE_D3DFORMAT
  • DXVAHD_STREAM_STATE_FRAME_FORMAT
  • DXVAHD_STREAM_STATE_INPUT_COLOR_SPACE
  • DXVAHD_STREAM_STATE_SOURCE_RECT
  • DXVAHD_STREAM_STATE_DESTINATION_RECT
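A sketch of that stream-state setup, assuming pVideoProcessor is the IDXVAHD_VideoProcessor and the input surface travels on stream 0 (the format and rectangle values are illustrative, not the ones your pipeline must use):

DXVAHD_STREAM_STATE_D3DFORMAT_DATA fmt = { D3DFMT_X8R8G8B8 }; // input surface format (assumed)
pVideoProcessor->SetVideoProcessStreamState(0, DXVAHD_STREAM_STATE_D3DFORMAT, sizeof(fmt), &fmt);

DXVAHD_STREAM_STATE_FRAME_FORMAT_DATA frameFmt = { DXVAHD_FRAME_FORMAT_PROGRESSIVE };
pVideoProcessor->SetVideoProcessStreamState(0, DXVAHD_STREAM_STATE_FRAME_FORMAT, sizeof(frameFmt), &frameFmt);

DXVAHD_STREAM_STATE_INPUT_COLOR_SPACE_DATA colorSpace = {}; // zero-initialized defaults
pVideoProcessor->SetVideoProcessStreamState(0, DXVAHD_STREAM_STATE_INPUT_COLOR_SPACE, sizeof(colorSpace), &colorSpace);

const RECT rect = { 0, 0, 1920, 1080 }; // assumed frame size
DXVAHD_STREAM_STATE_SOURCE_RECT_DATA srcRect = { TRUE, rect };
pVideoProcessor->SetVideoProcessStreamState(0, DXVAHD_STREAM_STATE_SOURCE_RECT, sizeof(srcRect), &srcRect);

DXVAHD_STREAM_STATE_DESTINATION_RECT_DATA dstRect = { TRUE, rect };
pVideoProcessor->SetVideoProcessStreamState(0, DXVAHD_STREAM_STATE_DESTINATION_RECT, sizeof(dstRect), &dstRect);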

For D3D11:

ID3D11VideoProcessorEnumerator::GetVideoProcessorCaps == IDXVAHD_Device::GetVideoProcessorDeviceCaps

(D3D11_VIDEO_PROCESSOR_FORMAT_SUPPORT_OUTPUT) ID3D11VideoProcessorEnumerator::CheckVideoProcessorFormat == IDXVAHD_Device::GetVideoProcessorOutputFormats

(D3D11_VIDEO_PROCESSOR_FORMAT_SUPPORT_INPUT) ID3D11VideoProcessorEnumerator::CheckVideoProcessorFormat == IDXVAHD_Device::GetVideoProcessorInputFormats

ID3D11VideoContext::(...) == IDXVAHD_VideoProcessor::SetVideoProcessStreamState
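For reference, a rough D3D11 equivalent of the NV12 check might look like this (an illustrative sketch; the content-desc values are assumptions, not your actual stream parameters):

CComQIPtr<ID3D11VideoDevice> pVideoDevice = pD3dDevice; // the pipeline's ID3D11Device

D3D11_VIDEO_PROCESSOR_CONTENT_DESC contentDesc = {};
contentDesc.InputFrameFormat = D3D11_VIDEO_FRAME_FORMAT_PROGRESSIVE;
contentDesc.InputWidth = 1920;   // assumed capture size
contentDesc.InputHeight = 1080;
contentDesc.OutputWidth = 1920;
contentDesc.OutputHeight = 1080;
contentDesc.InputFrameRate = { 30, 1 };
contentDesc.OutputFrameRate = { 30, 1 };
contentDesc.Usage = D3D11_VIDEO_USAGE_PLAYBACK_NORMAL;

CComPtr<ID3D11VideoProcessorEnumerator> pVpEnum;
HRESULT hr = pVideoDevice->CreateVideoProcessorEnumerator(&contentDesc, &pVpEnum);

UINT formatFlags = 0;
hr = pVpEnum->CheckVideoProcessorFormat(DXGI_FORMAT_NV12, &formatFlags);

bool nv12Input = (formatFlags & D3D11_VIDEO_PROCESSOR_FORMAT_SUPPORT_INPUT) != 0;
bool nv12Output = (formatFlags & D3D11_VIDEO_PROCESSOR_FORMAT_SUPPORT_OUTPUT) != 0;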

Could you verify your GPU's video processor capabilities first? Do you see the same difference I do?

That is the first thing we need to know, and from what I saw in your github project, your program does not seem to check for it.

As mentioned in the post, the Transform's event generator returned the error MEError ("Unspecified error") right after the very first input sample was fed on Intel hardware, and further calls just returned "Transform Need more input" but never produced any output. The same code worked fine on Nvidia machines, though. After a lot of experimenting and research, I figured out that I was creating far too many D3d11Device instances; in my case, I created 2 or 3 separate devices for capture, color conversion, and the hardware encoder respectively. Instead, I could simply reuse a single D3dDevice instance. Creating multiple D3d11Device instances may work on high-end machines, though. This is not documented anywhere. I could not find even a clue to the causes of the "MEError" error; it is not mentioned anywhere. There are many similar threads on Stack Overflow that remain unanswered, and even the Microsoft people were not able to point out the problem when given the complete source code.

Reusing a single D3D11Device instance solved the problem. Posting this solution since it may help somebody facing the same problem as mine.
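A minimal sketch of the idea, assuming the capture, the color converter, and the encoder all receive the single device created by the GetDeviceDirect3D function shown above (the wiring of the first two components is pipeline-specific and only indicated by comments):

// Create the D3D11 device exactly once and hand it to every component.
CComPtr<ID3D11Device> sharedDevice = GetDeviceDirect3D(0);

// 1. Desktop duplication is initialized with sharedDevice.
// 2. The RGBA -> NV12 color converter is created on sharedDevice.
// 3. The encoder receives the same device through the DXGI device manager:
UINT resetToken = 0;
CComPtr<IMFDXGIDeviceManager> deviceManager;
MFCreateDXGIDeviceManager(&resetToken, &deviceManager);
deviceManager->ResetDevice(sharedDevice, resetToken);
_pTransform->ProcessMessage(MFT_MESSAGE_SET_D3D_MANAGER,
    reinterpret_cast<ULONG_PTR>(deviceManager.p));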