Xamarin: OpenGL Video Encoding Locks Up on 46000 Outstanding GREFs. Performing a full GC

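I have a long-running video task in a Xamarin.Android app. It uses a MediaPlayer to decode a video file onto a custom OpenGL ES Surface, then queues up another Surface, encodes the data with MediaCodec, drains it into a ByteBuffer, and passes it to a MediaMuxer based on encoder output ByteBuffer availability. The operation works well and is fast until the total bytes written to the video file exceed ~1.3GB, at which point the video (but not the audio) locks up.

The app seems to have too many GREFs. I'm watching them go up and down in real time until they finally end up well above 46000. It seems the OS (or the app?) can't dump all of the GREFs through GC, which leaves the app stuck in the middle of video processing. I'm monitoring Android resources, and total available memory never changes much at the OS level; the CPU also always seems to have plenty of idle headroom (~28%).

I'm outputting to the system console and using GREF output logging: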

adb shell setprop debug.mono.log gref

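After about 14 minutes, garbage collection seems unable to keep up. The GREF count goes up, then down, up, then down; eventually it gets so high that the count stays above 46k and the following messages loop: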

09-26 15:07:11.613 I/monodroid-gc(11213): 46111 outstanding GREFs. Performing a full GC!
09-26 15:07:11.898 I/zygote64(11213): Explicit concurrent copying GC freed 9(32KB) AllocSpace objects, 0(0B) LOS objects, 70% free, 2MB/8MB, paused 434us total 63.282ms
09-26 15:07:13.470 D/Mono    (11213): GC_TAR_BRIDGE bridges 22974 objects 23013 opaque 1 colors 22974 colors-bridged 22974 colors-visible 22974 xref 1 cache-hit 0 cache-semihit 0 cache-miss 0 setup 3.40ms tarjan 25.53ms scc-setup 14.85ms gather-xref 1.76ms xref-setup 0.50ms cleanup 13.81ms
09-26 15:07:13.470 D/Mono    (11213): GC_BRIDGE: Complete, was running for 1798.94ms
09-26 15:07:13.470 D/Mono    (11213): GC_MAJOR: (user request) time 54.95ms, stw 57.82ms los size: 5120K in use: 1354K
09-26 15:07:13.470 D/Mono    (11213): GC_MAJOR_SWEEP: major size: 7648K in use: 6120K

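The GREF logs look like this, except multiplied by tens of thousands. I can watch the number go up and then down, over and over, until it is massive and eventually so far above 46k that the app (or the OS?) seems to have given up on trying to clear these GREFs. In the lines below, the `grefc` value (38182 here) is the number that climbs steeply and then drops until it is well over 46k: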

09-30 22:42:11.013 I/monodroid-gref(20765): -g- grefc 38182 gwrefc 51 handle 0x98156/G from thread 'finalizer'(25420)
09-30 22:42:11.013 I/monodroid-gref(20765): +w+ grefc 38181 gwrefc 52 obj-handle 0x980f6/G -> new-handle 0xbc3/W from thread 'finalizer'(25420)
09-30 22:42:11.013 I/monodroid-gref(20765): -g- grefc 38181 gwrefc 52 handle 0x980f6/G from thread 'finalizer'(25420)

The GC system also emits warnings such as `warning: not replacing previous registered handle 0x30192 with handle 0x62426 for key_handle 0x9b1ac32`:

10-03 13:15:25.453 I/monodroid-gref(22127): +g+ grefc 24438 gwrefc 0 obj-handle 0x9/I -> new-handle 0x62416/G from thread 'Thread Pool Worker'(44)
10-03 13:15:25.476 I/monodroid-gref(22127): +g+ grefc 24439 gwrefc 0 obj-handle 0x30192/I -> new-handle 0x62426/G from thread 'Thread Pool Worker'(44)
10-03 13:15:25.477 I/monodroid-gref(22127): warning: not replacing previous registered handle 0x30192 with handle 0x62426 for key_handle 0x9b1ac32
10-03 13:15:25.483 I/monodroid-gref(22127): +g+ grefc 24440 gwrefc 0 obj-handle 0x9/I -> new-handle 0x62436/G from thread 'Thread Pool Worker'(44)

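Additionally, the video seems to freeze whenever garbage collection is running, even when it isn't stuck in a loop. That's another issue I'm looking for tips or answers on.

This code was ported from another project; I noticed that the previous developer mentioned: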

    // Even if we don't access the SurfaceTexture after the constructor returns, we
    // still need to keep a reference to it. The Surface doesn't retain a reference
    // at the Java level, so if we don't either then the object can get GCed, which
    // causes the native finalizer to run.

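I think this is the key to the problem I'm having, but what confuses me is how the app is supposed to keep encoding if garbage collection can't run. I see a ton of these in the GREF logs: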

10-03 13:07:04.897 I/monodroid-gref(22127): +g+ grefc 6472 gwrefc 4825 obj-handle 0x3727/W -> new-handle 0x2982a/G from thread 'finalizer'(24109)

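So does this GREF log entry indicate that I need the finalizer to finish? Or does it indicate that I shouldn't allow the finalizer to run before the video has finished encoding?

I did some reading on this and looked at Java code doing the same type of operation. At that point I tried adding a WeakReference to the parent class. Video encoding seems to get much further with the weak reference, but it still eventually locks up.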

private void setup() {
    _textureRender = new TextureRender();
    _textureRender.SurfaceCreated();
    // Even if we don't access the SurfaceTexture after the constructor returns, we
    // still need to keep a reference to it. The Surface doesn't retain a reference
    // at the Java level, so if we don't either then the object can get GCed, which
    // causes the native finalizer to run.
    _surfaceTexture = new SurfaceTexture(_textureRender.TextureId);
    Parent.WeakSurfaceTexture.FrameAvailable += FrameAvailable; // notice the Weak references here
    _surface = new Surface(Parent.WeakSurfaceTexture);
}

Here is how I'm getting the weak parent reference:

    public System.WeakReference weakParent;
    private OutputSurface Parent  {
        get {
            if (weakParent == null || !weakParent.IsAlive)
                return null;
            return weakParent.Target as OutputSurface;
        }
    }

    public SurfaceTexture WeakSurfaceTexture {
        get { return Parent.SurfaceTexture; }
    }

When the app actually locks up in the GC loop, it is stuck on this line:

var curDisplay = EGLContext.EGL.JavaCast<IEGL10>().EglGetCurrentDisplay();

in this context:

    const int TIMEOUT_MS = 20000;
    public bool AwaitNewImage(bool returnOnFailure = false) {
        System.Threading.Monitor.Enter (_frameSyncObject);
        while (!IsFrameAvailable) {
            try {
                // Wait for onFrameAvailable() to signal us.  Use a timeout to avoid
                // stalling the test if it doesn't arrive.
                System.Threading.Monitor.Wait (_frameSyncObject, TIMEOUT_MS);

                if (!IsFrameAvailable) {
                    if (returnOnFailure) {
                        return false;
                    }
                    // TODO: if "spurious wakeup", continue while loop
                    //throw new RuntimeException ("frame wait timed out");
                }
            } catch (InterruptedException ie) {
                if (returnOnFailure) {
                    return false;
                }
                // shouldn't happen
                //throw new RuntimeException (ie);
            } catch (Exception ex) { throw ex; }
        }
        IsFrameAvailable = false;
        System.Threading.Monitor.Exit (_frameSyncObject);
        //the app is locking up on the next line:
        var curDisplay = EGLContext.EGL.JavaCast<IEGL10>().EglGetCurrentDisplay();
        _textureRender.CheckGlError ("before updateTexImage");
        Parent.WeakSurfaceTexture.UpdateTexImage ();
        return true;
    }

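So is this a problem where I need to stop the finalizer from running? Or is the finalizer causing too many GREFs? Do I need to dispose of some of these frame-render SurfaceTextures before continuing to process video? Do I need to pause the MediaPlayer and dump all of these references before continuing the read/write process?

Do I need to optimize my code somehow? I read that too many Java.Lang.Object instantiations or usages can cause a GREF overflow (or something like that?). I combed through my code and couldn't find anything inheriting from Java.Lang.Object that runs in this loop.

Or am I way off, and is there some other cause?

I'm basically just trying to figure out how to resolve this video encoder lockup during a GC loop. Any pointers or things to look for would be much appreciated. I've also noticed that garbage collection (when it happens) seems to make the frames stutter briefly, so that's something I'm trying to resolve as well.

Here is the full code base: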

https://github.com/hexag0d/BitChute_Mobile_Android_BottomNav/blob/VideoPreProcessing_/PtOffsetRedux/VideoEncoding/OutputSurface.cs

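Please advise.

EDIT: I just noticed that the branch I posted had the OutputSurface class inheriting from Java.Lang.Object. I removed this and pushed the branch again. I have a bunch of branches from trying to get this working, and I had backtracked to one that still inherited from this class. I know that in many previous attempts I had removed all Java.Lang.Object inheritance from the project and it still locked up on GC.

UPDATE: When I run the code in the branch above, I don't see the GREFs go over 46k, but the video still seems to lock up on garbage collection. The difference is that the video processing now actually finishes, although the GREF count still gets really close to 46k. I think a really long video would still go over 46k, since the count keeps climbing as processing gets further into the video.

It turns out all I had to do was comment out the suspicious line I mentioned: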

var curDisplay = EGLContext.EGL.JavaCast<IEGL10>().EglGetCurrentDisplay();

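It runs in a loop and gets called thousands of times to complete a full video.

What must be happening is that these EGLDisplay instances (the `var`) aren't being garbage collected properly. I had assumed they would be collected automatically once the method finished, but something is preventing that. If you know more about this, please feel free to give a better answer; I'm not sure what causes the finalizer to hang on these objects.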
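If the EGL call cannot simply be removed, one option is to stop creating a new managed wrapper on every frame. The sketch below is only an illustration under stated assumptions, not the fix used above (the fix was commenting the line out). It assumes that each `JavaCast<IEGL10>()` call creates a managed peer that holds a GREF, that caching one peer is safe for the lifetime of the encode, and that disposing the returned `EGLDisplay` wrapper does not affect other code holding the same wrapper. `EglDisplayProbe` and `TouchCurrentDisplay` are hypothetical names, and the class only builds inside a Xamarin.Android project.

```csharp
using System;
using Android.Runtime;                 // JavaCast<T>() extension method
using Javax.Microedition.Khronos.Egl;  // EGLContext, IEGL10, EGLDisplay

// Hypothetical helper: holds a single IEGL10 peer instead of creating one per frame.
public sealed class EglDisplayProbe : IDisposable
{
    // Created once, so the frame loop no longer calls JavaCast<IEGL10>() thousands of times.
    private readonly IEGL10 _egl = EGLContext.EGL.JavaCast<IEGL10>();

    // Call this from the per-frame loop in place of the original line.
    public void TouchCurrentDisplay()
    {
        // Release the wrapper deterministically rather than waiting for the finalizer.
        using (var display = _egl.EglGetCurrentDisplay())
        {
            // Inspect `display` here if the value is actually needed.
        }
    }

    // Release the cached peer when the encode finishes.
    public void Dispose()
    {
        _egl.Dispose();
    }
}
```

Whether this keeps the GREF count flat would need to be confirmed with the same gref logging described in this post.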

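That alone doesn't help much with solving this type of problem, so here is how I figured it out.

First I added this code to MainActivity OnCreate. It writes the GREF log to a file in the /download folder at the root of the Android device, then loops and updates it every 120 seconds (or whatever interval you choose):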

#if DEBUG
            // Periodically copy the runtime's GREF log to public storage so it can be pulled with adb.
            Task.Run(async () =>
            {
                const int seconds = 120;
                const string grefTag = "monodroid-gref";
                const string grefsFile = "grefs.txt";
                while (true)
                {
                    var grefFile = System.IO.Path.Combine("/data/data", PackageName, "files/.__override__", grefsFile);
                    var grefFilePublic = System.IO.Path.Combine(Android.OS.Environment.ExternalStorageDirectory + Java.IO.File.Separator + "download", grefsFile);
                    if (System.IO.File.Exists(grefFile))
                    {
                        System.IO.File.Copy(grefFile, grefFilePublic, true);
                        // Log with the tag as the tag; Console.Write(tag, message) treats the tag
                        // as a format string and drops the message.
                        Android.Util.Log.Debug(grefTag, $"adb pull {grefFilePublic} {grefsFile}");
                    }
                    else
                        Android.Util.Log.Debug(grefTag, "no grefs.txt found, gref logging enabled? (adb shell setprop debug.mono.log gref)");
                    await Task.Delay(seconds * 1000);
                }
            });
#endif

Then run this command to enable GREF logging on the device:

adb shell setprop debug.mono.log gref

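Then I ran the app, let the video processor run, and waited for it to eventually get stuck. After that, I collected the .txt file from the download folder and inspected it with Visual Studio Code (because it handles large files easily).

In my case there was a loop running over and over that looked like this, repeated thousands of times: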

take_weak_global_ref_jni
-g- grefc 25196 gwrefc 7 handle 0x6495a/G from thread 'finalizer'(27691)
take_weak_global_ref_jni
*take_weak obj=0x7c1046df60; handle=0x64106
+w+ grefc 25195 gwrefc 8 obj-handle 0x64106/G -> new-handle 0x953/W from thread 'finalizer'(27691)
take_weak_global_ref_jni
-g- grefc 25195 gwrefc 8 handle 0x64106/G from thread 'finalizer'(27691)
take_weak_global_ref_jni
*take_weak obj=0x7c19c4e630; handle=0x64fd6
+w+ grefc 25194 gwrefc 9 obj-handle 0x64fd6/G -> new-handle 0x963/W from thread 'finalizer'(27691)
take_weak_global_ref_jni
-g- grefc 25194 gwrefc 9 handle 0x64fd6/G from thread 'finalizer'(27691)
take_weak_global_ref_jni
*take_weak obj=0x7c1046df98; handle=0x63d9a
+w+ grefc 25193 gwrefc 10 obj-handle 0x63d9a/G -> new-handle 0x973/W from thread 'finalizer'(27691)
take_weak_global_ref_jni
-g- grefc 25193 gwrefc 10 handle 0x63d9a/G from thread 'finalizer'(27691)
take_weak_global_ref_jni

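I believe these are the memory addresses that were getting stuck and could not be collected. Note `handle=0x64fd6`.

So I searched the .txt file for that address, and the result was: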

+g+ grefc 25190 gwrefc 0 obj-handle 0x4eaba/I -> new-handle 0x64fd6/G from thread 'Thread Pool Worker'(8)
  at Android.Runtime.AndroidObjectReferenceManager.CreateGlobalReference (Java.Interop.JniObjectReference value) [0x00000] in <016ee5efc3d0460baf9a60f95885ebbb>:0 
  at Java.Interop.JniObjectReference.NewGlobalRef () [0x00000] in <286213b9e14c442ba8d8d94cc9dbec8e>:0 
  at Android.Runtime.JNIEnv.NewGlobalRef (System.IntPtr jobject) [0x00000] in <016ee5efc3d0460baf9a60f95885ebbb>:0 
  at Java.Lang.Object.RegisterInstance (Android.Runtime.IJavaObject instance, System.IntPtr value, Android.Runtime.JniHandleOwnership transfer, System.IntPtr& handle) [0x00000] in <016ee5efc3d0460baf9a60f95885ebbb>:0 
  at Java.Lang.Object.SetHandle (System.IntPtr value, Android.Runtime.JniHandleOwnership transfer) [0x00000] in <016ee5efc3d0460baf9a60f95885ebbb>:0 
  at Java.Lang.Object..ctor (System.IntPtr handle, Android.Runtime.JniHandleOwnership transfer) [0x00000] in <016ee5efc3d0460baf9a60f95885ebbb>:0 
  at Javax.Microedition.Khronos.Egl.IEGL10Invoker..ctor (System.IntPtr handle, Android.Runtime.JniHandleOwnership transfer) [0x00000] in <016ee5efc3d0460baf9a60f95885ebbb>:0 
  at System.Reflection.MonoCMethod.InternalInvoke (System.Reflection.MonoCMethod , System.Object , System.Object[] , System.Exception& ) [0x00000] in <d4a23bbd2f544c30a48c44dd622ce09f>:0 
  at System.Reflection.MonoCMethod.InternalInvoke (System.Object obj, System.Object[] parameters, System.Boolean wrapExceptions) [0x00000] in <d4a23bbd2f544c30a48c44dd622ce09f>:0 
  at System.Reflection.MonoCMethod.DoInvoke (System.Object obj, System.Reflection.BindingFlags invokeAttr, System.Reflection.Binder binder, System.Object[] parameters, System.Globalization.CultureInfo culture) [0x00000] in <d4a23bbd2f544c30a48c44dd622ce09f>:0 
  at System.Reflection.MonoCMethod.Invoke (System.Reflection.BindingFlags invokeAttr, System.Reflection.Binder binder, System.Object[] parameters, System.Globalization.CultureInfo culture) [0x00000] in <d4a23bbd2f544c30a48c44dd622ce09f>:0 
  at System.Reflection.ConstructorInfo.Invoke (System.Object[] parameters) [0x00000] in <d4a23bbd2f544c30a48c44dd622ce09f>:0 
  at Java.Interop.TypeManager.CreateProxy (System.Type type, System.IntPtr handle, Android.Runtime.JniHandleOwnership transfer) [0x00000] in <016ee5efc3d0460baf9a60f95885ebbb>:0 
  at Java.Interop.TypeManager.CreateInstance (System.IntPtr handle, Android.Runtime.JniHandleOwnership transfer, System.Type targetType) [0x00000] in <016ee5efc3d0460baf9a60f95885ebbb>:0 
  at Java.Lang.Object.GetObject (System.IntPtr handle, Android.Runtime.JniHandleOwnership transfer, System.Type type) [0x00000] in <016ee5efc3d0460baf9a60f95885ebbb>:0 
  at Java.Interop.JavaObjectExtensions._JavaCast[TResult] (Android.Runtime.IJavaObject instance) [0x00000] in <016ee5efc3d0460baf9a60f95885ebbb>:0 
  at Java.Interop.JavaObjectExtensions.JavaCast[TResult] (Android.Runtime.IJavaObject instance) [0x00000] in <016ee5efc3d0460baf9a60f95885ebbb>:0 
  at Android.Runtime.Extensions.JavaCast[TResult] (Android.Runtime.IJavaObject instance) [0x00000] in <016ee5efc3d0460baf9a60f95885ebbb>:0 
  at MediaCodecHelper.OutputSurface.AwaitNewImage (System.Boolean returnOnFailure) [0x00079] in C:\repos\BitChute_Mobile_Android_BottomNav_newAppBak - Copy - Copy\VideoEncoding\OutputSurface.cs:298 
  at MediaCodecHelper.FileToMp4.EncodeFileToMp4 (System.String inputPath, System.String outputPath, System.Boolean encodeAudio, Android.Net.Uri inputUri) [0x00202] in C:\repos\BitChute_Mobile_Android_BottomNav_newAppBak - Copy - Copy\VideoEncoding\FileToMp4.cs:253 
  at MediaCodecHelper.FileToMp4.Start (Android.Net.Uri inputUri, System.String outputPath, System.String inputPath) [0x00007] in C:\repos\BitChute_Mobile_Android_BottomNav_newAppBak - Copy - Copy\VideoEncoding\FileToMp4.cs:181 
  at BitChute.Fragments.SettingsFrag+<>c.<StartEncoderTest>b__18_0 () [0x000fe] in C:\repos\BitChute_Mobile_Android_BottomNav_newAppBak - Copy - Copy\Fragments\SettingsFrag.cs:310 
  at System.Threading.Tasks.Task.InnerInvoke () [0x00000] in <d4a23bbd2f544c30a48c44dd622ce09f>:0 
  at System.Threading.Tasks.Task.Execute () [0x00000] in <d4a23bbd2f544c30a48c44dd622ce09f>:0 
  at System.Threading.Tasks.Task.ExecutionContextCallback (System.Object obj) [0x00000] in <d4a23bbd2f544c30a48c44dd622ce09f>:0 
  at System.Threading.ExecutionContext.RunInternal (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state, System.Boolean preserveSyncCtx) [0x00000] in <d4a23bbd2f544c30a48c44dd622ce09f>:0 
  at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state, System.Boolean preserveSyncCtx) [0x00000] in <d4a23bbd2f544c30a48c44dd622ce09f>:0 
  at System.Threading.Tasks.Task.ExecuteWithThreadLocal (System.Threading.Tasks.Task& currentTaskSlot) [0x00000] in <d4a23bbd2f544c30a48c44dd622ce09f>:0 
  at System.Threading.Tasks.Task.ExecuteEntry (System.Boolean bPreventDoubleExecution) [0x00000] in <d4a23bbd2f544c30a48c44dd622ce09f>:0 
  at System.Threading.Tasks.Task.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem () [0x00000] in <d4a23bbd2f544c30a48c44dd622ce09f>:0 
  at System.Threading.ThreadPoolWorkQueue.Dispatch () [0x00000] in <d4a23bbd2f544c30a48c44dd622ce09f>:0 
  at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback () [0x00000] in <d4a23bbd2f544c30a48c44dd622ce09f>:0 
handle 0x64fd6; key_handle 0x259aa25: Java Type: `com/google/android/gles_jni/EGLImpl`; MCW type: `Javax.Microedition.Khronos.Egl.IEGL10Invoker`

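Note `Javax.Microedition.Khronos.Egl.IEGL10Invoker` and `handle 0x64fd6`.

I recommend checking the memory handles in the section of the log where the GC is stuck in a loop, then searching for those handles to see if you can find the type they refer to. Once you've found the type that gets stuck when the loop tries to GC, go back through the source code and find where that type is called (or instantiated) in a loop. From what I've read, the usual cause is a loop that generates undisposed Java.Lang.Object references, which makes the GC fail.

So at that point I knew it was related to the interface IEGL10. I went back, tried removing the call from the loop, and it worked! GREFs now never go above 600.

Quick steps: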

1. Enable gref logging.
2. Run the app.
3. Check the logs for the memory addresses that are not being collected properly (where your app gets stuck in a GC loop, you'll likely see a ton of repeated lines).
4. Search for those memory addresses.
5. Check for the object **type** that is in the memory handle assignment stack trace.
6. Go back to your long-running problematic loops and see if you can find a matching method being called, or an object being instantiated in rapid succession and not being disposed of.
7. Either `Dispose` your looped objects manually or, as I also read, try to avoid `Java.Lang.Object` inheritance if it's in a long-running loop.
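
Steps 3 to 5 can be partly automated instead of searching the log by hand. The following is a minimal sketch, assuming the type annotation lines in grefs.txt look exactly like the `handle 0x64fd6; key_handle ...: Java Type: ...; MCW type: ...` line shown earlier; `GrefLogSummary` and its methods are hypothetical names, not part of Xamarin. It counts how many handles were registered per managed (MCW) type, so a type created thousands of times in a loop stands out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for summarizing a gref log by managed wrapper type.
public static class GrefLogSummary
{
    // Matches the annotation line written after a handle's stack trace, for example:
    // handle 0x64fd6; key_handle 0x259aa25: Java Type: `...`; MCW type: `...`
    private static readonly Regex TypeLine = new Regex(
        @"^handle (0x[0-9a-fA-F]+); key_handle 0x[0-9a-fA-F]+: Java Type: `([^`]*)`; MCW type: `([^`]*)`");

    // Returns the number of annotated handles per MCW type.
    public static Dictionary<string, int> CountByManagedType(IEnumerable<string> lines)
    {
        var counts = new Dictionary<string, int>();
        foreach (var line in lines)
        {
            var match = TypeLine.Match(line.Trim());
            if (!match.Success)
                continue; // stack frames, +g+/-g- lines, and so on

            var managedType = match.Groups[3].Value;
            counts.TryGetValue(managedType, out var seen); // seen is 0 when the key is missing
            counts[managedType] = seen + 1;
        }
        return counts;
    }

    // Returns the n most frequently registered types, most frequent first.
    public static List<KeyValuePair<string, int>> Top(Dictionary<string, int> counts, int n)
    {
        return counts.OrderByDescending(pair => pair.Value).Take(n).ToList();
    }
}
```

A possible usage is `GrefLogSummary.Top(GrefLogSummary.CountByManagedType(System.IO.File.ReadLines("grefs.txt")), 10)` on the pulled file. The type at the top of that list is a reasonable first place to look in your long-running loops, though a high count alone does not prove a leak.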

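You might not be as lucky as I was, where I could just comment out one line of code. You might have to come up with a way to manually dispose of the looped objects, or do something else to let the app know it's safe to GC them, but the memory addresses should give you an idea of which object is causing the GC problem.

Sorry if this isn't the best answer, but I'm not very familiar with how the GC works. If anyone can give a better explanation, I'd love to hear more details, but this is how I fixed it! Hope it helps.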