Livelock and suppressed asynchrony

I ran into an interesting livelock situation related to asynchrony.

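Consider the code below, which causes a livelock and runs for about 1 minute even though the useful payload needs almost no time to run. The run takes about 1 minute because we hit the thread pool growth limit (around 1 new thread per second) with 60 iterations; 300 iterations would make it run for about 5 minutes.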

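This is not the trivial deadlock where we synchronously wait for an asynchronous operation in an environment whose SynchronizationContext allows scheduling jobs only on a single thread (e.g. WPF, WebAPI, etc.). The code below reproduces the issue in a console application, where no explicit SynchronizationContext is set and tasks are scheduled on the thread pool.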

I know that the "solution" to this problem is "asynchrony all the way". In the real world we might not know that, somewhere deep inside, the developer of SyncMethod suppresses asynchrony by waiting on it in a blocking way, unleashing such issues (even if they did the trick of replacing the SynchronizationContext so that it at least does not deadlock).
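For readers who have not seen the SynchronizationContext trick mentioned above, here is a minimal sketch of what it usually looks like. The class and method names (`SyncOverAsync`, `RunWithoutContext`) are hypothetical and not taken from any library. It avoids the single-thread deadlock, but the calling thread is still blocked, which is exactly what feeds the livelock in this question.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class SyncOverAsync
{
    // Hypothetical helper: blocks on an async method while no SynchronizationContext
    // is installed, so awaits inside it resume on the thread pool instead of trying
    // to get back onto the (blocked) calling thread.
    public static void RunWithoutContext(Func<Task> asyncMethod)
    {
        SynchronizationContext previous = SynchronizationContext.Current;
        SynchronizationContext.SetSynchronizationContext(null);
        try
        {
            // Still a blocking wait: this thread is parked until the task completes.
            asyncMethod().GetAwaiter().GetResult();
        }
        finally
        {
            // Restore whatever context the caller had.
            SynchronizationContext.SetSynchronizationContext(previous);
        }
    }
}
```

A SyncMethod written this way would call `SyncOverAsync.RunWithoutContext(SomethingAsync)` instead of `SomethingAsync().Wait()`.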

What are your suggestions to deal with such an issue when "asynchrony all the way" is not an option? Is there something else rather than obvious "do not spawn so many tasks at once"?

void Main()
{
    List<Task> tasks = new List<Task>();

    for (int i = 0; i < 60; i++)
        tasks.Add(Task.Run(() => SyncMethod()));

    bool exit = false;

    Task.WhenAll(tasks.ToArray()).ContinueWith(t => exit = true);

    while (!exit)
    {
        Print($"Thread count: {Process.GetCurrentProcess().Threads.Count}");
        Thread.Sleep(1000);
    }
}

void SyncMethod()
{
    SomethingAsync().Wait();
}

async Task SomethingAsync()
{
    await Task.Delay(1);
    await Task.Delay(1); // extra puzzle: why does commenting out one of these Delay calls partially resolve the issue?

    Print("async done");
}

void Print(object obj)
{
    $"[{Thread.CurrentThread.ManagedThreadId}] {DateTime.Now} - {obj}".Dump();
}

Here is the output. Note how all the async continuations are stuck for almost a minute and then suddenly go through.

[12] 30.01.2018 23:34:36 - Thread count: 18 
[12] 30.01.2018 23:34:37 - Thread count: 32
[12] 30.01.2018 23:34:38 - Thread count: 33 -- THREAD POOL STARTS TO GROW
...
[12] 30.01.2018 23:35:18 - Thread count: 70
[12] 30.01.2018 23:35:19 - Thread count: 71
[12] 30.01.2018 23:35:20 - Thread count: 72 -- UNTIL ALL SCHEDULED TASKS CAN FIT
[8] 30.01.2018 23:35:20 - async done -- ALMOST A MINUTE AFTER START
[8] 30.01.2018 23:35:20 - async done -- THE CONTINUATIONS START TO GO THROUGH
...
[61] 30.01.2018 23:35:20 - async done
[10] 30.01.2018 23:35:20 - async done
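The thread count printed above is the count for the whole process, which also includes threads that do not belong to the thread pool. As a rough sketch, the pool itself can be inspected with the ThreadPool getters. The `PoolStats` helper is a hypothetical name, and "busy" is simply the configured maximum minus the available worker threads.

```csharp
using System;
using System.Threading;

static class PoolStats
{
    // Hypothetical helper: summarizes worker-thread limits and current usage.
    public static string Describe()
    {
        ThreadPool.GetMinThreads(out int minWorkers, out _);
        ThreadPool.GetMaxThreads(out int maxWorkers, out _);
        ThreadPool.GetAvailableThreads(out int availableWorkers, out _);

        // GetAvailableThreads reports max minus currently active worker threads.
        int busyWorkers = maxWorkers - availableWorkers;
        return $"min={minWorkers} max={maxWorkers} busy={busyWorkers}";
    }
}
```

Calling `Print(PoolStats.Describe())` inside the monitoring loop should show the busy count climbing past the minimum while the continuations are stuck.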

Answering the original question:

What are your suggestions to deal with such an issue when "asynchrony all the way" is not an option? Is there something else rather than obvious "do not spawn so many tasks at once"?

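This is by no means a fix for the root cause, but rather a quantitative remedy: we can tune the thread pool with SetMinThreads, increasing the number of threads that are created without delay (much faster than the regular "injection rate" of 1 thread pool thread per second that I see in my setup). The way it works in the given setup is simple. Basically, we keep wasting thread pool threads until the pool grows enough to start executing the continuations. If we start with a pool that is large enough, we eliminate the period during which we are limited by the artificial "injection rate", which tries to keep the number of threads low (this makes sense, since the thread pool is designed to run CPU-bound tasks rather than to sit blocked waiting for asynchronous operations).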
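A minimal sketch of that tuning, assuming we can estimate how many calls will be blocked at the same time. `PoolTuning`, `EnsureMinWorkerThreads` and its parameter are hypothetical names; only the ThreadPool calls are real API.

```csharp
using System;
using System.Threading;

static class PoolTuning
{
    // Hypothetical helper: raises the worker-thread minimum to at least
    // `expectedBlockedCalls`, leaving the I/O completion-port minimum unchanged.
    // Returns false if the thread pool rejects the requested value.
    public static bool EnsureMinWorkerThreads(int expectedBlockedCalls)
    {
        ThreadPool.GetMinThreads(out int minWorkers, out int minIo);

        // Never lower a minimum that is already high enough.
        if (minWorkers >= expectedBlockedCalls)
            return true;

        return ThreadPool.SetMinThreads(expectedBlockedCalls, minIo);
    }
}
```

For the repro above this would be something like `PoolTuning.EnsureMinWorkerThreads(64)` at the top of Main: 60 blocked tasks plus a few spare threads for the continuations. The number 64 is my assumption for this example, not a recommended value.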

I should also leave a cautionary note here:

By default, the minimum number of threads is set to the number of processors on a system. You can use the SetMinThreads method to increase the minimum number of threads. However, unnecessarily increasing these values can cause performance problems. If too many tasks start at the same time, all of them might appear to be slow. In most cases, the thread pool will perform better with its own algorithm for allocating threads. Reducing the minimum to less than the number of processors can also hurt performance.

https://docs.microsoft.com/en-us/dotnet/api/system.threading.threadpool.setminthreads?view=netframework-4.8

Another interesting point: Microsoft recommends increasing "min threads" for ASP.NET as a performance/reliability improvement in some cases.

https://support.microsoft.com/en-us/help/821268/contention-poor-performance-and-deadlocks-when-you-make-calls-to-web-s
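If raising the minimum is not acceptable given the caveat quoted above, the "do not spawn so many tasks at once" option from the question can be sketched roughly like this. `Throttled.RunAllAsync` is a hypothetical helper, and the concurrency limit is something you would have to pick yourself, presumably somewhere below the thread pool minimum so that threads remain free for the continuations.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

static class Throttled
{
    // Hypothetical helper: runs `blockingCall` `count` times on the thread pool,
    // with at most `maxConcurrency` calls in flight at any moment.
    public static async Task RunAllAsync(Action blockingCall, int count, int maxConcurrency)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = new List<Task>(count);
            for (int i = 0; i < count; i++)
            {
                // Asynchronous wait: no pool thread is parked while the gate is closed.
                await gate.WaitAsync();
                tasks.Add(Task.Run(() =>
                {
                    try { blockingCall(); }
                    finally { gate.Release(); }
                }));
            }
            await Task.WhenAll(tasks);
        }
    }
}
```

In the repro this would replace the loop in Main with something like `Throttled.RunAllAsync(SyncMethod, 60, 4).Wait()`. Blocking there is tolerable because Main does not run on a thread pool thread.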

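Interestingly, the problem described in the question is not purely imaginary. It is real. It happens in well-known and widely recognized software. An example from experience: Identity Server 3.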

https://github.com/IdentityServer/IdentityServer3.EntityFramework/issues/101

The implementation that has this caveat (we had to rewrite it to fix the problem in our production scenario):

https://github.com/IdentityServer/IdentityServer3.EntityFramework/blob/master/Source/Core.EntityFramework/Serialization/ClientConverter.cs

Another article explains the problem in detail:

https://blogs.msdn.microsoft.com/vancem/2018/10/16/diagnosing-net-core-threadpool-starvation-with-perfview-why-my-service-is-not-saturating-all-cores-or-seems-to-stall/

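As for the strange behavior with a single Task.Delay, where some of the async calls complete with each newly injected thread pool thread: this seems to be caused by continuations being executed inline and by the way Task.Delay and Timer are implemented. See the call stack below. It shows that a newly created thread pool thread does some extra work for .NET timers upon creation, before it starts processing the thread pool queue (see System.Threading.TimerQueue.AppDomainTimerCallback).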

   at AsynchronySamples.StrangeTimer.Program.d__2.MoveNext()
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.InvokeMoveNext(Object stateMachine)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.Run()
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.c__DisplayClass4_0.b__0()
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.ContinuationWrapper.Invoke()
   at System.Runtime.CompilerServices.TaskAwaiter.c__DisplayClass11_0.b__0()
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.ContinuationWrapper.Invoke()
   at System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(Action action, Boolean allowInlining, Task& currentTask)
   at System.Threading.Tasks.Task.FinishContinuations()
   at System.Threading.Tasks.Task.FinishStageThree()
   at System.Threading.Tasks.Task`1.TrySetResult(TResult result)
   at System.Threading.Tasks.Task.DelayPromise.Complete()
   at System.Threading.Tasks.Task.c.b__274_1(Object state)
   at System.Threading.TimerQueueTimer.CallCallbackInContext(Object state)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.TimerQueueTimer.CallCallback()
   at System.Threading.TimerQueueTimer.Fire()
   at System.Threading.TimerQueue.FireNextTimers()
   at System.Threading.TimerQueue.AppDomainTimerCallback(Int32 id)
   [Native to Managed Transition]   
   at kernel32.dll!74e86359()
   at kernel32.dll![Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll]
   at ntdll.dll!77057b74()
   at ntdll.dll!77057b44()
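The inlining visible in this call stack (TrySetResult leading straight into MoveNext on the same thread) can be reproduced in isolation. This is only a sketch: `InliningDemo` is a hypothetical name, and a TaskCompletionSource completed from a dedicated thread stands in for the timer callback.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class InliningDemo
{
    // Returns the id of the thread that completed the task and the id of the
    // thread that ran the await continuation.
    public static (int completer, int continuation) Run()
    {
        var tcs = new TaskCompletionSource<bool>();
        int continuationThread = -1;

        async Task WaitForSignalAsync()
        {
            // ConfigureAwait(false) mirrors a console app, where there is no
            // SynchronizationContext to capture.
            await tcs.Task.ConfigureAwait(false);
            continuationThread = Environment.CurrentManagedThreadId;
        }

        // Runs up to the first await, registers the continuation, then returns.
        Task waiter = WaitForSignalAsync();

        int completerThread = -1;
        var completer = new Thread(() =>
        {
            completerThread = Environment.CurrentManagedThreadId;
            // Completing the task can run the registered continuation right here.
            tcs.SetResult(true);
        });
        completer.Start();
        completer.Join();
        waiter.Wait();

        return (completerThread, continuationThread);
    }
}
```

In a console application I would expect both ids to be the same, meaning the thread that completes the task also executes the rest of the async method. That matches the stack above, where the timer callback ends up running the continuation itself.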