IIS 工作进程中有很多挂起的请求

A lot of hung requests in the IIS Worker Process

托管在 IIS 中的 Web API 在其中一个版本之后出现意外行为:

这些是使用 IIS Web Service(_Total) Current Connections 计数器收集的指标:

A connection is a active session on your server. When someone connects, it increments the counter, when they disconnect, it goes down.

在什么时候一切都稳定运行,但是连接数超过了 5,000+- 连接阈值并且 API 给出了 503.2 错误。

来自 API:

The serverRuntime@appConcurrentRequestLimit setting is being exceeded.

来自事件日志:

A worker process '5352' serving application pool '%placeholder%' failed to stop a listener channel for protocol 'http' in the allotted time. The data field contains the error number.

重启可以解决这个问题,但不是长久之计

我有应用程序池的完整转储,但我没有发现任何问题:死锁或类似 that.I 的东西没有太多经验,所以我使用了如下命令:!crlstack!dupmstack!dkl!runaway!threadpool等进行分析。

没有很多繁忙的工作人员或完成线程,一切看起来都很好,甚至 CPU 利用率和相应的指标。我也尝试使用 Debug Diagnostic Tool 进行分析,但无济于事,它显示的内容几乎与我使用 WinDbg.

时看到的相同

可能导致问题的原因是什么?如何比较症状?使用什么工具或者尝试对已经提到的工具执行一些单独的操作?我知道问题出在应用程序中,但我无法诊断它。

一头雾水,不知道进一步分析的方向。如果我的问题与本网站无关,我可以移动它。


UPD.

Especially if you are using HttpClient. check this one here. You might even be exhausting ports.

我知道这样的问题,但这里是另外一种情况。出现问题的组件是调用链中的最后一个:

Client <-> API <-> DB


From your version control, find the changed calls to limit the investigation area and find where the new problem stems from.

已经,可惜没有成功。我可以简单地逐个任务回滚,但我认为这不是一个好的解决方案,因为我需要对此类问题的诊断有一个想法。


托管线程列表:

0:000> !runaway
 User Mode Time
  Thread       Time
   48:ea8      0 days 0:00:25.187
   50:18ec     0 days 0:00:23.171
   49:1b1c     0 days 0:00:22.593
   52:5c4      0 days 0:00:22.562
   51:1bd8     0 days 0:00:22.312
   45:109c     0 days 0:00:22.187
   47:152c     0 days 0:00:22.078
   46:1988     0 days 0:00:20.859
   56:2b8      0 days 0:00:17.078
   26:1f40     0 days 0:00:16.281
   24:140c     0 days 0:00:16.265
   27:17c8     0 days 0:00:16.187
   22:181c     0 days 0:00:16.109
   25:1f88     0 days 0:00:16.031
   23:9fc      0 days 0:00:15.968
   20:10ec     0 days 0:00:15.765
   21:1f74     0 days 0:00:15.750
   57:ff8      0 days 0:00:12.390
    9:1ef4     0 days 0:00:04.734
   59:1b8c     0 days 0:00:04.375
    7:18b4     0 days 0:00:04.187
    6:1798     0 days 0:00:04.000
    4:1ac0     0 days 0:00:03.671
   10:13d4     0 days 0:00:03.484
    8:1f70     0 days 0:00:03.203
   55:434      0 days 0:00:03.171
   60:1e34     0 days 0:00:03.031
    5:1f0      0 days 0:00:02.468
   44:16d0     0 days 0:00:02.203
   61:bd4      0 days 0:00:02.156
   40:1c34     0 days 0:00:02.031
   43:177c     0 days 0:00:02.000
   38:1b5c     0 days 0:00:01.890
   36:2210     0 days 0:00:01.890
    3:2264     0 days 0:00:01.796
   39:1e5c     0 days 0:00:01.765
   34:1ea8     0 days 0:00:01.734
   62:16ac     0 days 0:00:01.718
   37:10a4     0 days 0:00:01.609
   42:2028     0 days 0:00:01.593
   35:10b4     0 days 0:00:01.515
   41:187c     0 days 0:00:01.453
   64:1764     0 days 0:00:00.703
   65:124c     0 days 0:00:00.593
   63:13a0     0 days 0:00:00.453
   58:1a9c     0 days 0:00:00.421
   70:14dc     0 days 0:00:00.406
   54:8d8      0 days 0:00:00.390
   69:704      0 days 0:00:00.265
   66:1d0c     0 days 0:00:00.156
   28:2120     0 days 0:00:00.140
   72:18c4     0 days 0:00:00.109
   73:b40      0 days 0:00:00.015
    0:1330     0 days 0:00:00.015
   77:50c      0 days 0:00:00.000
   76:1840     0 days 0:00:00.000
   75:1614     0 days 0:00:00.000
   74:1c1c     0 days 0:00:00.000
   71:824      0 days 0:00:00.000
   68:18e8     0 days 0:00:00.000
   67:1518     0 days 0:00:00.000
   53:1ed4     0 days 0:00:00.000
   33:1838     0 days 0:00:00.000
   32:1e6c     0 days 0:00:00.000
   31:1a40     0 days 0:00:00.000
   30:608      0 days 0:00:00.000
   29:2e0      0 days 0:00:00.000
   19:176c     0 days 0:00:00.000
   18:1fa0     0 days 0:00:00.000
   17:1394     0 days 0:00:00.000
   16:14f0     0 days 0:00:00.000
   15:13cc     0 days 0:00:00.000
   14:5d0      0 days 0:00:00.000
   13:944      0 days 0:00:00.000
   12:4f8      0 days 0:00:00.000
   11:2360     0 days 0:00:00.000
    2:310      0 days 0:00:00.000
    1:fe8      0 days 0:00:00.000

!threadpool 的输出:

0:000> !threadpool
CPU utilization: 14%
Worker Thread: Total: 11 Running: 0 Idle: 9 MaxLimit: 32767 MinLimit: 8
Work Request in Queue: 0
--------------------------------------
Number of Timers: 2
--------------------------------------
Completion Port Thread:Total: 7 Free: 7 MaxFree: 16 CurrentLimit: 7 MaxLimit: 1000 MinLimit: 8

他们都有Http Status: 200 (NULL). Not completed。不幸的是,没有与任何线程关联的记录。

0:000> !whttp /running
HttpContext    Thread Time Out Running  Status Verb
0000029e1953fb58   -- 00:01:50 00:14:22    200 GET       
0000029e19ce1b58   -- 00:01:50 00:18:38    200 GET       
0000029e19dabc00   -- 00:01:50 00:16:08    200 GET      
0000029e19db28c8   -- 00:01:50 00:14:15    200 GET       
0000029e19db9898   -- 00:01:50 00:10:51    200 GET       
0000029e19dc9cf8   -- 00:01:50 00:18:38    200 GET       
0000029e19de4188   -- 00:01:50 00:10:51    200 GET       
0000029e19e48350   -- 00:01:50 00:10:45    200 GET       
0000029e19ea3428   -- 00:01:50 00:18:05    200 GET       
0000029e19eaab88   -- 00:01:50 00:10:45    200 GET       
0000029e19ec91a0   -- 00:01:50 00:10:44    200 GET       
0000029e19f74e30   -- 00:01:50 00:10:39    200 GET       
0000029e19fa8ca8   -- 00:01:50 00:10:39    200 GET       
0000029e19fe56d8   -- 00:01:50 00:10:50    200 GET       
0000029e19ffa778   -- 00:01:50 00:14:21    200 GET       
0000029e1a0b6088   -- 00:01:50 00:10:38    200 GET       
0000029e1a12b040   -- 00:01:50 00:10:38    200 GET       
0000029e1a16cd50   -- 00:01:50 00:10:37    200 GET       
0000029e1a2b22e0   -- 00:01:50 00:19:18    200 GET       
0000029e1a2cf618   -- 00:01:50 00:19:27    200 GET       
0000029e1a2f3620   -- 00:01:50 00:19:18    200 GET       
0000029e1a2ff808   -- 00:01:50 00:19:18    200 GET       
0000029e1a30aa20   -- 00:01:50 00:19:22    200 GET       
0000029e1a314b98   -- 00:01:50 00:19:22    200 GET       
0000029e1a3352a0   -- 00:01:50 00:19:17    200 GET       
0000029e1a34e6f8   -- 00:01:50 00:18:38    200 GET       
0000029e1a353248   -- 00:01:50 00:19:10    200 GET       
0000029e1a371260   -- 00:01:50 00:19:20    200 GET       
0000029e1a39f800   -- 00:01:50 00:18:37    200 GET       
0000029e1a3b32e8   -- 00:01:50 00:18:36    200 GET       
0000029e1a3d18b8   -- 00:01:50 00:18:03    200 GET       
0000029e1a3d6f40   -- 00:01:50 00:18:22    200 GET       
0000029e1a3e2670   -- 00:01:50 00:19:14    200 GET       
0000029e1a404510   -- 00:01:50 00:18:53    200 GET       
0000029e1a413eb8   -- 00:01:50 00:18:38    200 GET       
0000029e1a416a48   -- 00:01:50 00:18:53    200 GET       
0000029e1a41c888   -- 00:01:50 00:19:13    200 GET       
0000029e1a4288c0   -- 00:01:50 00:19:13    200 GET       
0000029e1a442118   -- 00:01:50 00:19:13    200 GET       
0000029e1a48b098   -- 00:01:50 00:18:53    200 GET       
0000029e1a51edc0   -- 00:01:50 00:14:15    200 GET       
0000029e1a52a420   -- 00:01:50 00:18:04    200 GET       
0000029e1a55bb48   -- 00:01:50 00:19:12    200 GET      

在这种情况下,没有请求队列、高线程争用或类似情况,只有 IIS 个连接的黑洞和尖峰。


我注意到 GC 的代数在问题开始出现的那一刻正在增加。或许我分析一堆能有所收获

因为它是一个 post 发布的东西,我建议如下。

检查更新日志

从你的版本控制中,找到更改的调用以限制调查区域,并找到新问题的根源。

检查长 运行 请求

Check this answer here for tracking long running requests。很可能您有一些请求需要等待很长时间(或者有多个请求无法正常工作)。

检查消费者的变化

特别是如果您使用 HttpClientcheck this one here。您甚至可能会耗尽端口。

这个回答可以分为两部分:

  • [第 1 部分] 查找原因
  • [第 2 部分] 找出罪魁祸首

[第 1 部分]

注意到在问题开始出现的那一刻,生成的大小增加了。请参阅问题的屏幕截图。因此,我们可以尝试分析堆中是否存在异常,这将有助于我们找到问题的原因。

WinDbg command output has been shortened to fit the symbols limit.


拥有进程的完整转储,检索到问题发生时未执行的 HTTP 请求列表:

0:000> !aspxpagesext -n -nq
   Address          Completed Timeout Time (secs) ThreadId ReturnCode Verb
AS 00000284fcc095c8        no     110        4870                 200 GET 
AS 00000286000192a8        no     110        1858                 200 GET 
AS 00000286000242e0        no     110        1854                 200 GET 
AS 000002860002ff78        no     110        1851                 200 GET 
AS 00000286000433c8        no     110        1829                 200 GET 
AS 00000286000527d8        no     110        1824                 200 GET 
AS 000002860006caa0        no     110        1074                 200 GET 
AS 00000286004d26c0        no     110        1561                 200 GET 
AS 00000286005386f0        no     110        1557                 200 GET 
AS 00000286005d5090        no     110        1546                 200 GET 
AS 00000286005edb68        no     110        1381                 200 GET 
AS 0000028600603290        no     110        1539                 200 GET 
AS 0000028600604818        no     110        1539                 200 GET 
AS 000002860062b640        no     110        1537                 200 GET 
AS 0000028600636f10        no     110        1369                 200 GET 
AS 0000028600642a08        no     110        1536                 200 GET 
AS 000002860065bbc8        no     110        1346                 200 GET 
AS 0000028600666a30        no     110        1339                 200 GET 
AS 0000028600673748        no     110        1332                 200 GET 
AS 000002860067fb28        no     110        1327                 200 GET 
AS 000002860068e000        no     110        1322                 200 GET 
AS 00000286006a1450        no     110        1320                 200 GET 
AS 00000286006b0418        no     110        1320                 200 GET 
AS 00000286006bfc48        no     110        1318                 200 GET 
AS 00000286006cf7e8        no     110        1317                 200 GET 
AS 00000286006e2dd8        no     110        1311                 200 GET 
AS 00000286006ee748        no     110        1309                 200 GET 
AS 00000286006fb200        no     110        1308                 200 GET 
AS 000002860070c870        no     110        1288                 200 GET 
AS 000002880224d568        no     110          13                 200 GET 
3038 contexts found (2815 displayed).
   AS == possibly async request
   SL == session locked

没有一个请求有关联的线程,所以我们无法查看内部,但我们可以找到使请求对象免于垃圾回收的所有根源。


000002880224d568是服务器处理了13秒的最后HttpContext之一的地址。其他请求的分析显示类似的结果:

0:000> !gcroot 000002880224d568
Thread a38:
    000000a31747ed00 00007ffc85d931d3 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
        rbp-80: 000000a31747ed40
            ->  00000287fc9157e0 System.Threading.Thread
            ->  00000285fc8c8438 System.Runtime.Remoting.Contexts.Context
            ->  00000285fc8c8150 System.AppDomain
            ->  00000285fcc1f9d8 System.EventHandler
            ->  00000284fc9205c8 System.Object[]
            ->  00000285fc8d6ab8 System.EventHandler
            ->  00000285fc8d6530 System.Web.Hosting.HostingEnvironment
            ->  00000285fc8d66d8 System.Collections.Hashtable
            ->  00000285fc8d6728 System.Collections.Hashtable+bucket[]
            ->  00000285fc94fb08 System.Web.Hosting.ApplicationMonitors+AppMonitorRegisteredObject
            ->  00000285fc94d9b8 System.Web.Hosting.ApplicationMonitors
            ->  00000285fc94d9d0 System.Web.Hosting.AspNetMemoryMonitor
            ->  00000285fc94f040 System.Web.Hosting.LowPhysicalMemoryMonitor
            ->  00000285fc94f318 System.Threading.Timer
            ->  00000285fc94f390 System.Threading.TimerHolder
            ->  00000285fc94f338 System.Threading.TimerQueueTimer
            ->  00000285fc94d0c0 System.Threading.TimerQueueTimer
            ->  00000285fc8e47f8 System.Threading.TimerQueueTimer
            ->  00000285fc8e4710 System.Threading.TimerQueueTimer
            ->  00000285fc8e4628 System.Threading.TimerQueueTimer
            ->  00000285fc8e4540 System.Threading.TimerQueueTimer
            ->  00000285fc8e43f8 System.Threading.TimerQueueTimer
            ->  00000285fc8db808 System.Threading.TimerQueueTimer
            ->  00000285fc8db7a8 System.Threading.TimerCallback
            ->  00000285fc8db4e8 System.Web.RequestTimeoutManager
            ->  00000285fc8db520 System.Web.Util.DoubleLinkList[]
            ->  00000285fc8db690 System.Web.Util.DoubleLinkList
            ->  00000286fcaca7b0 System.Web.RequestTimeoutManager+RequestTimeoutEntry
            ->  00000286fcddae98 System.Web.RequestTimeoutManager+RequestTimeoutEntry
            ->  00000284fce820f8 System.Web.RequestTimeoutManager+RequestTimeoutEntry
            ->  00000287fcfd1ac8 System.Web.RequestTimeoutManager+RequestTimeoutEntry
            ->  00000287fd075278 System.Web.RequestTimeoutManager+RequestTimeoutEntry
            ->  0000028801784ed8 System.Web.RequestTimeoutManager+RequestTimeoutEntry
            ->  0000028700fb7f20 System.Web.RequestTimeoutManager+RequestTimeoutEntry
            ->  00000285fec96f00 System.Web.RequestTimeoutManager+RequestTimeoutEntry
            ->  000002860076f5f8 System.Web.RequestTimeoutManager+RequestTimeoutEntry
            ->  000002860162ef58 System.Web.RequestTimeoutManager+RequestTimeoutEntry
            ->  0000028601d98e90 System.Web.RequestTimeoutManager+RequestTimeoutEntry
            ->  000002850246e118 System.Web.RequestTimeoutManager+RequestTimeoutEntry
            ->  000002880224ef58 System.Web.RequestTimeoutManager+RequestTimeoutEntry
            ->  000002880224d568 System.Web.HttpContext

HandleTable:
    00000284fbca15e0 (pinned handle)
    -> 00000288fc893f30 System.Object[]
    -> 00000284fc911968 NamespaceClientApi.Models.Helpers.KeyedSemaphoreSlim
    -> 00000284fc9119a8 System.Collections.Generic.Dictionary`2[[System.String, mscorlib],[NamespaceClientApi.Models.Helpers.KeyedSemaphoreSlim+SemaphoreWrapper, Project.Library.SomeClientApiModels]]
    -> 00000287ff97efc0 System.Collections.Generic.Dictionary`2+Entry[[System.String, mscorlib],[NamespaceClientApi.Models.Helpers.KeyedSemaphoreSlim+SemaphoreWrapper, Project.Library.SomeClientApiModels]][]
    -> 00000285fd3a4880 NamespaceClientApi.Models.Helpers.KeyedSemaphoreSlim+SemaphoreWrapper
    -> 00000285fd3a48b0 System.Threading.SemaphoreSlim
    -> 00000285fd3a8698 System.Threading.SemaphoreSlim+TaskNode
    -> 00000285fd3b5168 System.Threading.SemaphoreSlim+TaskNode
    -> 00000287fcc17908 System.Threading.SemaphoreSlim+TaskNode
    -> 00000284fce05cf0 System.Threading.SemaphoreSlim+TaskNode
    -> 00000287fcc37988 System.Threading.SemaphoreSlim+TaskNode
    -> 00000285fd3cd0e8 System.Threading.SemaphoreSlim+TaskNode
    -> 00000286fcdbf498 System.Threading.SemaphoreSlim+TaskNode
    -> 00000286fcaab498 System.Threading.SemaphoreSlim+TaskNode
    -> 00000287fcd16370 System.Threading.SemaphoreSlim+TaskNode
    -> 00000285fd3ee190 System.Threading.SemaphoreSlim+TaskNode
    -> 0000028502949e58 System.Action
    -> 0000028502949e38 System.Runtime.CompilerServices.AsyncMethodBuilderCore+MoveNextRunner
    -> 0000028502949ee8 NamespaceClientApi.Models.Helpers.KeyedSemaphoreSlim+SemaphoreWrapper+<WaitAsync>d__8
    -> 0000028502949e98 System.Threading.Tasks.Task`1[[System.IDisposable, mscorlib]]
    -> 000002850294a138 System.Threading.Tasks.SynchronizationContextAwaitTaskContinuation
    -> 000002850294a038 System.Action
    -> 000002850294a018 System.Runtime.CompilerServices.AsyncMethodBuilderCore+MoveNextRunner
    -> 000002850294a0c8 NamespaceClientApi.Models.Helpers.MemoryLocalCache+<GetCacheItemAsync>d__9`1[[System.Collections.Generic.HashSet`1[[System.ValueTuple`2[[System.Int64, mscorlib],[System.Int32, mscorlib]], mscorlib]], System.Core]]
    -> 000002850294a078 System.Threading.Tasks.Task`1[[System.Collections.Generic.HashSet`1[[System.ValueTuple`2[[System.Int64, mscorlib],[System.Int32, mscorlib]], mscorlib]], System.Core]]
    -> 000002850294a338 System.Threading.Tasks.SynchronizationContextAwaitTaskContinuation
    -> 000002850294a268 System.Action
    -> 000002850294a248 System.Runtime.CompilerServices.AsyncMethodBuilderCore+MoveNextRunner
    -> 000002850294a2f8 Namespace.Services.ProjectInfo+<EventFlagsAsync>d__88
    -> 000002850294a2a8 System.Threading.Tasks.Task`1[[System.Collections.Generic.HashSet`1[[System.ValueTuple`2[[System.Int64, mscorlib],[System.Int32, mscorlib]], mscorlib]], System.Core]]
    -> 000002850294a540 System.Threading.Tasks.SynchronizationContextAwaitTaskContinuation
    -> 000002850294a468 System.Action
    -> 000002850294a448 System.Runtime.CompilerServices.AsyncMethodBuilderCore+MoveNextRunner
    -> 000002850294a4f8 Namespace.Filters.EventPropertiesFilter+<SetEventFlags>d__2
    -> 000002850294a4a8 System.Threading.Tasks.Task`1[[Namespace.Filters.EventPropertiesFilter, Project.Library.Some]]
    -> 000002850294a740 System.Threading.Tasks.SynchronizationContextAwaitTaskContinuation
    -> 000002850294a670 System.Action
    -> 000002850294a650 System.Runtime.CompilerServices.AsyncMethodBuilderCore+MoveNextRunner
    -> 000002850294a700 Namespace.Services.ProjectInfo+<InitEventFlagsAsync>d__90
    -> 000002850294a6b0 System.Threading.Tasks.Task`1[[Namespace.Filters.EventPropertiesFilter, Project.Library.Some]]
    -> 000002850294a9f0 System.Threading.Tasks.SynchronizationContextAwaitTaskContinuation

    00000284fbcc4ec0 (strong handle)
    -> 000002880224dc08 System.Web.RootedObjects
    -> 000002880224d568 System.Web.HttpContext

Found 3 unique roots (run '!GCRoot -all' to see all roots).

都卡在了EventFlags异步方法中,使用了基于SemaphoreSlim的缓存管理器。 System.Threading.SemaphoreSlim+TaskNode 个对象的实际数量是 1543 个。


来自SemaphoreSlim.cs

// Task in a linked list of asynchronous waiters
private sealed class TaskNode: Task <bool>, IThreadPoolWorkItem 
{
    internal TaskNode Prev, Next;
    internal TaskNode() : base() {}

    [SecurityCritical]
    void IThreadPoolWorkItem.ExecuteWorkItem() 
        {
        bool setSuccessfully = TrySetResult(true);
        Contract.Assert(setSuccessfully, "Should have been able to complete task");
    }

    [SecurityCritical]
    void IThreadPoolWorkItem.MarkAborted(ThreadAbortException tae)
        {
        /* nop */
    }
}

Task in a linked list of asynchronous waiters

当信号量忙于另一个线程时,似乎每次使用 help CreateAndAddAsyncWaiter 方法创建 TaskNode。因此,创建了一个队列的外观,上面提到了:

/// <summary>
/// Asynchronously waits to enter the <see cref="SemaphoreSlim"/>,
/// using a 32-bit signed integer to measure the time interval, 
/// while observing a <see cref="T:System.Threading.CancellationToken"/>.
/// </summary>
/// <param name="millisecondsTimeout">
/// The number of milliseconds to wait, or <see cref="Timeout.Infinite"/>(-1) to wait indefinitely.
/// </param>
/// <param name="cancellationToken">The <see cref="T:System.Threading.CancellationToken"/> to observe.</param>
/// <returns>
/// A task that will complete with a result of true if the current thread successfully entered 
/// the <see cref="SemaphoreSlim"/>, otherwise with a result of false.
/// </returns>
/// <exception cref="T:System.ObjectDisposedException">The current instance has already been
/// disposed.</exception>
/// <exception cref="ArgumentOutOfRangeException"><paramref name="millisecondsTimeout"/> is a negative number other than -1,
/// which represents an infinite time-out.
/// </exception>
public Task<bool> WaitAsync(int millisecondsTimeout, CancellationToken cancellationToken)
{
    CheckDispose();

    // Validate input
    if (millisecondsTimeout < -1)
    {
        throw new ArgumentOutOfRangeException(
            "totalMilliSeconds", millisecondsTimeout, GetResourceString("SemaphoreSlim_Wait_TimeoutWrong"));
    }

    // Bail early for cancellation
    if (cancellationToken.IsCancellationRequested)
        return Task.FromCancellation<bool>(cancellationToken);

    lock (m_lockObj)
    {
        // If there are counts available, allow this waiter to succeed.
        if (m_currentCount > 0)
        {
            --m_currentCount;
            if (m_waitHandle != null && m_currentCount == 0) m_waitHandle.Reset();
            return s_trueTask;
        }
            // If there aren't, create and return a task to the caller.
            // The task will be completed either when they've successfully acquired
            // the semaphore or when the timeout expired or cancellation was requested.
        else
        {
            Contract.Assert(m_currentCount == 0, "m_currentCount should never be negative");
            var asyncWaiter = CreateAndAddAsyncWaiter();
            return (millisecondsTimeout == Timeout.Infinite && !cancellationToken.CanBeCanceled) ?
                asyncWaiter :
                WaitUntilCountOrTimeoutAsync(asyncWaiter, millisecondsTimeout, cancellationToken);
        }
    }
}

好的,现在我们需要转到 SemaphoreWrapper 对象,我们在该对象的上下文中使用线程。让我们访问 AsynсStateMachine,它负责状态之间的转换,然后转到方法,它是“可执行的”:

0:000> !do poi(0000028502949e38+10)
Name:        Namespace.Models.Helpers.KeyedSemaphoreSlim+SemaphoreWrapper+<WaitAsync>d__8
MethodTable: 00007ffc28e470a8
EEClass:     00007ffc28e1e9e8
Size:        80(0x50) bytes
File:        C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\root2547e9\cbe69b5a\assembly\dl351fdbb[=14=]ebcf32_4ebad501\Project.Library.SomeModels.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ffc85fb9180  4000426       10         System.Int32  1 instance                0 <>1__state
00007ffc28e25f48  4000427       18 ...sable, mscorlib]]  1 instance 0000028502949f00 <>t__builder
00007ffc28e464d8  4000428        8 ...+SemaphoreWrapper  0 instance 00000285fd3a4880 <>4__this
00007ffc85fab338  4000429       30 ...CancellationToken  1 instance 0000028502949f18 cancelToken
00007ffc85fadf70  400042a       38 ...iguredTaskAwaiter  1 instance 0000028502949f20 <>u__1

0:000> !do 00000285fd3a4880
Name:        Namespace.Models.Helpers.KeyedSemaphoreSlim+SemaphoreWrapper
MethodTable: 00007ffc28e464d8
EEClass:     00007ffc28e1e698
Size:        48(0x30) bytes
File:        C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\root2547e9\cbe69b5a\assembly\dl351fdbb[=15=]ebcf32_4ebad501\Project.Library.SomeModels.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ffc28e46e68  40003fe        8 ...ClientApiModels]]  0 instance 00000285fd3a4840 _parentRelease
00007ffc85fa6fd0  40003ff       10 ...ing.SemaphoreSlim  0 instance 00000285fd3a48b0 _semaphoreSlim
00007ffc85fb9180  4000400       20         System.Int32  1 instance             1547 _useCount
00007ffc85fb6830  4000401       18        System.String  0 instance 00000285fd3a4800 <Key>k__BackingField

_useCountSemaphoreWrapper 的私有 属性 ,表示已经尝试访问信号量的次数。在这种情况下,1547 看起来很糟糕。显然,我们面临着一个死锁——其中一项任务永远不会完成,因为队列正在增加。结果,我们目睹了 API 冻结和 GC 代规模的增加。


[第 2 部分]

以下清单是转储对象所必需的,对于了解发生问题时缓存管理器的状态有用:

0:000> !do poi(000002850294a018+10)
Name:        Namespace.Models.Helpers.MemoryLocalCache+<GetCacheItemAsync>d__9`1[[System.Collections.Generic.HashSet`1[[System.ValueTuple`2[[System.Int64, mscorlib],[System.Int32, mscorlib]], mscorlib]], System.Core]]
MethodTable: 00007ffc291bbf58
EEClass:     00007ffc28e1ad98
Size:        112(0x70) bytes
File:        C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\root2547e9\cbe69b5a\assembly\dl351fdbb[=16=]ebcf32_4ebad501\Project.Library.SomeModels.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ffc85fb9180  4000410       30         System.Int32  1 instance                0 <>1__state
00007ffc85f68e40  4000411       38 ...Canon, mscorlib]]  1 instance 000002850294a100 <>t__builder
00007ffc85fb6830  4000412        8        System.String  0 instance 0000028502949c98 key
00007ffc288c8510  4000413       10 ....MemoryLocalCache  0 instance 000002880225d460 <>4__this
00007ffc85fb6830  4000414       18        System.String  0 instance 00000286fc9bbb30 methodName
00007ffc85fab338  4000415       50 ...CancellationToken  1 instance 000002850294a118 ct
0000000000000000  4000416       20                       0 instance 0000028502949bb8 updateMethod
00007ffc85fb9180  4000417       34         System.Int32  1 instance              600 cacheTime
00007ffc858ab3e8  4000418       28   System.IDisposable  0 instance 0000000000000000 <>7__wrap1
00007ffc28e24bc8  4000419       58 ...sable, mscorlib]]  1 instance 000002850294a120 <>u__1
00007ffc85f6d2a0  400041a       60 ...Canon, mscorlib]]  1 instance 000002850294a128 <>u__2

0:000> !do 000002880225d460
Name:        Namespace.Models.Helpers.MemoryLocalCache
MethodTable: 00007ffc288c8510
EEClass:     00007ffc288aa6c8
Size:        32(0x20) bytes
File:        C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\root2547e9\cbe69b5a\assembly\dl351fdbb[=17=]ebcf32_4ebad501\Project.Library.SomeModels.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ffc85fb6830  40002c7        8        System.String  0 instance 0000000000000000 _dbPrefix
00007ffc8fabf6e8  40002c8       10 ...ching.MemoryCache  0 instance 00000287fc909e10 _cache
00007ffc28e45998  40002c9       20 ...eyedSemaphoreSlim  0   static 00000284fc911968 _lock

0:000> !do 00000284fc911968
Name:        Namespace.Models.Helpers.KeyedSemaphoreSlim
MethodTable: 00007ffc28e45998
EEClass:     00007ffc28e1d758
Size:        40(0x28) bytes
File:        C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\root2547e9\cbe69b5a\assembly\dl351fdbb[=18=]ebcf32_4ebad501\Project.Library.SomeModels.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ffc85fb6e10  40002c4        8        System.Object  0 instance 00000284fc911990 _lock
00007ffc28e46580  40002c5       10 ...ClientApiModels]]  0 instance 00000284fc9119a8 _wrapperMap
00007ffc85fa0b50  40002c6       18       System.Boolean  1 instance                0 _isDisposed

0:000> !do 00000284fc9119a8
Name:        System.Collections.Generic.Dictionary`2[[System.String, mscorlib],[Namespace.Models.Helpers.KeyedSemaphoreSlim+SemaphoreWrapper, Project.Library.SomeModels]]
MethodTable: 00007ffc28e46580
EEClass:     00007ffc85964200
Size:        80(0x50) bytes
File:        C:\Windows\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ffc85fb9118  400186d        8       System.Int32[]  0 instance 00000287ff97e148 buckets
00007ffc86a89ae0  400186e       10 ...non, mscorlib]][]  0 instance 00000287ff97efc0 entries
00007ffc85fb9180  400186f       38         System.Int32  1 instance              765 count
00007ffc85fb9180  4001870       3c         System.Int32  1 instance            39027 version
00007ffc85fb9180  4001871       40         System.Int32  1 instance              763 freeList
00007ffc85fb9180  4001872       44         System.Int32  1 instance                2 freeCount
00007ffc85f77d10  4001873       18 ...Canon, mscorlib]]  0 instance 00000285fc8c9050 comparer
00007ffc85f75060  4001874       20 ...Canon, mscorlib]]  0 instance 0000000000000000 keys
00007ffc85f86870  4001875       28 ...Canon, mscorlib]]  0 instance 0000000000000000 values
00007ffc85fb6e10  4001876       30        System.Object  0 instance 0000000000000000 _syncRoot

0:000> .foreach /pS 16 /ps 1 (addr {!da 00000287ff97efc0}) { .if ($sicmp("${addr}", "null") != 0) { !do poi(${addr}+8) } }
<Note: this object has an invalid CLASS field>
Invalid object
Name:        Namespace.Models.Helpers.KeyedSemaphoreSlim+SemaphoreWrapper
MethodTable: 00007ffc28e464d8
EEClass:     00007ffc28e1e698
Size:        48(0x30) bytes
File:        C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\root2547e9\cbe69b5a\assembly\dl351fdbb[=20=]ebcf32_4ebad501\Project.Library.SomeModels.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ffc28e46e68  40003fe        8 ...ClientApiModels]]  0 instance 00000285fd3a4840 _parentRelease
00007ffc85fa6fd0  40003ff       10 ...ing.SemaphoreSlim  0 instance 00000285fd3a48b0 _semaphoreSlim
00007ffc85fb9180  4000400       20         System.Int32  1 instance             1547 _useCount
00007ffc85fb6830  4000401       18        System.String  0 instance 00000285fd3a4800 <Key>k__BackingField

Name:        Namespace.Models.Helpers.KeyedSemaphoreSlim+SemaphoreWrapper
MethodTable: 00007ffc28e464d8
EEClass:     00007ffc28e1e698
Size:        48(0x30) bytes
File:        C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\root2547e9\cbe69b5a\assembly\dl351fdbb[=20=]ebcf32_4ebad501\Project.Library.SomeModels.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ffc28e46e68  40003fe        8 ...ClientApiModels]]  0 instance 00000285fd3cbd18 _parentRelease
00007ffc85fa6fd0  40003ff       10 ...ing.SemaphoreSlim  0 instance 00000285fd3cbd88 _semaphoreSlim
00007ffc85fb9180  4000400       20         System.Int32  1 instance              846 _useCount
00007ffc85fb6830  4000401       18        System.String  0 instance 00000285fd3cbcb0 <Key>k__BackingField

通过分析KeyedSemaphoreSlim对象的私有字段_wrapperMap,我们可以了解问题发生在哪些键上,以及在哪个端点。在这种情况下,我们有两个键的 _useCount 字段值异常大。


1547

0:000> !do 00000285fd3a4800
Name:        System.String
MethodTable: 00007ffc85fb6830
EEClass:     00007ffc85896cb8
Size:        62(0x3e) bytes
File:        C:\Windows\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
String:      _EventFlagsAsync_7
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ffc85fb9180  4000273        8         System.Int32  1 instance               18 m_stringLength
00007ffc85fb79e8  4000274        c          System.Char  1 instance               5f m_firstChar
00007ffc85fb6830  4000278       a0        System.String  0   shared           static Empty
                                 >> Domain:Value  00000284fc4e9070:NotInit  00000284fc600120:NotInit  <

846

0:000> !do 00000285fd3cbcb0
Name:        System.String
MethodTable: 00007ffc85fb6830
EEClass:     00007ffc85896cb8
Size:        102(0x66) bytes
File:        C:\Windows\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
String:      _GetLiveMenuCacheAsync_13
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ffc85fb9180  4000273        8         System.Int32  1 instance               38 m_stringLength
00007ffc85fb79e8  4000274        c          System.Char  1 instance               5f m_firstChar
00007ffc85fb6830  4000278       a0        System.String  0   shared           static Empty
                                 >> Domain:Value  00000284fc4e9070:NotInit  00000284fc600120:NotInit  <<

我们可以得出结论,问题出现在 EventFlagsAsync 方法的上下文中,该方法作为 GetLiveMenuCacheAsync 的一部分被调用 - 一个具有几乎相同名称的端点。让我们看看 EventFlagsAsync 是什么以及它是如何被调用的:

internal async Task<EventPropertiesFilter> InitEventFlagsAsync()
{
    return await _eventProperties.SetEventFlags(EventFlagsAsync(_days));
}

public async Task<HashSet<ValueTuple<long, int>>> EventFlagsAsync(int days)
{
    return await _cache.GetCacheItemAsync<HashSet<ValueTuple<long, int>>>(days.ToString(),
        async () => await GetEventFlagsAsync(days), (int)MemoryLocalCache.CacheTimeSec.Min10, CancellationToken.None);
}

private async Task<HashSet<ValueTuple<long, int>>> GetEventFlagsAsync(int days)
{
    HashSet<ValueTuple<long, int>> ecs = new HashSet<ValueTuple<long, int>>();
    foreach (var ef in await _dbConnection.QueryAsync<ValueTuple<long, int>>(Query_EventFlags, 
        new { StartDate = DateTime.UtcNow.AddHours(-6), EndDate = DateTime.UtcNow.AddDays(days) })) 
    {
        ecs.Add(ef);
    }
    
    return ecs;
}

public async Task<EventPropertiesFilter> SetEventFlags(Task<HashSet<ValueTuple<long, int>>> eventFlags)
{
    _eventFlags = _eventFlags ?? await eventFlags;
    return this;
}

因此,我们来到源代码行,其中执行了 Task 的不正确工作,这导致我们得出这个故事:

_eventFlags = _eventFlags ?? await eventFlags;

UPD

有一个扩展可以更轻松地使用异步方法 - !dumpasync

The !dumpasync command is a WinDbg extension that produces a list of async method callstacks that may not appear on thread callstacks. This information is useful when diagnosing the cause for an app hang that is waiting for async methods to complete.

不幸的是当时它 did not work 和我的转储一起使用,所以我不得不在没有它的情况下采取行动,尽管问题可能不相关(我没有时间更深入地讨论这个话题,虽然这是值得的)。