使用 WinDbg 查找谁持有本机进程的 SRW 锁

Find who holds a SRW Lock for a native process with WinDbg

我有一个用 c++ 编写的程序,我很难找到哪个线程已获取 Slim Reader/Writer (SRW) Locks. I googled and found Determining which method is holding a ReaderWriterLockSlim WriteLock,但它是关于一个用 C# 编写的程序。此外,有些命令,例如.rwlock,是不可用的。

0:796> !handle 0 ff Mutant
Handle c
  Type          Mutant
  Attributes    0
  GrantedAccess 0x1f0001:
         Delete,ReadControl,WriteDac,WriteOwner,Synch
         QueryState
  HandleCount   4
  PointerCount  103240
  Name          \BaseNamedObjects\DBWinMutex
  Object Specific Information
    Mutex is Free
Handle 474
  Type          Mutant
  Attributes    0
  GrantedAccess 0x1f0001:
         Delete,ReadControl,WriteDac,WriteOwner,Synch
         QueryState
  HandleCount   2
  PointerCount  65536
  Name          \BaseNamedObjects\SM0:928:304:WilStaging_02
  Object Specific Information
    Mutex is Free
2 handles of type Mutant
0:796> kb
RetAddr           : Args to Child                                                           : Call Site
00007ff9`b6e3d33a : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!ZwWaitForAlertByThreadId+0x14
00007ff9`a85726a9 : 00000000`00000000 00000000`00000000 00000192`83338180 00000000`00000000 : ntdll!RtlAcquireSRWLockExclusive+0x13a
00007ff9`a6231724 : c000000d`00000000 00000000`00000000 00000192`83338180 00000000`00000002 : MSVCP140!mtx_do_lock+0x7d [d:\agent\_work\s\src\vctools\crt\crtw32\stdcpp\thr\mutex.cpp @ 106]
00007ff9`a626749e : 00000192`f6a26e38 00000193`4aaa3d80 00000052`897fea60 00000000`00000000 : AZSDK!AZConnection::Post+0x54 [g:\prod\sdk\src\connection.cpp @ 1147]
...
00007ff9`9c8ba9c1 : 00000192`c3b3d770 00000000`00000000 00000192`f5d616b0 00000000`00000000 : prod!Task::Execute+0x28 [g:\prod\src\task.cpp @ 51]
00007ff9`b6e97529 : 00000193`491b9830 00000000`7ffe0386 00000052`897ff998 00000193`491b98f8 : prod!Proxy::TaskExecuter+0x11 [g:\prod\src\proxy.cpp @ 2042]
00007ff9`b6e3bec4 : 00000000`00000000 00000192`f1dd03a0 00000000`00000000 00000000`00000000 : ntdll!TppSimplepExecuteCallback+0x99
00007ff9`b6c47e94 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!TppWorkerThread+0x644
00007ff9`b6e87ad1 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : KERNEL32!BaseThreadInitThunk+0x14
00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x21
0:796> !rwlock
No export rwlock found

C++ 代码片段:

std::mutex m_mutex;

Status AZConnection::Post(const Request* request, Result** pResult)
{
    std::lock_guard<std::mutex> sbguard(m_mutex);
}

已更新:

根据,我明白了。现在我不得不放弃了。

事实上,我的程序仍然是 运行,但已停止服务。我必须找到原因。我发现有 806 个线程,其中大部分都在焦急地等待 Post。此外,如果它不重现,我无法重新启动以添加已获取锁的日志打印。因此,我只想检查持有锁的线程在做什么。

Win32 本机 SRWLock 不保留该信息。在无竞争状态下,它只是一个原子标志。

因此,没有可以执行此操作的 WinDbg 命令。

当存在争用时,等待队列由正在等待的线程组成。仍然没有关于持有锁的线程的信息。

有关 SRWLock 实现的更多详细信息,请参阅 this answer

我可以在 MEX Debugging Extension for WinDbg to grep the callstack and execute a command (multiple commands separated by ; (Command Separator) are not supported) to find the thread who is not watting for the lock but whose previous call is Post. The extension can be downloaded from here. After downloading, it can be put in C:\WinDDK00.16385.1\Debuggers\winext (see also Loading Debugger Extension DLLs 中使用 !foreachframe!if

我在下面的代码中用msvcrt!_threadstartex替换了MSVCP140!mtx_do_lock,把AZSDK!AZConnection::Post替换成了KERNEL32!BaseThreadInitThunk

~*e r @$t0 = -1; !foreachframe -q -f 'KERNEL32!BaseThreadInitThunk' r @$t0= @#FrameNum - 1; .if(0<=@$t0) { !if -DoesNotContainRegex 'msvcrt!_threadstartex' -then '.printf /D "Thread: <link cmd=\"~~[%x]\">0x%x</link> (<link cmd=\"!mex.t %d\">%d</link>)", $tid, $tid, $dtid, $dtid' .frame @$t0 }

!foreachthread -q !foreachframe -q -f 'KERNEL32!BaseThreadInitThunk' !if -DoesNotContainRegex 'msvcrt!_threadstartex' -then '.printf /D "Thread: <link cmd=\"~~[%x]\">0x%x</link> (<link cmd=\"!mex.t %d\">%d</link>)", $tid, $tid, $dtid, $dtid' .frame @#FrameNum - 1

notepad 进程的输出示例:

Thread: 0xd14c (0)
Thread: 0x4f88 (1)
Thread: 0xd198 (7)

所有线程的调用栈

0:001> !foreachthread k
Child-SP          RetAddr           Call Site
00000062`fabaf8c8 00007ffd`54c7409d win32u!ZwUserGetMessage+0x14
00000062`fabaf8d0 00007ff7`bb4c449f USER32!GetMessageW+0x2d
00000062`fabaf930 00007ff7`bb4dae07 notepad+0x449f
00000062`fabafa30 00007ffd`570f7974 notepad+0x1ae07
00000062`fabafaf0 00007ffd`5841a261 KERNEL32!BaseThreadInitThunk+0x14
00000062`fabafb20 00000000`00000000 ntdll!RtlUserThreadStart+0x21

Changing to thread: 0xda94 (1)
Child-SP          RetAddr           Call Site
00000062`fae7fbc8 00007ffd`5847f01b ntdll!DbgBreakPoint
00000062`fae7fbd0 00007ffd`570f7974 ntdll!DbgUiRemoteBreakin+0x4b
00000062`fae7fc00 00007ffd`5841a261 KERNEL32!BaseThreadInitThunk+0x14
00000062`fae7fc30 00000000`00000000 ntdll!RtlUserThreadStart+0x21

Changing to thread: 0xdb60 (7)
Child-SP          RetAddr           Call Site
00000062`fb17f968 00007ffd`54c7004d win32u!ZwUserMsgWaitForMultipleObjectsEx+0x14
00000062`fb17f970 00007ffc`e4e7d078 USER32!MsgWaitForMultipleObjectsEx+0x9d
00000062`fb17f9b0 00007ffc`e4e7cec2 DUser!GetMessageExA+0x2f8
00000062`fb17fa50 00007ffd`54c77004 DUser!GetMessageExA+0x142
00000062`fb17fab0 00007ffd`584534a4 USER32!Ordinal2582+0x64
00000062`fb17fb50 00007ffd`54101164 ntdll!KiUserCallbackDispatcher+0x24
00000062`fb17fbc8 00007ffd`54c7409d win32u!ZwUserGetMessage+0x14
00000062`fb17fbd0 00007ffd`2e4efa3c USER32!GetMessageW+0x2d
00000062`fb17fc30 00007ffd`1d0b30f8 DUI70!StartMessagePump+0x3c
00000062`fb17fc90 00007ffd`1d0b31ce msctfuimanager!DllCanUnloadNow+0xf3e8
00000062`fb17fd50 00007ffd`570f7974 msctfuimanager!DllCanUnloadNow+0xf4be
00000062`fb17fd80 00007ffd`5841a261 KERNEL32!BaseThreadInitThunk+0x14
00000062`fb17fdb0 00000000`00000000 ntdll!RtlUserThreadStart+0x21

Changing to thread: 0xc490 (8)
Child-SP          RetAddr           Call Site
00000062`fb1ff708 00007ffd`54c7004d win32u!ZwUserMsgWaitForMultipleObjectsEx+0x14
00000062`fb1ff710 00007ffc`e4e7d1ca USER32!MsgWaitForMultipleObjectsEx+0x9d
00000062`fb1ff750 00007ffc`e4e7cde7 DUser!GetMessageExA+0x44a
00000062`fb1ff7f0 00007ffc`e4e7ca53 DUser!GetMessageExA+0x67
00000062`fb1ff840 00007ffd`5505b0ea DUser!GetGadgetFocus+0x33b3
00000062`fb1ff8d0 00007ffd`5505b1bc msvcrt!_callthreadstartex+0x1e
00000062`fb1ff900 00007ffd`570f7974 msvcrt!_threadstartex+0x7c
00000062`fb1ff930 00007ffd`5841a261 KERNEL32!BaseThreadInitThunk+0x14
00000062`fb1ff960 00000000`00000000 ntdll!RtlUserThreadStart+0x21

已更新:

我发现 Someone 也有同样的想法(在死链接的情况下,经常出现):

Slim reader/writer locks don't remember who the owners are, so you'll have to find them some other way

Raymond | August 10th, 2011

The slim reader/writer lock is a very convenient synchronization facility, but one of the downsides is that it doesn’t > keep track of who the current owners are.
When your thread is stuck waiting to acquire a slim reader/writer lock, a natural thing to want to know is which threads own the resource your stuck thread > waiting for.

Since there’s not facility for going from the waiting thread to the owning threads, you’ll just have to find the owning threads some other way. Here’s the thread > that is waiting for the lock in shared mode:

ntdll!ZwWaitForKeyedEvent+0xc
ntdll!RtlAcquireSRWLockShared+0x126
dbquery!CSearchSpace::Validate+0x10b
dbquery!CSearchSpace::DecomposeSearchSpace+0x3c
dbquery!CQuery::AddConfigs+0xdc
dbquery!CQuery::ResolveProviders+0x89
dbquery!CResults::CreateProviders+0x85
dbquery!CResults::GetProviders+0x61
dbquery!CResults::CreateResults+0x11c

Okay, how do you find the thread that owns the lock?

First, slim reader/writer locks are usable only within a process, so the candidate threads are the one within the process.

Second, the usage pattern for locks is nearly always something like

enter lock
do something
exit lock

It is highly unusual for a function to take a lock and exit to
external code with the lock held. (It might exit to other code within the same component, transferring the obligation to exit the lock to that other code.)
Therefore, you want to look for threads that are still inside dbquery.dll, possibly even still inside CSearch­Space (if the lock is a per-object lock rather > than a global one).

Of course, the possibility might be that the code that entered the lock messed up and forgot to release it, but if that’s the case, no amount of searching for it > will find anything since the culprit is long gone.
Since debugging is an exercise in optimism, we may as well proceed on the assumption that > we’re not in the case. If it fails to find the lock owner, then we may have to revisit the assumption.

Finally, the last trick is knowing which threads to ignore.

For now, you can also ignore the threads that are waiting for the lock, since they are the victims not the cause. (Again, if we fail to find the lock owner, we > can revisit the assumption that they are not the cause; for example, they may be attempting to acquire the lock recursively.)

As it happens,
there is only one thread in the process that passes all the above filters.

dbquery!CProp::Marshall+0x3b
dbquery!CRequest::CRequest+0x24c
dbquery!CQuery::Execute+0x668
dbquery!CResults::FillParams+0x1c4
dbquery!CResults::AddProvider+0x4e
dbquery!CResults::AddConfigs+0x1c5
dbquery!CResults::CreateResults+0x145

This may not be the source of the problem, but it’s a good start.
(Actually, it looks very promising since the problem is probably
that the process on the other side of the marshaller is stuck.)