Why does "pstack" only print one thread's content?

My OS is RHEL 7, and I run a simple Go program:

package main

import (
    "time"
)

func main() {
    time.Sleep(1000 * time.Second)
}

While it is running, I check the process's thread count:

# cat /proc/13858/status | grep Thread
Threads:        5

But the pstack command shipped with RHEL prints only one thread's stack:

# pstack 13858
Thread 1 (process 13858):
#0  runtime.futex () at /usr/local/go/src/runtime/sys_linux_amd64.s:307
#1  0x0000000000422580 in runtime.futexsleep (addr=0x4c7af8 <runtime.timers+24>, val=0, ns=999999997446) at /usr/local/go/src/runtime/os1_linux.go:57
#2  0x000000000040b07b in runtime.notetsleep_internal (n=0x4c7af8 <runtime.timers+24>, ns=999999997446, ~r2=255) at /usr/local/go/src/runtime/lock_futex.go:174
#3  0x000000000040b1e6 in runtime.notetsleepg (n=0x4c7af8 <runtime.timers+24>, ns=999999997446, ~r2=false) at /usr/local/go/src/runtime/lock_futex.go:206
#4  0x000000000043e5de in runtime.timerproc () at /usr/local/go/src/runtime/time.go:209
#5  0x0000000000451001 in runtime.goexit () at /usr/local/go/src/runtime/asm_amd64.s:1998
#6  0x0000000000000000 in ?? ()

Why does pstack only print one thread's content?

P.S.: Here is the pstack script:

#!/bin/sh

if test $# -ne 1; then
    echo "Usage: `basename $0 .sh` <process-id>" 1>&2
    exit 1
fi

if test ! -r /proc/$1; then
    echo "Process $1 not found." 1>&2
    exit 1
fi

# GDB doesn't allow "thread apply all bt" when the process isn't
# threaded; need to peek at the process to determine if that or the
# simpler "bt" should be used.

backtrace="bt"
if test -d /proc/$1/task ; then
    # Newer kernel; has a task/ directory.
    if test `/bin/ls /proc/$1/task | /usr/bin/wc -l` -gt 1 2>/dev/null ; then
        backtrace="thread apply all bt"
    fi
elif test -f /proc/$1/maps ; then
    # Older kernel; go by it loading libpthread.
    if /bin/grep -e libpthread /proc/$1/maps > /dev/null 2>&1 ; then
        backtrace="thread apply all bt"
    fi
fi

GDB=${GDB:-/usr/bin/gdb}

# Run GDB, strip out unwanted noise.
# --readnever is no longer used since .gdb_index is now in use.
$GDB --quiet -nx $GDBARGS /proc/$1/exe $1 <<EOF 2>&1 |
set width 0
set height 0
set pagination no
$backtrace
EOF
/bin/sed -n \
    -e 's/^\((gdb) \)*//' \
    -e '/^#/p' \
    -e '/^Thread/p' 

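When you pass an LWP (thread) ID to pstack, you get only that thread's stack. Try passing the process's PID to pstack instead, and you will get the stacks of all its threads. You can get the process's PID, i.e. its Tgid (thread group ID), with: cat /proc/13858/status | grep Tgid. To list all the LWPs your process has created, you can run ps -L <PID>.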

pstack uses gdb. Here is a quote from the Go documentation (https://golang.org/doc/gdb):

GDB does not understand Go programs well. The stack management, threading, and runtime contain aspects that differ enough from the execution model GDB expects that they can confuse the debugger, even when the program is compiled with gccgo. As a consequence, although GDB can be useful in some situations, it is not a reliable debugger for Go programs, particularly heavily concurrent ones.

Four of the 5 threads you see in /proc are created before the program enters main. I assume the Go runtime creates them.

Why does pstack only print one thread's content?

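From an strace of gdb, I can see that gdb does try to attach to them, but after something goes wrong it does not try to examine these threads any further. These are the system calls gdb issues for the runtime threads before, for unknown reasons, it decides to stop investigating them: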

5072  ptrace(PTRACE_ATTACH, 5023, 0, 0) = 0
5072  --- SIGCHLD (Child exited) @ 0 (0) ---
5072  rt_sigreturn(0x11)                = 0
5072  ptrace(PTRACE_ATTACH, 5024, 0, 0) = 0
5072  --- SIGCHLD (Child exited) @ 0 (0) ---
5072  rt_sigreturn(0x11)                = 0
5072  ptrace(PTRACE_ATTACH, 5025, 0, 0) = 0
5072  --- SIGCHLD (Child exited) @ 0 (0) ---
5072  rt_sigreturn(0x11)                = 0

You can, however, inspect them yourself. These threads appear to belong to the Go runtime:

$ pstack 5094
Thread 1 (process 5094):
#0  0x0000000000459243 in runtime.futex ()
#1  0x00000000004271e0 in runtime.futexsleep ()
#2  0x000000000040d55b in runtime.notetsleep_internal ()
#3  0x000000000040d64b in runtime.notetsleep ()
#4  0x0000000000435677 in runtime.sysmon ()
#5  0x000000000042e6cc in runtime.mstart1 ()
#6  0x000000000042e5d2 in runtime.mstart ()
#7  0x00000000004592b7 in runtime.clone ()
#8  0x0000000000000000 in ?? ()

$ pstack 5095
Thread 1 (process 5095):
#0  0x0000000000459243 in runtime.futex ()
#1  0x0000000000427143 in runtime.futexsleep ()
#2  0x000000000040d3f4 in runtime.notesleep ()
#3  0x000000000042f6eb in runtime.stopm ()
#4  0x0000000000430a79 in runtime.findrunnable ()
#5  0x00000000004310ff in runtime.schedule ()
#6  0x000000000043139b in runtime.park_m ()
#7  0x0000000000455acb in runtime.mcall ()
#8  0x000000c820021500 in ?? ()
#9  0x0000000000000000 in ?? ()

$ pstack 5096
Thread 1 (process 5096):
#0  0x0000000000459243 in runtime.futex ()
#1  0x0000000000427143 in runtime.futexsleep ()
#2  0x000000000040d3f4 in runtime.notesleep ()
#3  0x000000000042f6eb in runtime.stopm ()
#4  0x000000000042fff7 in runtime.startlockedm ()
#5  0x0000000000431147 in runtime.schedule ()
#6  0x000000000043139b in runtime.park_m ()
#7  0x0000000000455acb in runtime.mcall ()
#8  0x000000c820020000 in ?? ()

Update for gdb 8.0

With gdb 8.0, pstack correctly prints backtraces for all threads. The command looks like this:

$ GDB=$HOME/bin/gdb pstack  $(pidof main)

Here is its output (abbreviated):

$ GDB=$HOME/bin/gdb pstack  $(pidof main) | egrep "^Thread"
Thread 4 (LWP 18335):
Thread 3 (LWP 18334):
Thread 2 (LWP 18333):
Thread 1 (LWP 18332):