Identify whether a process was killed by a signal in bash

Consider these two C programs:

/* Program 1: kills itself with SIGTERM */
#include <signal.h>

int main(void) {
    raise(SIGTERM);
}

/* Program 2: exits normally with status 143 */
int main(void) {
    return 143;
}

If I run either one, the value of $? in bash will be 143. The wait syscall does let you distinguish them, though:

wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], 0, NULL) = 11148
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 143}], 0, NULL) = 11214

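and bash clearly uses this knowledge, since the first one causes Terminated to be printed to the terminal (oddly, this happens even if I redirect both stdout and stderr elsewhere) and the second one doesn't. How can I distinguish these two cases from a bash script?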

I don't think it is possible to get the full exit status from pure bash/shell. The answer on Unix & Linux Stack Exchange covers this thoroughly:

What's common between all shells is that $? contains the lowest 8 bits of the exit code (the number passed to exit()) if the process terminated normally.

Where it differs is when the process is terminated by a signal. In all cases, and that's required by POSIX, the number will be greater than 128. POSIX doesn't specify what the value may be. In practice though, in all Bourne-like shells that I know, the lowest 7 bits of $? will contain the signal number. But, where n is the signal number,

  • in ash, zsh, pdksh, bash and the Bourne shell, $? is 128 + n. That means that in those shells, if you get a $? of 129, you don't know whether the process exited with exit(129) or was killed by signal 1 (HUP on most systems). The rationale is that shells, when they exit themselves, by default return the exit status of the last command that exited. Making sure $? is never greater than 255 keeps that exit status consistent:

    $ bash -c 'sh -c "kill $$"; printf "%x\n" "$?"'
    bash: line 1: 16720 Terminated              sh -c "kill $$"
    8f # 128 + 15
    $ bash -c 'sh -c "kill $$"; exit'; printf '%x\n' "$?"
    bash: line 1: 16726 Terminated              sh -c "kill $$"
    8f # here that 0x8f is from an exit(143) done by bash. Though it's
       # not from a killed process, that does tell us that probably
       # something was killed by a SIGTERM
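
To make the ambiguity concrete, here is a minimal C sketch, assuming a POSIX system where SIGTERM is 15 (as on Linux and macOS). The helpers child_status, shell_status and show_collision are hypothetical names: child_status forks a child that behaves like one of the two programs from the question, and shell_status applies the 128 + n rule used by bash and the other shells listed above. waitpid() still reports the two children differently, yet both collapse to 143.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that mimics one of the question's two programs:
   kill_self != 0 -> raise(SIGTERM), otherwise -> exit status 143.
   Returns the raw status from waitpid(), or -1 on error. */
static int child_status(int kill_self)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        if (kill_self) {
            signal(SIGTERM, SIG_DFL); /* make sure SIGTERM is fatal */
            raise(SIGTERM);
        }
        _exit(143); /* normal exit (or fallback if raise() returned) */
    }
    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return status;
}

/* The 128 + n rule: a normal exit gives the low 8 bits of the exit
   code, death by signal n gives 128 + n. */
static int shell_status(int status)
{
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1; /* stopped/continued: only reported with WUNTRACED/WCONTINUED */
}

/* waitpid() distinguishes the two children; the shell-style value does not. */
static void show_collision(void)
{
    int killed = child_status(1);
    int exited = child_status(0);
    assert(WIFSIGNALED(killed) && WTERMSIG(killed) == SIGTERM);
    assert(WIFEXITED(exited) && WEXITSTATUS(exited) == 143);
    assert(shell_status(killed) == 143); /* 128 + 15 */
    assert(shell_status(exited) == 143);
}
```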
    

For this reason, I believe you need to run the command outside of bash to capture the exit status.
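As a sketch of that idea, the helper below runs a command itself and reports what waitpid() saw, so nothing is lost to the 128 + n encoding. It assumes a POSIX system; run_and_report is a hypothetical name, and the "exit N" / "signal N" output format is an arbitrary choice for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with its arguments (PATH lookup via execvp), wait for it,
   and write "exit N" or "signal N" into buf.
   Returns 0 on success, -1 if fork() or waitpid() failed. */
static int run_and_report(char *const argv[], char *buf, size_t len)
{
    assert(argv != NULL && argv[0] != NULL && len > 0);
    pid_t pid = fork();
    if (pid == -1) {
        snprintf(buf, len, "error: %s", strerror(errno));
        return -1;
    }
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127); /* exec failed: the code shells use for "not found" */
    }
    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR) {
            snprintf(buf, len, "error: %s", strerror(errno));
            return -1;
        }
    }
    /* Without WUNTRACED, waitpid() only returns for terminated children,
       so the status is either "signaled" or "exited". */
    if (WIFSIGNALED(status))
        snprintf(buf, len, "signal %d", WTERMSIG(status));
    else
        snprintf(buf, len, "exit %d", WEXITSTATUS(status));
    return 0;
}
```

Wrapped in a small main() that calls run_and_report(argv + 1, ...) and prints buf, it could be used from bash as a prefix to the real command, with the report captured via $(...). On Linux, the two programs from the question would be reported as signal 15 and exit 143 respectively.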


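Abstracting a little, a similar question has been asked regarding unbuffer, a small script written in Tcl. More precisely, unbuffer uses the libexpect library through its Tcl/Tk wrapper. I extracted the relevant code from unbuffer's source to derive a workaround: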

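#!/bin/bash

# Run the given command under expect (as unbuffer does) and print the
# result of expect's "wait" command as the last line of output.
expectStat() {
expect <(cat << EOT
set stty_init "-opost"
set timeout -1
eval [list spawn -noecho ] $@
expect
send_user "[wait]\n"
EOT
)
}

expectStat sleep 5 &
wait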

If sleep exits normally, this prints roughly the following line:

18383 exp4 0 0

If sleep is killed before it exits on its own, the script above prints roughly:

18383 exp4 0 0 CHILDKILLED SIGTERM {software termination signal}

If the spawned script terminates with exit 143, it prints roughly:

18383 exp4 0 143

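The meaning of these fields can be taken from the expect manual. The line above is what expect's built-in wait command returns. The first two values are the pid and the spawn id (exp4). The third is 0 unless an operating-system error occurred, and the fourth is the exit status. If the process was killed by a signal, extra elements follow; the sixth value is the name of the signal that terminated it.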

wait

normally returns a list of four integers. The first integer is the pid of the process that was waited upon. The second integer is the corresponding spawn id. The third integer is -1 if an operating system error occurred, or 0 otherwise. If the third integer was 0, the fourth integer is the status returned by the spawned process. If the third integer was -1, the fourth integer is the value of errno set by the operating system. The global variable errorCode is also set.

Additional elements may appear at the end of the return value from wait. An optional fifth element identifies a class of information. Currently, the only possible value for this element is CHILDKILLED in which case the next two values are the C-style signal name and a short textual description.

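This means the fourth value and, if present, the sixth value are the ones you are looking for. Store the whole line and extract the exit status and signal, for example with the following code: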

# expect also echoes the spawned program's own output, so keep only the last line
RET=$(expectStat script.sh | tail -n 1)

# Filter status
EXITVALUE="$(echo "$RET" | cut -d' ' -f4)"
SIGNAL="$(echo "$RET" | cut -d' ' -f6)"

#echo "Exit value: $EXITVALUE, Signal: $SIGNAL" 

if [ -n "$SIGNAL" ]; then
        echo "Likely killed by signal"
else
        echo "$EXITVALUE"
fi

All in all, this workaround is quite inelegant. There may be another tool, based on C, that reports the occurrence of the signal directly.

strace can catch most signals but may not cover system calls such as kill -9, so consider auditd, as described in this article:

Auditd is a daemon process or service that does as the name implies and produces audit logs of System level activities. It is installed from the usual repository as the audit package and then is configured in /etc/audit/auditd.conf and the rules are in /etc/audit/audit.rules.

The article gives an example of the audit output, which may help you decide whether it is useful to you:

The usual output will look like this:

time->Wed Jun 3 16:34:08 2015 type=SYSCALL msg=audit(1433363648.091:6342): arch=c000003e syscall=62 success=no exit=-3 a0=1e06 a1=0 a2=1e06 a3=fffffffffffffff0 items=0 ppid=10044 pid=10140 auid=500 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=2 comm=4174746163682041504920696E6974 exe="/opt/ibm/WebSphere/AppServer/java/jre/bin/java" subj=unconfined_u:unconfined_r:unconfined_java_t:s0-s0:c0.c1023 key="kill_signals"

The article also mentions SystemTap, along with a guide on redirection.

This strays from bash, but bcc offers exitsnoop. Using the two example programs from the question, on Debian Sid:

root@vsid:~# apt install bpfcc-tools linux-headers-amd64
root@vsid:~# exitsnoop-bpfcc
PCOMM            PID    PPID   TID    AGE(s)  EXIT_CODE
example1         1041   948    1041   0.00    signal 15 (TERM)
example2         1042   948    1042   0.00    code 143
^C

See the install guide for other distributions.

wait is a system call and also a bash builtin.

To distinguish the two cases from bash, run the program in the background and use the wait builtin to report the result.

Below are examples of a non-zero exit status and an uncaught signal. They use the exit and kill builtins in a child bash shell; in place of that child bash shell you would run your program.

$ bash -c 'kill -s SIGTERM $$' & wait
[1] 36068
[1]+  Terminated: 15          bash -c 'kill -s SIGTERM $$'
$ bash -c 'exit 143' & wait
[1] 36079
[1]+  Exit 143                bash -c 'exit 143'
$

As for why you see Terminated printed to the terminal even when you redirect stdout and stderr: it is bash that prints it, not the program.

Update:

By calling the wait builtin explicitly, you can redirect its stderr, which carries the program's termination status, to a separate file.

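The following example shows three kinds of termination: a normal exit 0, a non-zero exit and an uncaught signal. The result reported by wait is stored in a file named after the PID of the corresponding program.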

$ bash -c 'exit 0' & wait 2> exit_status_pid_$!
[1] 40279
$ bash -c 'exit 143' & wait 2> exit_status_pid_$!
[1] 40291
$ bash -c 'kill -s SIGTERM $$' & wait 2> exit_status_pid_$!
[1] 40303
$  for f in exit_status_pid*; do echo $f: $(cat $f); done
exit_status_pid_40279: [1]+ Done bash -c 'exit 0'
exit_status_pid_40291: [1]+ Exit 143 bash -c 'exit 143'
exit_status_pid_40303: [1]+ Terminated: 15 bash -c 'kill -s SIGTERM $$'
$