Is this a POSIX-compliant implementation for handling signals such as SIGFPE, SIGSEGV, etc. in a multithreaded program?

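I am developing a program that needs to handle crash signals. By crash signals I mean signals "delivered as a result of a hardware exception" [1], such as SIGFPE and SIGSEGV. I have not found a specific name for this category of signals, so I came up with this one for clarity and brevity.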

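From what I have researched, catching these signals is painful. A crash signal handler must not return, otherwise the behavior is undefined [2][3]. Undefined behavior means an implementation may kill the process or re-raise the signal, leaving the program stuck in an infinite loop, which is undesirable.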
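To make the "must not return" constraint concrete, here is a minimal sketch of a handler that ends the process instead of returning. The names `exit_immediately_handler` and `crash_in_child` are hypothetical, and `raise(SIGSEGV)` in a forked child is only a stand-in for a real hardware fault so the demonstration does not take down the calling process.

```cpp
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical handler: reports the failure with write() and terminates
// with _exit(), so control never returns to the faulting instruction.
extern "C" void exit_immediately_handler(int)
{
    const char msg[] = "fatal signal, exiting\n";
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void)ignored;
    _exit(42);
}

// Hypothetical demo: triggers the signal in a child process and returns the
// child's exit status (or -1 if the child did not exit normally).
int crash_in_child()
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0)
    {
        struct sigaction sa = {};
        sa.sa_handler = exit_immediately_handler;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, nullptr);

        raise(SIGSEGV); // stand-in for a real hardware exception
        _exit(0);       // reached only if the handler did not run
    }

    int status = 0;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `crash_in_child()` should yield the handler's exit code, which shows that the handler ran and did not return.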

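On the other hand, there is generally little freedom inside a signal handler, especially in a multithreaded program: functions called from a signal handler must be both thread-safe and async-signal-safe [4]. For example, you may not call malloc() because it is not async-signal-safe, nor may you call other functions that depend on it. In particular, since I am using C++, I cannot safely call GCC's abi::__cxa_demangle() to produce a decent stack trace, because it uses malloc() internally. While I could use Chromium's symbolize library [5] for async-signal-safe and thread-safe C++ symbol name demangling, I cannot use dladdr() to get a more informative stack trace because it is not specified as async-signal-safe.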
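As an illustration of working within these limits, the sketch below formats a signal number without malloc() or printf() and emits it with write(), which is on the async-signal-safe list [4]. The helpers `format_decimal` and `report_signal` are hypothetical names introduced here, not part of any library.

```cpp
#include <cstddef>
#include <unistd.h>

// Hypothetical helper: writes the decimal form of `value` into `buf` using
// only local storage. Returns the number of characters stored; no
// terminating NUL is added, and output is truncated to `size` characters.
std::size_t format_decimal(int value, char* buf, std::size_t size)
{
    char tmp[12]; // large enough for "-2147483648"
    std::size_t n = 0;

    // Use the unsigned magnitude so the most negative int does not overflow.
    unsigned int magnitude = value < 0 ? 0u - static_cast<unsigned int>(value)
                                       : static_cast<unsigned int>(value);
    do
    {
        tmp[n++] = static_cast<char>('0' + magnitude % 10);
        magnitude /= 10;
    } while (magnitude != 0);

    if (value < 0)
        tmp[n++] = '-';

    // Digits were produced in reverse order; copy them back to front.
    std::size_t written = 0;
    while (n > 0 && written < size)
        buf[written++] = tmp[--n];

    return written;
}

// Hypothetical helper: reports a signal number to stderr using only write().
void report_signal(int sig)
{
    const char prefix[] = "Caught signal ";
    char digits[12];
    const std::size_t len = format_decimal(sig, digits, sizeof(digits));

    ssize_t ignored = write(STDERR_FILENO, prefix, sizeof(prefix) - 1);
    ignored = write(STDERR_FILENO, digits, len);
    ignored = write(STDERR_FILENO, "\n", 1);
    (void)ignored;
}
```

A handler could call `report_signal(sig)` where it would otherwise reach for `std::cout` or `printf()`.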

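An alternative approach for handling generic signals is to block them in a worker thread with sigprocmask() (or pthread_sigmask() in a multithreaded program) and call sigwait() in that thread. This works for non-crash signals such as SIGINT and SIGTERM. However, "if any of the SIGFPE, SIGILL, SIGSEGV, or SIGBUS signals are generated while they are blocked, the result is undefined" [6], and again, all bets are off.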
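For reference, the sigwait() approach for non-crash signals can be sketched as follows. The function name `wait_for_termination_signal_demo` is hypothetical, and the `kill()` call merely simulates an external termination request; return values are not checked, in keeping with the rest of this post.

```cpp
#include <pthread.h>
#include <signal.h>
#include <thread>
#include <unistd.h>

// Hypothetical demo: blocks SIGINT and SIGTERM, waits for one of them in a
// dedicated thread with sigwait(), and returns the signal number received.
int wait_for_termination_signal_demo()
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    sigaddset(&set, SIGTERM);

    // Block the signals before creating the thread: threads inherit the
    // signal mask of their creator, so the waiter has them blocked too.
    sigset_t old_mask;
    pthread_sigmask(SIG_BLOCK, &set, &old_mask);

    int received = 0;
    std::thread waiter{ [&] { sigwait(&set, &received); } };

    // Simulate an external request. The signal stays pending until the
    // waiter thread accepts it, because every thread has it blocked.
    kill(getpid(), SIGTERM);
    waiter.join();

    // Restore the caller's original mask.
    pthread_sigmask(SIG_SETMASK, &old_mask, nullptr);
    return received;
}
```

Because the signal is consumed synchronously by sigwait(), the waiting thread runs in ordinary context and is free to call functions that are not async-signal-safe.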

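Skimming through the signal-safety man page [4], I found that sem_post() is async-signal-safe (and thread-safe, of course) and implemented a solution around it that resembles the sigwait() approach. The idea is to spawn a signal-processing thread that blocks the signals with pthread_sigmask() and calls sem_wait(). A crash signal handler is also installed so that whenever a crash signal is raised, the handler stores the signal number in a global variable, calls sem_post(), and waits until the signal-processing thread finishes processing and exits the program.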

Note that, for simplicity, the following implementation does not check the return values of system calls.

// Std
#include <atomic>
#include <chrono>
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <thread>

// System
#include <semaphore.h>
#include <signal.h>
#include <unistd.h>

// NOTE: C++20 exempts it from `ATOMIC_FLAG_INIT`
std::atomic_flag caught_signal = ATOMIC_FLAG_INIT;
int crash_sig = 0;

sem_t start_semaphore;
sem_t signal_semaphore;

extern "C" void crash_signal_handler(int sig)
{
    // If two or more threads evaluate this condition at the same time,
    // one of them shall enter the if-branch and the rest will skip it.
    if (caught_signal.test_and_set(std::memory_order_relaxed) == false)
    {
        // `crash_sig` needs not be atomic since only this thread and 
        // the signal processing thread use it, and the latter is
        // `sem_wait()`ing.
        crash_sig = sig;
        sem_post(&signal_semaphore);
    }

    // It is undefined behavior if a signal handler returns from a crash signal.
    // Implementations may re-raise the signal infinitely, kill the process, or whatnot,
    // but we want the crash signal processing thread to try handling the signal first;
    // so don't return.
    //
    // NOTE: maybe one could use `pselect()` here as it is async-signal-safe and seems to 
    //       be thread-safe as well. `sleep()` is async-signal-safe but not thread-safe.
    while (true)
        ;

    const char msg[] = "Panic: compiler optimized out infinite loop in signal handler\n";

    write(STDERR_FILENO, msg, sizeof(msg) - 1); // exclude the terminating NUL
    std::_Exit(EXIT_FAILURE);
}

void block_crash_signals()
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGSEGV);
    sigaddset(&set, SIGFPE);

    pthread_sigmask(SIG_BLOCK, &set, nullptr);
}

void install_signal_handler()
{
    // NOTE: one may set an alternate stack here.

    struct sigaction sig;
    sig.sa_handler = crash_signal_handler;
    sig.sa_flags   = 0;
    sigemptyset(&sig.sa_mask); // `sa_mask` must not be left uninitialized

    ::sigaction(SIGSEGV, &sig, nullptr);
    ::sigaction(SIGFPE,  &sig, nullptr);
}

void restore_signal_handler()
{
    struct sigaction sig;
    sig.sa_handler = SIG_DFL;
    sig.sa_flags   = 0;
    sigemptyset(&sig.sa_mask); // `sa_mask` must not be left uninitialized

    ::sigaction(SIGSEGV, &sig, nullptr);
    ::sigaction(SIGFPE,  &sig, nullptr);
}

void process_crash_signal()
{
    // If a crash signal occurs, the kernel will invoke `crash_signal_handler` in
    // any thread which may be not this current one.
    block_crash_signals();

    install_signal_handler();

    // Tell main thread it's good to go.
    sem_post(&start_semaphore);

    // Wait for a crash signal.
    sem_wait(&signal_semaphore);

    // Got a signal.
    //
    // We're not in signal-handler context, so we are "safe" to do anything from this thread,
    // such as writing to `std::cout`. HOWEVER, operations performed by this function,
    // such as calling `std::cout`, may raise another signal. Or the program may be in
    // a state where the damage was so severe that calling any function will crash the
    // program. If that happens, there's not much what we can do: this very signal
    // processing function is broken, so let the kernel invoke the default signal
    // handler instead.
    restore_signal_handler();

    const char* signame;

    switch (crash_sig)
    {
        case SIGSEGV: signame = "SIGSEGV"; break;
        case SIGFPE:  signame = "SIGFPE"; break;
        default:      signame = "weird, this signal should not be raised";
    }

    std::cout << "Caught signal: " << crash_sig << " (" << signame << ")\n";

    // Uncomment these lines to invoke `SIG_DFL`.
    // volatile int zero = 0;
    // int a = 1 / zero;

    std::cout << "Sleeping for 2 seconds to prove that other threads are waiting for me to finish :)\n";
    std::this_thread::sleep_for(std::chrono::seconds{ 2 });

    std::cout << "Alright, I appreciate your patience <3\n";

    std::exit(EXIT_FAILURE);
}

void divide_by_zero()
{
    volatile int zero = 0;
    int oops = 1 / zero;
}

void access_invalid_memory()
{
    volatile int* p = reinterpret_cast<int*>(0xdeadbeef); // dw, I know what I'm doing lmao
    int oops = *p;
}

int main()
{
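    // Initialize both semaphores (process-private, initial value 0) before any thread uses them.
    sem_init(&start_semaphore, 0, 0);
    sem_init(&signal_semaphore, 0, 0);

    // TODO: maybe use the pthread library API instead of `std::thread`.
    std::thread worker{ process_crash_signal };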

    // Wait until `worker` has started.
    sem_wait(&start_semaphore);

    std::srand(static_cast<unsigned>(std::time(nullptr)));

    while (true)
    {
        std::cout << "Odds are the program will crash...\n";

        switch (std::rand() % 3)
        {
            case 0:
                std::cout << "\nCalling divide_by_zero()\n";
                divide_by_zero();
                std::cout << "Panic: divide_by_zero() returned!\n";
                return 1;

            case 1:
                std::cout << "\nCalling access_invalid_memory()\n";
                access_invalid_memory();
                std::cout << "Panic: access_invalid_memory() returned!\n";
                return 1;

            default:
                std::cout << "...not this time, apparently\n\n";
                continue;
        }
    }

    return 0;
}

Compiling it:
$ g++ --version
g++ (Debian 9.2.1-22) 9.2.1 20200104
$ g++ -pthread -o handle_crash_signal handle_crash_signal.cpp

Output:

$ ./handle_crash_signal 
Odds are the program will crash...

Calling access_invalid_memory()
Caught signal: 11 (SIGSEGV)
Sleeping for 2 seconds to prove that other threads are waiting for me to finish :)
Alright, I appreciate your patience <3

[1] https://man7.org/linux/man-pages/man7/signal.7.html

[2] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1318.htm

[3]

[4] https://man7.org/linux/man-pages/man7/signal-safety.7.html

[5] https://chromium.googlesource.com/chromium/src/base/+/master/third_party/symbolize

[6] https://pubs.opengroup.org/onlinepubs/9699919799/functions/sigprocmask.html

Related: Catching signals such as SIGSEGV and SIGFPE in multithreaded program

No, it is not POSIX-compliant. Defined signal-handler behavior is especially restricted for multi-threaded programs, as described in the documentation of the signal() function:

If the process is multi-threaded [...] the behavior is undefined if the signal handler refers to any object other than errno with static storage duration other than by assigning a value to an object declared as volatile sig_atomic_t [...].

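Therefore, your signal handler's proposed access to a semaphore would cause the program's behavior to be undefined, regardless of which function you use. Your handler could conceivably create a local semaphore and manipulate it with async-signal-safe functions, but that would serve no useful purpose. There is no conforming way for it to access a semaphore (or most other objects) with wider scope.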
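For contrast, here is a minimal sketch of the only kind of static-object access the quoted passage allows: assigning a value to an object declared as `volatile sig_atomic_t`. The names `got_signal`, `flag_only_handler`, and `install_flag_only_handler` are hypothetical. Note that this pattern suits asynchronous signals such as SIGUSR1; for a crash signal the handler still could not return, so the sketch does not solve the original problem.

```cpp
#include <signal.h>

// The one kind of static-storage object the handler may assign to.
volatile sig_atomic_t got_signal = 0;

// Hypothetical handler: does nothing except record the signal number.
extern "C" void flag_only_handler(int sig)
{
    got_signal = sig;
}

// Hypothetical helper: installs the handler for `signo` and reports whether
// sigaction() succeeded.
bool install_flag_only_handler(int signo)
{
    struct sigaction sa = {};
    sa.sa_handler = flag_only_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(signo, &sa, nullptr) == 0;
}
```

After `install_flag_only_handler(SIGUSR1)`, ordinary code can poll `got_signal` and do the real work outside the handler.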