Is this a POSIX-compliant implementation for handling signals such as SIGFPE, SIGSEGV, etc. in a multithreaded program?
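I'm working on a program that needs to handle crash signals. By "crash signal" I mean a signal "delivered as a result of a hardware exception" [1], such as SIGFPE and SIGSEGV. I haven't found a specific name for this category of signals, so I'm coining this one for clarity and brevity.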
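From my research, catching these signals is a pain. A crash signal handler must not return, otherwise the behavior is undefined [2] [3]. Undefined behavior means that an implementation may kill the process, or re-raise the signal and leave the program stuck in an infinite loop, neither of which is desirable.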
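On the other hand, there is little freedom inside signal handlers in general, especially in a multithreaded program: functions called from a signal handler must be both thread-safe and async-signal-safe [4]. For example, you can't call malloc() because it is not async-signal-safe, nor any other function that depends on it. In particular, since I'm using C++, I can't safely call GCC's abi::__cxa_demangle() to produce a decent stack trace, because it uses malloc() internally. While I could use Chromium's symbolize library [5] for async-signal-safe and thread-safe C++ symbol name demangling, I can't use dladdr() to get more information, because it is not specified to be async-signal-safe.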
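To make the restriction concrete, here is a minimal sketch of reporting a signal number without malloc() or stdio. The helper names format_decimal() and report_signal() are hypothetical and not part of the program in this question; the only library call is write(), which signal-safety(7) lists as async-signal-safe.

```cpp
#include <cstddef>
#include <unistd.h>

// Hypothetical helper: formats `value` as decimal text into `buf` without
// touching the heap or stdio. Returns the number of characters stored
// (no terminating NUL is written).
std::size_t format_decimal(int value, char* buf, std::size_t cap)
{
    char tmp[16];  // enough for a 32-bit int plus sign
    std::size_t n = 0;
    // Work in unsigned arithmetic so that INT_MIN does not overflow.
    unsigned int u = value < 0 ? 0u - static_cast<unsigned int>(value)
                               : static_cast<unsigned int>(value);
    do {
        tmp[n++] = static_cast<char>('0' + u % 10);
        u /= 10;
    } while (u != 0);
    if (value < 0)
        tmp[n++] = '-';
    // Digits were produced in reverse order; copy them back to front.
    std::size_t len = 0;
    while (n > 0 && len < cap)
        buf[len++] = tmp[--n];
    return len;
}

// Hypothetical helper: writes "Caught signal <n>\n" to `fd` using only
// automatic storage and a single write() call.
void report_signal(int fd, int sig)
{
    const char prefix[] = "Caught signal ";
    char line[48];
    std::size_t len = 0;
    for (std::size_t i = 0; i + 1 < sizeof(prefix); ++i)
        line[len++] = prefix[i];
    len += format_decimal(sig, line + len, sizeof(line) - len - 1);
    line[len++] = '\n';
    // The return value is ignored for brevity.
    ssize_t ignored = write(fd, line, len);
    (void)ignored;
}
```

A handler could call report_signal(STDERR_FILENO, sig) in place of std::cout. This only covers plain output; it does not help with demangling or dladdr().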
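An alternative approach for handling signals in general is to block them with sigprocmask() (or pthread_sigmask() in a multithreaded program) and call sigwait() in a worker thread. This works for non-crash signals such as SIGINT and SIGTERM. However, "if any of the SIGFPE, SIGILL, SIGSEGV, or SIGBUS signals are generated while they are blocked, the result is undefined" [6], so all bets are off.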
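For reference, the sigwait() approach for non-crash signals could look roughly like the sketch below. The helper names block_termination_signals() and wait_for_termination_signal() are hypothetical; the calls themselves (sigemptyset(), sigaddset(), pthread_sigmask(), sigwait()) are the standard POSIX ones.

```cpp
#include <pthread.h>
#include <signal.h>
#include <thread>

// Hypothetical helper: blocks SIGINT and SIGTERM in the calling thread.
// Threads created afterwards inherit this mask, so it should run before
// any other thread is spawned.
sigset_t block_termination_signals()
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    sigaddset(&set, SIGTERM);
    pthread_sigmask(SIG_BLOCK, &set, nullptr);
    return set;
}

// Hypothetical helper: waits synchronously for one of the signals in `set`
// and returns its number, or -1 if sigwait() fails. Since no handler runs,
// the caller may use any function afterwards, not only async-signal-safe ones.
int wait_for_termination_signal(const sigset_t& set)
{
    int sig = 0;
    return sigwait(&set, &sig) == 0 ? sig : -1;
}
```

A typical use would be to call block_termination_signals() at the top of main() and then run wait_for_termination_signal() in a dedicated std::thread. As quoted above, this pattern must not be applied to SIGFPE, SIGILL, SIGSEGV or SIGBUS.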
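Skimming through the signal-safety(7) man page [4], I found that sem_post() is async-signal-safe (and thread-safe, of course), and I implemented a solution around it that is similar to the sigwait() approach. The idea is to spawn a signal-processing thread that blocks the signals with pthread_sigmask() and calls sem_wait(). A crash signal handler is also defined so that whenever a crash signal is raised, the handler stores the signal number in a global variable, calls sem_post(), and waits until the signal-processing thread finishes processing and exits the program.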
Note that, for simplicity, the following implementation does not check the return values of syscalls.
// Std
#include <atomic>
#include <chrono>
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <thread>
// System
#include <semaphore.h>
#include <signal.h>
#include <unistd.h>
// NOTE: C++20 no longer requires `ATOMIC_FLAG_INIT`.
std::atomic_flag caught_signal = ATOMIC_FLAG_INIT;
int crash_sig = 0;
sem_t start_semaphore;
sem_t signal_semaphore;
extern "C" void crash_signal_handler(int sig)
{
    // If two or more threads evaluate this condition at the same time,
    // one of them shall enter the if-branch and the rest will skip it.
    if (caught_signal.test_and_set(std::memory_order_relaxed) == false)
    {
        // `crash_sig` need not be atomic since only this thread and
        // the signal processing thread use it, and the latter is
        // `sem_wait()`ing.
        crash_sig = sig;
        sem_post(&signal_semaphore);
    }
    // It is undefined behavior if a signal handler returns from a crash signal.
    // Implementations may re-raise the signal infinitely, kill the process, or whatnot,
    // but we want the crash signal processing thread to try handling the signal first;
    // so don't return.
    //
    // NOTE: maybe one could use `pselect()` here as it is async-signal-safe and seems to
    // be thread-safe as well. `sleep()` is async-signal-safe but not thread-safe.
    while (true)
        ;
    const char msg[] = "Panic: compiler optimized out infinite loop in signal handler\n";
    write(STDERR_FILENO, msg, sizeof(msg));
    std::_Exit(EXIT_FAILURE);
}
void block_crash_signals()
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGSEGV);
    sigaddset(&set, SIGFPE);
    pthread_sigmask(SIG_BLOCK, &set, nullptr);
}
void install_signal_handler()
{
    // NOTE: one may set an alternate stack here.
    struct sigaction sig;
    sig.sa_handler = crash_signal_handler;
    sig.sa_flags = 0;
    // `sa_mask` must be initialized before the struct is passed to `sigaction()`.
    sigemptyset(&sig.sa_mask);
    ::sigaction(SIGSEGV, &sig, nullptr);
    ::sigaction(SIGFPE, &sig, nullptr);
}
void restore_signal_handler()
{
    struct sigaction sig;
    sig.sa_handler = SIG_DFL;
    sig.sa_flags = 0;
    sigemptyset(&sig.sa_mask);
    ::sigaction(SIGSEGV, &sig, nullptr);
    ::sigaction(SIGFPE, &sig, nullptr);
}
void process_crash_signal()
{
    // If a crash signal occurs, the kernel will invoke `crash_signal_handler` in
    // any thread, which may not be this one.
    block_crash_signals();
    install_signal_handler();
    // Tell main thread it's good to go.
    sem_post(&start_semaphore);
    // Wait for a crash signal.
    sem_wait(&signal_semaphore);
    // Got a signal.
    //
    // We're not in kernel space, so we are "safe" to do anything from this thread,
    // such as writing to `std::cout`. HOWEVER, operations performed by this function,
    // such as calling `std::cout`, may raise another signal. Or the program may be in
    // a state where the damage was so severe that calling any function will crash the
    // program. If that happens, there's not much we can do: this very signal
    // processing function is broken, so let the kernel invoke the default signal
    // handler instead.
    restore_signal_handler();
    const char* signame;
    switch (crash_sig)
    {
        case SIGSEGV: signame = "SIGSEGV"; break;
        case SIGFPE: signame = "SIGFPE"; break;
        default: signame = "weird, this signal should not be raised";
    }
    std::cout << "Caught signal: " << crash_sig << " (" << signame << ")\n";
    // Uncomment these lines to invoke `SIG_DFL`.
    // volatile int zero = 0;
    // int a = 1 / zero;
    std::cout << "Sleeping for 2 seconds to prove that other threads are waiting for me to finish :)\n";
    std::this_thread::sleep_for(std::chrono::seconds{ 2 });
    std::cout << "Alright, I appreciate your patience <3\n";
    std::exit(EXIT_FAILURE);
}
void divide_by_zero()
{
    volatile int zero = 0;
    int oops = 1 / zero;
}
void access_invalid_memory()
{
    volatile int* p = reinterpret_cast<int*>(0xdeadbeef); // dw, I know what I'm doing lmao
    int oops = *p;
}
int main()
{
    // Initialize both semaphores to 0 before any thread uses them.
    sem_init(&start_semaphore, 0, 0);
    sem_init(&signal_semaphore, 0, 0);
    // TODO: maybe use the pthread library API instead of `std::thread`.
    std::thread worker{ process_crash_signal };
    // Wait until `worker` has started.
    sem_wait(&start_semaphore);
    std::srand(static_cast<unsigned>(std::time(nullptr)));
    while (true)
    {
        std::cout << "Odds are the program will crash...\n";
        switch (std::rand() % 3)
        {
            case 0:
                std::cout << "\nCalling divide_by_zero()\n";
                divide_by_zero();
                std::cout << "Panic: divide_by_zero() returned!\n";
                return 1;
            case 1:
                std::cout << "\nCalling access_invalid_memory()\n";
                access_invalid_memory();
                std::cout << "Panic: access_invalid_memory() returned!\n";
                return 1;
            default:
                std::cout << "...not this time, apparently\n\n";
                continue;
        }
    }
    return 0;
}
Compiling with
$ g++ --version
g++ (Debian 9.2.1-22) 9.2.1 20200104
$ g++ -pthread -o handle_crash_signal handle_crash_signal.cpp
yields
$ ./handle_crash_signal
Odds are the program will crash...
Calling access_invalid_memory()
Caught signal: 11 (SIGSEGV)
Sleeping for 2 seconds to prove that other threads are waiting for me to finish :)
Alright, I appreciate your patience <3
[1] https://man7.org/linux/man-pages/man7/signal.7.html
[2] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1318.htm
[3] Returning From Catching A Floating Point Exception
[4] https://man7.org/linux/man-pages/man7/signal-safety.7.html
[5] https://chromium.googlesource.com/chromium/src/base/+/master/third_party/symbolize
[6] https://pubs.opengroup.org/onlinepubs/9699919799/functions/sigprocmask.html
Related: Catching signals such as SIGSEGV and SIGFPE in multithreaded program
Answer
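No, it is not POSIX-compliant. The defined behavior of signal handlers is especially restricted for multithreaded programs, as described in the documentation of the signal() function:
"If the process is multi-threaded [...], the behavior is undefined if the signal handler refers to any object other than errno with static storage duration other than by assigning a value to an object declared as volatile sig_atomic_t [...]."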
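Your handler's proposed access to the semaphores would therefore make the program's behavior undefined, regardless of which functions you use. Your handler could conceivably create a local semaphore and manipulate it with async-signal-safe functions, but that would serve no useful purpose. There is no conforming way for it to access a semaphore (or almost any other object) with wider scope.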
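For contrast, here is a minimal sketch of the one kind of access to a static object that the quoted passage does permit: assigning a value to a volatile sig_atomic_t. The names last_signal, record_signal() and install_recorder() are hypothetical. Because this handler returns, the sketch is only meant for signals such as SIGUSR1 and does not solve the crash-signal problem.

```cpp
#include <signal.h>

// Hypothetical flag: the only static object the handler touches.
volatile sig_atomic_t last_signal = 0;

// The handler only assigns to the flag; it never reads it and calls
// no functions.
extern "C" void record_signal(int sig)
{
    last_signal = sig;
}

// Hypothetical helper: installs record_signal() for `signo`.
// Returns true if sigaction() succeeded.
bool install_recorder(int signo)
{
    struct sigaction sa{};
    sa.sa_handler = record_signal;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, nullptr) == 0;
}
```

Ordinary code can then poll last_signal outside the handler. Anything richer, such as the global semaphore in the question, falls outside what the quoted text allows.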