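1. Preface
The author's knowledge is limited, so this article may contain errors. The author makes no guarantees regarding any loss those errors may cause readers.
2. Background
This article analyzes the Linux 4.14 kernel source on the ARM32 architecture.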
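3. Signal overview
3.1 Signal classification
The concept of a signal originated in the UNIX operating system and, after a series of changes, became the signals defined by the POSIX standard today. Based on number range and real-time behavior, signals can be roughly divided into two classes: standard signals and real-time signals.
3.1.1 Standard signals
Standard signals originated in UNIX and are numbered 1-31. Their numbers are listed in the table below: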
Signal x86/ARM(most-others) Alpha/SPARC MIPS PARISC Notes
─────────────────────────────────────────────────────────────────
SIGHUP 1 1 1 1
SIGINT 2 2 2 2
SIGQUIT 3 3 3 3
SIGILL 4 4 4 4
SIGTRAP 5 5 5 5
SIGABRT 6 6 6 6
SIGIOT 6 6 6 6
SIGBUS 7 10 10 10
SIGEMT - 7 7 -
SIGFPE 8 8 8 8
SIGKILL 9 9 9 9
SIGUSR1 10 30 16 16
SIGSEGV 11 11 11 11
SIGUSR2 12 31 17 17
SIGPIPE 13 13 13 13
SIGALRM 14 14 14 14
SIGTERM 15 15 15 15
SIGSTKFLT 16 - - 7
SIGCHLD 17 20 18 18
SIGCLD - - 18 -
SIGCONT 18 19 25 26
SIGSTOP 19 17 23 24
SIGTSTP 20 18 24 25
SIGTTIN 21 21 26 27
SIGTTOU 22 22 27 28
SIGURG 23 16 21 29
SIGXCPU 24 24 30 12
SIGXFSZ 25 25 31 30
SIGVTALRM 26 26 28 20
SIGPROF 27 27 29 21
SIGWINCH 28 28 20 23
SIGIO 29 23 22 22
SIGPOLL Same as SIGIO
SIGPWR 30 29/- 19 19
SIGINFO - 29/- - -
SIGLOST - -/29 - -
SIGSYS 31 12 12 31
SIGUNUSED 31 - - 31
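As the table shows, signal numbers differ between hardware architectures, but a signal with a given name must have the same meaning everywhere. Next, the meaning and default action of each standard signal: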
Signal Standard Action Comment
────────────────────────────────────────────────────────────────────────
SIGABRT P1990 Core Abort signal from abort(3)
SIGALRM P1990 Term Timer signal from alarm(2)
SIGBUS P2001 Core Bus error (bad memory access)
SIGCHLD P1990 Ign Child stopped or terminated
SIGCLD - Ign A synonym for SIGCHLD
SIGCONT P1990 Cont Continue if stopped
SIGEMT - Term Emulator trap
SIGFPE P1990 Core Floating-point exception
SIGHUP P1990 Term Hangup detected on controlling terminal
or death of controlling process
SIGILL P1990 Core Illegal Instruction
SIGINFO - A synonym for SIGPWR
SIGINT P1990 Term Interrupt from keyboard
SIGIO - Term I/O now possible (4.2BSD)
SIGIOT - Core IOT trap. A synonym for SIGABRT
SIGKILL P1990 Term Kill signal
SIGLOST - Term File lock lost (unused)
SIGPIPE P1990 Term Broken pipe: write to pipe with no
readers; see pipe(7)
SIGPOLL P2001 Term Pollable event (Sys V);
synonym for SIGIO
SIGPROF P2001 Term Profiling timer expired
SIGPWR - Term Power failure (System V)
SIGQUIT P1990 Core Quit from keyboard
SIGSEGV P1990 Core Invalid memory reference
SIGSTKFLT - Term Stack fault on coprocessor (unused)
SIGSTOP P1990 Stop Stop process
SIGTSTP P1990 Stop Stop typed at terminal
SIGSYS P2001 Core Bad system call (SVr4);
see also seccomp(2)
SIGTERM P1990 Term Termination signal
SIGTRAP P2001 Core Trace/breakpoint trap
SIGTTIN P1990 Stop Terminal input for background process
SIGTTOU P1990 Stop Terminal output for background process
SIGUNUSED - Core Synonymous with SIGSYS
SIGURG P2001 Ign Urgent condition on socket (4.2BSD)
SIGUSR1 P1990 Term User-defined signal 1
SIGUSR2 P1990 Term User-defined signal 2
SIGVTALRM P2001 Term Virtual alarm clock (4.2BSD)
SIGXCPU P2001 Core CPU time limit exceeded (4.2BSD);
see setrlimit(2)
SIGXFSZ P2001 Core File size limit exceeded (4.2BSD);
see setrlimit(2)
SIGWINCH - Ign Window resize signal (4.3BSD, Sun)
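3.1.2 Real-time signals
Standard signals have no real-time guarantees and are not queued. If an instance of a given standard signal is already pending, later instances of the same signal are discarded, so only the first one is delivered. Real-time signals were introduced to address this: repeated occurrences of the same real-time signal are placed on a queue so that every instance is handled.
Real-time signals are numbered 32-64. The glibc pthread implementation uses signals 32-33 or 32-34 internally and redefines SIGRTMIN, the macro marking the first real-time signal, to 34 or 35 accordingly.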
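The difference in queueing can be observed from user space. The following is a minimal sketch, not part of the original article: `count_deliveries()` and `on_signal()` are hypothetical helper names, and the sketch assumes a single-threaded process on Linux in which the signal is not blocked by the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Number of times the handler has run during one count_deliveries() call. */
static volatile sig_atomic_t delivered;

static void on_signal(int signo)
{
    (void)signo;
    delivered++;
}

/*
 * Block `signo`, raise it `times` times while it is blocked, then unblock
 * it and report how many times the handler ran. Returns -1 on failure.
 */
int count_deliveries(int signo, int times)
{
    struct sigaction sa, old_sa;
    sigset_t block, old_mask;
    int result;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, &old_sa) == -1)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, signo);
    if (sigprocmask(SIG_BLOCK, &block, &old_mask) == -1)
        return -1;

    delivered = 0;
    for (int i = 0; i < times; i++)
        raise(signo);                      /* stays pending while blocked */

    /* Unblocking lets the kernel deliver whatever is pending. */
    sigprocmask(SIG_UNBLOCK, &block, NULL);
    result = delivered;

    sigaction(signo, &old_sa, NULL);
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    return result;
}
```

Calling `count_deliveries(SIGUSR1, 3)` is expected to report one delivery, because the repeated standard signal is merged into a single pending instance, while `count_deliveries(SIGRTMIN, 3)` is expected to report three, assuming the per-user limit on queued signals is not exhausted.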
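3.2 Raising a signal
A signal can be raised explicitly through the system calls sys_kill() or sys_tgkill(), where:
sys_kill() sends the signal to a thread group; any thread in the group may handle it;
sys_tgkill() sends the signal to a specific thread in the thread group; the signal is handled in that thread's context.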
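The two system calls can be told apart from inside a handler, because the kernel fills in a different si_code for each (SI_USER for sys_kill(), SI_TKILL for sys_tgkill()). The sketch below is an illustration under assumptions: `send_to_self()` and `record_si_code()` are hypothetical names, the process is assumed to be single-threaded with the signal unblocked, and tgkill is invoked through the generic `syscall()` wrapper.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static volatile sig_atomic_t handled;       /* set once the handler has run */
static volatile sig_atomic_t seen_si_code;  /* si_code passed to the handler */

static void record_si_code(int signo, siginfo_t *info, void *context)
{
    (void)signo;
    (void)context;
    seen_si_code = info->si_code;
    handled = 1;
}

/*
 * Send `signo` to the calling process with kill() (thread_directed == 0)
 * or to the calling thread with tgkill (thread_directed != 0), and store
 * the si_code seen by the handler in *si_code. Returns 0 on success.
 */
int send_to_self(int signo, int thread_directed, int *si_code)
{
    struct sigaction sa, old_sa;
    long rc;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = record_si_code;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, &old_sa) == -1)
        return -1;

    handled = 0;
    if (thread_directed)
        rc = syscall(SYS_tgkill, (long)getpid(), syscall(SYS_gettid), (long)signo);
    else
        rc = kill(getpid(), signo);

    sigaction(signo, &old_sa, NULL);
    if (rc == -1 || !handled)
        return -1;
    *si_code = seen_si_code;
    return 0;
}
```

With `int code;`, `send_to_self(SIGUSR1, 0, &code)` should leave `code == SI_USER`, and `send_to_self(SIGUSR1, 1, &code)` should leave `code == SI_TKILL`.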
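The other way a signal is raised is implicitly, by the kernel, when a process hits certain conditions. SIGSEGV on a null-pointer access is one example.
Before analyzing how a signal is raised, first look at the data structures a process uses for signal handling:
[Figure: data structures related to signal handling]
Next, the flow for raising a signal. First, sending a signal to a thread group: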
sys_kill(pid, sig)
  struct siginfo info;
  info.si_signo = sig;
  info.si_errno = 0;
  info.si_code = SI_USER;
  info.si_pid = task_tgid_vnr(current);
  info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
  kill_something_info(sig, &info, pid)
    kill_pid_info(sig, info, find_vpid(pid))
      struct task_struct *p = pid_task(pid, PIDTYPE_PID);
      group_send_sig_info(sig, info, p)
        do_send_sig_info(sig, info, p, true) /* send the signal to the thread group */
          /* see the common signal-sending flow below */
Sending a signal to a specific thread:
sys_tgkill(tgid, pid, sig)
  do_tkill(tgid, pid, sig)
    struct siginfo info = {};
    info.si_signo = sig;
    info.si_errno = 0;
    info.si_code = SI_TKILL;
    info.si_pid = task_tgid_vnr(current);
    info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
    do_send_specific(tgid, pid, sig, &info)
      struct task_struct *p = find_task_by_vpid(pid);
      do_send_sig_info(sig, info, p, false) /* send the signal to a specific thread */
        /* see the common signal-sending flow below */
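The common flow shared by sending to a thread group and sending to a specific thread:
do_send_sig_info(sig, info, p, group)
  send_signal(sig, info, p, group)
    __send_signal(sig, info, t, group, from_ancestor_ns)
      /* prepare_signal() returning 0 means the signal is not delivered (for example, it is ignored) */
      if (!prepare_signal(sig, t,
          from_ancestor_ns || (info == SEND_SIG_PRIV) || (info == SEND_SIG_FORCED)))
        goto ret;
      /*
       * group == true : put the signal on the pending queue shared by the thread group
       * group == false: put the signal on the thread's private pending queue
       */
      pending = group ? &t->signal->shared_pending : &t->pending;
      /* a standard signal that is already pending is queued only once */
      if (legacy_queue(pending, sig))
        goto ret;
      /* allocate a node for the pending-signal queue */
      q = __sigqueue_alloc(sig, t, GFP_ATOMIC, override_rlimit);
      /* append the pending signal to the chosen queue */
      list_add_tail(&q->list, &pending->list);
      copy_siginfo(&q->info, info);
      signalfd_notify(t, sig); /* wake up processes waiting for the signal on a signalfd */
      sigaddset(&pending->signal, sig); /* set the signal's bit in the pending mask */
      /*
       * Choose the thread that will handle the signal, tell it that a signal
       * is pending (set the TIF_SIGPENDING flag), then wake it up
       */
      complete_signal(sig, t, group);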
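The enqueue decision in the common flow above can be modeled in a few lines of user-space C. This is a simplified sketch, not kernel code: every `model_` name is hypothetical, the fixed-size array stands in for the kernel's linked list, and the capacity limit is an artifact of the model rather than the kernel's behavior when allocation fails.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NSIG      64  /* signals are numbered 1..64 */
#define MODEL_SIGRTMIN  32  /* kernel view: real-time signals start at 32 */
#define MODEL_QUEUE_MAX 16  /* arbitrary capacity of this model */

struct model_pending {
    unsigned long long mask;     /* bit (sig - 1) set: sig is pending */
    int queue[MODEL_QUEUE_MAX];  /* queued signal numbers, oldest first */
    size_t queued;
};

/* Like legacy_queue(): true if a standard signal is already pending. */
static bool model_legacy_queue(const struct model_pending *p, int sig)
{
    return sig < MODEL_SIGRTMIN && (p->mask & (1ULL << (sig - 1))) != 0;
}

/* Returns true if a new queue entry was added for `sig`. */
bool model_send_signal(struct model_pending *p, int sig)
{
    if (sig < 1 || sig > MODEL_NSIG)
        return false;
    if (model_legacy_queue(p, sig))
        return false;                 /* duplicate standard signal: dropped */
    if (p->queued == MODEL_QUEUE_MAX)
        return false;                 /* model-only limit */
    p->queue[p->queued++] = sig;      /* corresponds to list_add_tail() */
    p->mask |= 1ULL << (sig - 1);     /* corresponds to sigaddset() */
    return true;
}
```

Sending signal 2 twice to a zero-initialized `struct model_pending` adds one entry, while sending signal 34 twice adds two, which matches the standard versus real-time distinction described in section 3.1.2.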
3.3 Handling a signal
3.3.1 Preparation for signal handling
When a program is loaded, the kernel does some preparation for signal handling. The flow is as follows:
load_elf_binary()
  ...
  arch_setup_additional_pages(bprm, !!elf_interpreter)
    signal_page = get_signal_page()
      page = alloc_pages(GFP_KERNEL, 0); /* allocate one physical page */
      addr = page_address(page); /* get the page's kernel virtual address */
      offset = 0x200 + (get_random_int() & 0x7fc); /* random offset within the page */
      signal_return_offset = offset; /* save the random in-page offset in @signal_return_offset */
      /* copy the code snippet through which a signal handler returns to kernel space to offset @offset in the page */
      memcpy(addr + offset, sigreturn_codes, sizeof(sigreturn_codes))
      ...
    /* map the page holding that return snippet into the process's virtual address space */
    hint = sigpage_addr(mm, npages);
    addr = get_unmapped_area(NULL, hint, npages << PAGE_SHIFT, 0, 0);
    vma = _install_special_mapping(mm, addr, PAGE_SIZE,
                                   VM_READ | VM_EXEC | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC,
                                   &sigpage_mapping);
    /* record the snippet's virtual address in the process's mm_struct */
    mm->context.sigpage = addr;
  ...
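The result of this mapping should be visible from user space. The sketch below scans /proc/self/maps for a named mapping; `has_mapping()` is a hypothetical helper, and the label "[sigpage]" is an assumption based on the name given to sigpage_mapping in the ARM32 kernel source, so the page will not appear under that name on other architectures.

```c
#include <stdio.h>
#include <string.h>

/*
 * Returns 1 if some line of /proc/self/maps contains `name`, 0 if none
 * does, and -1 if the file cannot be opened.
 */
int has_mapping(const char *name)
{
    char line[512];
    int found = 0;
    FILE *maps = fopen("/proc/self/maps", "r");

    if (maps == NULL)
        return -1;
    while (fgets(line, sizeof(line), maps) != NULL) {
        if (strstr(line, name) != NULL) {
            found = 1;
            break;
        }
    }
    fclose(maps);
    return found;
}
```

On an ARM32 system, `has_mapping("[sigpage]")` would be expected to return 1 for any process started through the flow above.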
3.3.2 Signal handling flow
Pending signals are handled when the kernel returns to user space from an interrupt or from a system call. The flow is as follows:
/* @arch/arm/kernel/entry-common.S */
ret_fast_syscall:
  ldr r1, [tsk, #TI_FLAGS] @ re-check for syscall tracing
  tst r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK @ test for pending work, which includes _TIF_SIGPENDING
  ...
/* There is pending work: finish it first, then return to user space */
slow_work_pending:
  mov r0, sp
  mov r2, why
  @arch/arm/kernel/signal.c
  bl do_work_pending @ handle pending signals
    do {
      ...
      if (thread_flags & _TIF_SIGPENDING) { /* a pending signal may interrupt a system call */
        do_signal(regs, syscall)
          if (get_signal(&ksig)) { /* dequeue one pending signal */
            handle_signal(&ksig, regs); /* handle the dequeued signal */
              setup_frame(ksig, oldset, regs)
                /* allocate space for a sigframe on the user-space stack */
                struct sigframe __user *frame = get_sigframe(ksig, regs, sizeof(*frame));
                setup_sigframe(frame, regs, set)
                  context = (struct sigcontext) {
                    .arm_r0 = regs->ARM_r0,
                    .arm_r1 = regs->ARM_r1,
                    .arm_r2 = regs->ARM_r2,
                    .arm_r3 = regs->ARM_r3,
                    .arm_r4 = regs->ARM_r4,
                    .arm_r5 = regs->ARM_r5,
                    .arm_r6 = regs->ARM_r6,
                    .arm_r7 = regs->ARM_r7,
                    .arm_r8 = regs->ARM_r8,
                    .arm_r9 = regs->ARM_r9,
                    .arm_r10 = regs->ARM_r10,
                    .arm_fp = regs->ARM_fp,
                    .arm_ip = regs->ARM_ip,
                    .arm_sp = regs->ARM_sp,
                    .arm_lr = regs->ARM_lr,
                    .arm_pc = regs->ARM_pc,
                    .arm_cpsr = regs->ARM_cpsr,
                    .trap_no = current->thread.trap_no,
                    .error_code = current->thread.error_code,
                    .fault_address = current->thread.address,
                    .oldmask = set->sig[0],
                  };
                  __copy_to_user(&sf->uc.uc_mcontext, &context, sizeof(context)); /* save the user-space context: the signal handler will clobber it */
                  ...
                setup_return(regs, ksig, frame->retcode, frame)
                  /* the signal handler installed by user space */
                  unsigned long handler = (unsigned long)ksig->ka.sa.sa_handler;
                  ...
                  if (__put_user(sigreturn_codes[idx], rc) ||
                      __put_user(sigreturn_codes[idx+1], rc+1))
                    return 1;
                  /* address of the return snippet that was mapped into the process's address space at program load */
                  struct mm_struct *mm = current->mm;
                  retcode = mm->context.sigpage + signal_return_offset +
                            (idx << 2) + thumb;
                  regs->ARM_r0 = ksig->sig; /* argument 0 of the signal handler is the signal number */
                  regs->ARM_sp = (unsigned long)frame;
                  regs->ARM_lr = retcode; /* the handler returns into the sigreturn_codes snippet, which issues sys_sigreturn() to re-enter the kernel and then returns to the interrupted user-space code */
                  regs->ARM_pc = handler; /* on return to user space, execution resumes at the signal handler */
                  regs->ARM_cpsr = cpsr;
                  return 0;
              signal_setup_done(ret, ksig, 0)
          }
      }
      ...
    } while (thread_flags & _TIF_WORK_MASK);
/* Return to user space from the interrupt or system call and enter the signal handler installed by user space */
signal_handler()
/*
 * When signal_handler() returns, the snippet at sigreturn_codes runs:
 * arch/arm/kernel/sigreturn_codes.S
 */
sigreturn_codes:
  mov r7, #(__NR_sigreturn - __NR_SYSCALL_BASE)
  swi #(__NR_sigreturn)|(__NR_OABI_SYSCALL_BASE)
/* enter the system call sys_sigreturn() */
sys_sigreturn()
  frame = (struct sigframe __user *)regs->ARM_sp;
  /* restore the user-space context that signal handling clobbered */
  restore_sigframe(regs, frame)
    __copy_from_user(&context, &sf->uc.uc_mcontext, sizeof(context))
    regs->ARM_r0 = context.arm_r0;
    regs->ARM_r1 = context.arm_r1;
    regs->ARM_r2 = context.arm_r2;
    regs->ARM_r3 = context.arm_r3;
    regs->ARM_r4 = context.arm_r4;
    regs->ARM_r5 = context.arm_r5;
    regs->ARM_r6 = context.arm_r6;
    regs->ARM_r7 = context.arm_r7;
    regs->ARM_r8 = context.arm_r8;
    regs->ARM_r9 = context.arm_r9;
    regs->ARM_r10 = context.arm_r10;
    regs->ARM_fp = context.arm_fp;
    regs->ARM_ip = context.arm_ip;
    regs->ARM_sp = context.arm_sp;
    regs->ARM_lr = context.arm_lr;
    regs->ARM_pc = context.arm_pc;
    regs->ARM_cpsr = context.arm_cpsr;
/* return from sys_sigreturn() to user space and continue the interrupted code; signal handling is complete */
The diagram below summarizes the signal handling flow:
                                    User mode
  signal                      signal handler                  resume interrupted program
----------                  ------------------                ----------------------
          |                |                  |              |
          |                |           sys_sigreturn()       |
          |                |                  |              |
----------V----------------^------------------V--------------^-----------------------> t
          |                |                  |              |
          |                |                  |              |
          |                |                  |              |
           ----------------                    --------------
             do_signal()                       sys_sigreturn()
                                    Kernel mode
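One user-visible consequence of saving context in a frame before the handler runs and restoring it in sys_sigreturn() is that the signal mask in effect before delivery comes back after the handler returns. The sketch below probes this through the ucontext argument of an SA_SIGINFO handler. It is an illustration under assumptions: `probe_signal_mask()`, `inspect_frame()`, and `struct mask_report` are hypothetical names, the process is assumed to be single-threaded, and `uc_sigmask` is taken to be the mask stored in the frame.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

struct mask_report {
    int saved_usr1;       /* SIGUSR1 blocked in the mask saved in the frame? */
    int saved_usr2;       /* SIGUSR2 blocked in the mask saved in the frame? */
    int in_handler_usr1;  /* SIGUSR1 blocked while the handler runs? */
    int after_usr1;       /* SIGUSR1 blocked after the handler returned? */
};

static volatile sig_atomic_t saved_usr1, saved_usr2, in_handler_usr1;

static void inspect_frame(int signo, siginfo_t *info, void *context)
{
    ucontext_t *uc = context;
    sigset_t live;

    (void)signo;
    (void)info;
    saved_usr1 = sigismember(&uc->uc_sigmask, SIGUSR1);
    saved_usr2 = sigismember(&uc->uc_sigmask, SIGUSR2);
    sigprocmask(SIG_BLOCK, NULL, &live);   /* query only, nothing changes */
    in_handler_usr1 = sigismember(&live, SIGUSR1);
}

/* Deliver SIGUSR1 to ourselves and report the masks seen. 0 on success. */
int probe_signal_mask(struct mask_report *report)
{
    struct sigaction sa, old_sa;
    sigset_t mask, old_mask, now;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = inspect_frame;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, &old_sa) == -1)
        return -1;

    /* Start from a known mask: only SIGUSR2 is blocked. */
    sigemptyset(&mask);
    sigaddset(&mask, SIGUSR2);
    sigprocmask(SIG_SETMASK, &mask, &old_mask);

    raise(SIGUSR1);                        /* handler runs before this returns */

    sigprocmask(SIG_BLOCK, NULL, &now);
    report->saved_usr1 = saved_usr1;
    report->saved_usr2 = saved_usr2;
    report->in_handler_usr1 = in_handler_usr1;
    report->after_usr1 = sigismember(&now, SIGUSR1);

    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGUSR1, &old_sa, NULL);
    return 0;
}
```

The expected report is that the saved mask contains SIGUSR2 but not SIGUSR1, that SIGUSR1 is blocked while its own handler runs, and that it is unblocked again once the handler has returned.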
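4. Example
What is the point of learning the details of signal handling? The article "Linux应用编程: API基础" (Linux Application Programming: API Basics) introduces the concept of an async-signal-safe function. Such functions may be called from inside a signal handler. Calling any other function from a signal handler can lead to deadlock, corrupted data, and similar problems. The following example deadlocks because it calls an unsuitable function from a signal handler: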
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <pthread.h>

pthread_mutex_t recursive_disallow_mutex;

/* Not async-signal-safe: takes a non-recursive mutex and sleeps while holding it. */
void async_signal_not_safe(void)
{
    pthread_mutex_lock(&recursive_disallow_mutex);
    sleep(5);
    pthread_mutex_unlock(&recursive_disallow_mutex);
}

/* SIGINT handler: deliberately calls a function that is unsafe inside a handler. */
void signal_int(int signo)
{
    (void)signo;
    async_signal_not_safe();
}

int main(void)
{
    pthread_mutexattr_t attr;

    /* Do not allow recursive locking of the pthread_mutex_t. */
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_NORMAL);
    pthread_mutex_init(&recursive_disallow_mutex, &attr);
    pthread_mutexattr_destroy(&attr);

    if (signal(SIGINT, signal_int) == SIG_ERR) {
        printf("signal(SIGINT) error\n");
        return EXIT_FAILURE;
    }

    async_signal_not_safe();

    return 0;
}
Compile and run the program, press Ctrl+C, and inspect it with gdb:
bill@bill-virtual-machine:~/Study/app/signal$ sudo gdb attach -p 3560
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
attach: No such file or directory.
Attaching to process 3560
Reading symbols from /home/bill/Study/app/signal/async-signal-not-safe...(no debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0...Reading symbols from /usr/lib/debug/.build-id/c5/57b8146e8079af46310b549de6912d1fc4ea86.debug...done.
done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.23.so...done.
done.
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.23.so...done.
done.
__lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
135 ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.
(gdb) thread apply all bt
Thread 1 (Thread 0x7f6f08429700 (LWP 3560)):
#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007f6f0800bdbd in __GI___pthread_mutex_lock (mutex=0x6010a0 <recursive_disallow_mutex>) at ../nptl/pthread_mutex_lock.c:80
#2 0x0000000000400904 in async_signal_not_safe ()
#3 0x000000000040092b in signal_int ()
#4 <signal handler called>
#5 0x00007f6f07d04370 in __nanosleep_nocancel () at ../sysdeps/unix/syscall-template.S:84
#6 0x00007f6f07d042da in __sleep (seconds=0) at ../sysdeps/posix/sleep.c:55
#7 0x000000000040090e in async_signal_not_safe ()
#8 0x00000000004009af in main ()
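The program is stuck in the signal handler signal_int(), in this call chain:
signal_int()
  async_signal_not_safe()
    pthread_mutex_lock(&recursive_disallow_mutex)
In other words, the program has deadlocked. According to the signal handling flow analyzed earlier, the handler was entered while main() was at the following point:
main()
  async_signal_not_safe()
    pthread_mutex_lock(&recursive_disallow_mutex)
    sleep(5)
While sleep() had the process asleep, pressing Ctrl+C generated SIGINT. Sending the signal wakes the process (see complete_signal() above), so the sleep is interrupted instead of running to completion. On the way back to user space from the interrupted system call, the kernel finds the pending signal and enters the signal handling flow:
signal_int()
  async_signal_not_safe()
    pthread_mutex_lock(&recursive_disallow_mutex)
At this point the lock recursive_disallow_mutex is still held by the interrupted code in the same thread, and recursive locking of it is disallowed, so the handler deadlocks.
There are many similar scenarios. For example, calling malloc()/free() from both main() and signal_int() can cause deadlock, data corruption, or other hard-to-diagnose errors.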
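A common way to avoid this class of problem is to keep the handler trivial: it only sets a flag of type volatile sig_atomic_t, and the ordinary code path checks the flag at a point where taking locks is safe. The following is a minimal sketch of that pattern rather than a rewrite of the program above. The names `install_sigint_flag_handler()`, `locked_work_survives_sigint()`, and `work_mutex` are hypothetical, and `raise()` stands in for the user pressing Ctrl+C.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <string.h>

static pthread_mutex_t work_mutex = PTHREAD_MUTEX_INITIALIZER;
static volatile sig_atomic_t interrupt_requested;

/* Async-signal-safe: the handler only stores to a volatile sig_atomic_t. */
static void on_sigint(int signo)
{
    (void)signo;
    interrupt_requested = 1;
}

/* Install the flag-setting handler for SIGINT. Returns 0 on success. */
int install_sigint_flag_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/*
 * Take the mutex, receive SIGINT while holding it, release the mutex, and
 * only then react to the signal. Returns 1 if an interrupt was requested.
 */
int locked_work_survives_sigint(void)
{
    int requested;

    pthread_mutex_lock(&work_mutex);
    raise(SIGINT);              /* handler runs here; it never touches the lock */
    pthread_mutex_unlock(&work_mutex);

    requested = interrupt_requested;   /* react outside the handler */
    interrupt_requested = 0;
    return requested;
}
```

Because the handler never acquires `work_mutex`, the signal can arrive while the lock is held without blocking. Another option, not shown here, is to block the signal with pthread_sigmask() for the duration of the critical section.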
5. References
man signal
关于异步信号安全 (On async-signal safety)