Linux Kernel Development, 03. Process Management

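Contents

- Process Descriptor and the Task Structure
  - Allocating the Process Descriptor
  - Storing the Process Descriptor
  - Process State
  - Manipulating the Current Process State
  - Process Context
  - The Process Family Tree
- Process Creation
- The Linux Implementation of Threads
  - Creating Threads
  - Kernel Threads
- Process Termination
  - Removing the Process Descriptor
  - Orphaned Processes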

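Process Descriptor and the Task Structure

The kernel stores processes in the task list, which is implemented as a circular doubly linked list.

Each element of the list is of type task_struct, also called the process descriptor.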

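// include/linux/sched.h (abridged)
struct task_struct {
    volatile long state;          /* -1 unrunnable, 0 runnable, >0 stopped */
    int prio;                     /* dynamic priority */
    unsigned int policy;          /* scheduling policy */
    struct task_struct *parent;   /* parent process */
    struct list_head tasks;       /* link in the task list */
    pid_t pid;                    /* process identifier */
    /* ... */
};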

Another name for a process is a task.

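Allocating the Process Descriptor

Linux allocates the task_struct structure through the slab allocator, which provides object reuse and cache coloring.

Before the 2.6 kernel series, each process's task_struct was stored at the end of its kernel stack. This let architectures with few registers, such as x86, compute the location of the process descriptor from the stack pointer alone, without dedicating an extra register to it. Now that the slab allocator creates the process descriptor dynamically, a smaller structure, struct thread_info, lives at the end of the kernel stack instead and holds a pointer to the task_struct.

For stacks that grow down, thread_info sits at the bottom of the stack (its lowest address); for stacks that grow up, it sits at the top.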

The thread_info structure (the Alpha definition is shown here):

// arch/alpha/include/asm/thread_info.h
struct thread_info {
	struct pcb_struct	pcb;		/* palcode state */

	struct task_struct	*task;		/* main task structure */
	unsigned int		flags;		/* low level flags */
	unsigned int		ieee_state;	/* see fpu.h */

	struct exec_domain	*exec_domain;	/* execution domain */
	mm_segment_t		addr_limit;	   /* thread address space */
	unsigned		cpu;		/* current CPU */
	int			preempt_count; /* 0 => preemptable, <0 => BUG */

	int bpt_nsaved;
	unsigned long bpt_addr[2];		/* breakpoint handling  */
	unsigned int bpt_insn[2];

	struct restart_block	restart_block;
};
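To make the stack-pointer trick concrete, here is a minimal user-space sketch. It assumes an 8 KB stack aligned to its own size, and the names THREAD_SIZE, demo_task, demo_thread_info and the demo_* functions are illustrative stand-ins, not the kernel's definitions. Masking off the low bits of any address inside the stack yields the base of the stack, where the thread_info lives.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed for illustration: the real THREAD_SIZE and struct thread_info
 * are architecture-specific. The size must be a power of two. */
#define THREAD_SIZE 8192u

struct demo_task { int pid; };

struct demo_thread_info {
    struct demo_task *task;   /* back pointer to the process descriptor */
};

/* Allocate a THREAD_SIZE-aligned "kernel stack" and place the
 * thread_info at its lowest address (stack grows down). */
void *demo_alloc_stack(struct demo_task *task)
{
    void *stack = aligned_alloc(THREAD_SIZE, THREAD_SIZE);

    if (stack != NULL)
        ((struct demo_thread_info *)stack)->task = task;
    return stack;
}

/* Mask off the low bits of any address inside the stack to reach
 * the thread_info at the base of that stack. */
struct demo_thread_info *demo_thread_info_of(const void *sp)
{
    uintptr_t base = (uintptr_t)sp & ~((uintptr_t)THREAD_SIZE - 1);

    return (struct demo_thread_info *)base;
}

/* Analogue of the current lookup: thread_info first, then its task. */
struct demo_task *demo_current(const void *sp)
{
    return demo_thread_info_of(sp)->task;
}
```

Passing any address that lies inside the allocated stack to demo_current() returns the task registered by demo_alloc_stack(), without storing the task pointer anywhere else.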

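Storing the Process Descriptor

The kernel identifies each process by a unique PID. The PID is an integer, and for compatibility with older versions its default maximum is set to 32768.

The upper limit can be changed through /proc/sys/kernel/pid_max.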
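As a small illustration, the limit can be read like any other text file. The helper name read_long_from_file is hypothetical, and the sketch assumes a Linux system with /proc mounted; it returns -1 when the file is missing or unreadable.

```c
#include <stdio.h>

/* Read a single integer from a text file such as
 * /proc/sys/kernel/pid_max; returns -1 if it cannot be read. */
long read_long_from_file(const char *path)
{
    long value = -1;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%ld", &value) != 1)
        value = -1;
    fclose(fp);
    return value;
}
```

Calling read_long_from_file("/proc/sys/kernel/pid_max") returns the current limit; writing a new value to the same file (as root) changes it.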

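The kernel stores the PID in the process descriptor (task_struct).

Inside the kernel, tasks are accessed through a pointer to their task_struct, so before the kernel can work on a task it must first obtain that pointer. Linux does this with the current macro, whose implementation depends on the hardware architecture:

If the architecture has registers to spare, the pointer can be kept directly in a dedicated register.
If registers are scarce, a thread_info structure is placed at the end of the kernel stack, and the task_struct is found indirectly by computing an offset from the stack pointer.

On Alpha, for example, the current macro is implemented as follows:

arch/alpha/include/asm/current.h: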

#ifndef _ALPHA_CURRENT_H
#define _ALPHA_CURRENT_H

#include <linux/thread_info.h>

// Get the current thread_info via current_thread_info(), then follow its task pointer to the task_struct
#define get_current()	(current_thread_info()->task)
#define current		get_current()

#endif /* _ALPHA_CURRENT_H */

arch/alpha/include/asm/thread_info.h:

// Tell the compiler to keep the pointer to the current thread_info in register $8
register struct thread_info *__current_thread_info __asm__("$8");
// Get the pointer to the current process's thread_info
#define current_thread_info()  __current_thread_info

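Process State

The state field of the process descriptor records the current condition of the process. At any moment, each process is in exactly one of the following states: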

TASK_RUNNING
TASK_INTERRUPTIBLE
TASK_UNINTERRUPTIBLE
__TASK_TRACED
__TASK_STOPPED
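The sketch below attaches a one-line meaning to each state. The numeric values are those found in 2.6-era include/linux/sched.h and should be treated as an assumption for other kernel versions; demo_state_desc and demo_is_sleeping are hypothetical helpers written for this example.

```c
/* Values as in 2.6-era include/linux/sched.h (assumed, may differ
 * in other kernel versions). */
#define TASK_RUNNING          0
#define TASK_INTERRUPTIBLE    1
#define TASK_UNINTERRUPTIBLE  2
#define __TASK_STOPPED        4
#define __TASK_TRACED         8

/* Hypothetical helper: short description of a single state value. */
const char *demo_state_desc(long state)
{
    switch (state) {
    case TASK_RUNNING:
        return "runnable: running or waiting on a runqueue";
    case TASK_INTERRUPTIBLE:
        return "sleeping: wakes on its condition or on a signal";
    case TASK_UNINTERRUPTIBLE:
        return "sleeping: does not wake on signals";
    case __TASK_STOPPED:
        return "stopped: execution halted, e.g. by SIGSTOP";
    case __TASK_TRACED:
        return "traced: being traced by another process, e.g. via ptrace";
    default:
        return "unknown";
    }
}

/* Hypothetical helper: nonzero for either of the two sleeping states. */
int demo_is_sleeping(long state)
{
    return (state & (TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE)) != 0;
}
```

Because every state other than TASK_RUNNING is a distinct bit, a set of states can be tested with a single mask, as demo_is_sleeping() does.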

Manipulating the Current Process State

set_current_state(state): sets the state of the current process to state.
set_task_state(tsk, state): sets the state of the process tsk to state.

#define __set_task_state(tsk, state_value) 	do { (tsk)->state = (state_value); } while (0)
#define set_task_state(tsk, state_value) 	set_mb((tsk)->state, (state_value))

/*
 * set_current_state() includes a barrier so that the write of current->state
 * is correctly serialised wrt the caller's subsequent test of whether to
 * actually sleep:
 *
 *	set_current_state(TASK_UNINTERRUPTIBLE);
 *	if (do_i_need_to_sleep())
 *		schedule();
 *
 * If the caller does not need such serialisation then use __set_current_state()
 */
#define __set_current_state(state_value) do { current->state = (state_value); } while (0)
#define set_current_state(state_value) set_mb(current->state, (state_value))

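Process Context

Executable program code is a major part of a process: it is loaded from an executable file into the process's address space and normally runs in user space. When the program makes a system call or triggers an exception, it enters kernel space. A user program has no privilege to access kernel data or run kernel code directly, so the kernel does the requested work for it and hands back the result. At this point the kernel is said to be "executing on behalf of the process" and is in process context.

After trapping into the kernel, the program's remaining user-space code does not run until the kernel has finished the requested work. The execution environment of the process during that time is what "context" refers to, and the current macro is valid throughout.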
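A minimal way to watch this boundary from user space is to issue a system call through glibc's generic syscall() entry point. The sketch assumes Linux and glibc; pid_via_raw_syscall is a name made up for this example.

```c
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel for our PID through the generic syscall() wrapper.
 * The call traps into the kernel, which runs in process context on
 * our behalf and returns the result to user space. */
pid_t pid_via_raw_syscall(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

The value matches what the ordinary getpid() wrapper returns, since both end up in the same kernel code executing on behalf of the calling process.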

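The Process Family Tree

Processes in Unix and Linux form a clear hierarchy: every process is a descendant of the init process, whose PID is 1. The kernel starts init in the last step of the boot process, and init then brings up the rest of the system.

Every process has exactly one parent, and every process may have zero or more children.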

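struct task_struct {
    struct task_struct *parent;   // parent process
    struct list_head children;    // list of this process's children
    struct list_head sibling;     // link in the parent's children list
};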

The helpers for accessing and iterating over list entries are in include/linux/list.h:

 /**
 * list_entry - get the struct for this entry
 * @ptr:	the &struct list_head pointer.
 * @type:	the type of the struct this is embedded in.
 * @member:	the name of the list_struct within the struct.
 */
#define list_entry(ptr, type, member) \
	container_of(ptr, type, member)

/**
 * list_for_each	-	iterate over a list
 * @pos:	the &struct list_head to use as a loop cursor.
 * @head:	the head for your list.
 */
#define list_for_each(pos, head) \
	for (pos = (head)->next; prefetch(pos->next), pos != (head); \
        	pos = pos->next)
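The following user-space sketch shows how these macros combine with the parent and children fields. It re-creates simplified stand-ins for list_head, container_of, list_entry and list_for_each (without the prefetch hint), and struct demo_task is a hypothetical, heavily reduced descriptor; none of this is the kernel's own code.

```c
#include <stddef.h>

/* Minimal user-space stand-ins for the kernel's list helpers. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)
#define list_for_each(pos, head) \
    for (pos = (head)->next; pos != (head); pos = pos->next)

void list_init(struct list_head *head)
{
    head->next = head->prev = head;
}

void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Hypothetical reduced process descriptor. */
struct demo_task {
    int pid;
    struct demo_task *parent;
    struct list_head children;  /* head of this task's child list */
    struct list_head sibling;   /* link in the parent's child list */
};

void demo_task_init(struct demo_task *t, int pid, struct demo_task *parent)
{
    t->pid = pid;
    t->parent = parent ? parent : t;  /* simplification: root is its own parent */
    list_init(&t->children);
    list_init(&t->sibling);
    if (parent)
        list_add_tail(&t->sibling, &parent->children);
}

/* Sum the PIDs of all direct children by walking the children list. */
int demo_sum_child_pids(struct demo_task *t)
{
    struct list_head *pos;
    int sum = 0;

    list_for_each(pos, &t->children)
        sum += list_entry(pos, struct demo_task, sibling)->pid;
    return sum;
}

/* Count the parent links between a task and PID 1 (or the root). */
int demo_depth_below_init(const struct demo_task *t)
{
    int depth = 0;

    while (t->pid != 1 && t->parent != t) {
        t = t->parent;
        depth++;
    }
    return depth;
}
```

Note that the children list links each child through its sibling member, which is why list_entry() is given sibling rather than children when converting a list node back to its task.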

On a system with a large number of processes, iterating over every process is expensive, so avoid doing it unless there is a good reason.

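Process Creation

Most other operating systems provide a spawn mechanism that creates a process in a single operation made up of two parts:

Create a process in a new address space and read in an executable.
Begin executing that executable.

Unix takes the unusual approach of splitting process creation across two separate functions, fork() and exec(). Used together, they achieve what other operating systems do with a single call: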

fork(): creates a child process by copying the current process.
exec(): loads an executable into the address space and begins running it.
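The usual pairing of the two calls can be sketched as follows. spawn_and_wait is a hypothetical helper, and execvp() is only one member of the exec() family; the sketch assumes a POSIX system.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child, replace its image with argv[0] via execvp(), and
 * return the child's exit status (or -1 on failure). */
int spawn_and_wait(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;              /* fork failed */
    if (pid == 0) {
        execvp(argv[0], argv);  /* returns only if exec failed */
        _exit(127);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, passing the argument vector { "/bin/sh", "-c", "exit 7", NULL } runs the shell in the child and hands its exit code back to the caller.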

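Linux implements fork() with copy-on-write pages: parent and child share the same physical pages until one of them writes to a page, and only then is that page copied.

Linux implements fork() through the clone() system call.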
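Copy-on-write is invisible to programs; what they observe is that a write in the child never reaches the parent. The sketch below shows only that observable effect (it would behave the same with an eager copy), and demo_value_after_fork is a name invented for this example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork, let the child overwrite a variable, and return the value the
 * parent still sees afterwards (or -1 on failure). */
int demo_value_after_fork(void)
{
    volatile int value = 1;
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        value = 99;                   /* the write lands in the child's private copy */
        _exit(value == 99 ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return value;                     /* unchanged in the parent */
}
```

The parent's copy of value stays at 1 even though the child stored 99 at the same virtual address.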

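The Linux Implementation of Threads

Linux implements all threads as standard processes. A thread is merely a process that shares certain resources with other processes. Each thread has its own task_struct, but threads share the address space of the process that created them rather than having one of their own. Other operating systems describe threads as "lightweight processes", but in Linux processes are already quite lightweight.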

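Creating Threads

Threads are created the same way as processes, through the clone() system call, except that flags are passed to specify which resources are to be shared: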

clone(CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND, 0);

The flags passed to clone() determine how the new process behaves and which resources the parent and child share.
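The effect of CLONE_VM can be seen from user space with glibc's clone() wrapper, whose signature differs from the raw system call shown above: it takes an entry function and a stack for the child. This is a sketch assuming Linux and glibc; DEMO_STACK_SIZE, demo_child and demo_clone_shares_memory are illustrative names, and SIGCHLD is added to the flags only so that the parent can wait for the child.

```c
#define _GNU_SOURCE         /* needed for glibc's clone() wrapper */
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define DEMO_STACK_SIZE (64 * 1024)   /* arbitrary size for the child's stack */

/* Entry point of the new task: writes through a pointer into memory
 * that it shares with its creator. */
static int demo_child(void *arg)
{
    *(volatile int *)arg = 42;
    return 0;
}

/* Create a thread-like task with clone() and return the value the
 * child stored in the shared variable, or -1 on failure. */
int demo_clone_shares_memory(void)
{
    volatile int shared = 0;
    char *stack = malloc(DEMO_STACK_SIZE);
    pid_t pid;

    if (stack == NULL)
        return -1;
    /* The stack grows down on common architectures, so pass its top. */
    pid = clone(demo_child, stack + DEMO_STACK_SIZE,
                CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD,
                (void *)&shared);
    if (pid < 0 || waitpid(pid, NULL, 0) < 0) {
        free(stack);
        return -1;
    }
    free(stack);
    return shared;   /* written by the child, visible here because of CLONE_VM */
}
```

Unlike a child created with fork(), the task created here writes directly into its creator's memory, so the creator sees the stored value once the child has finished.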

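Kernel Threads

The kernel often needs to perform operations in the background. It does so with kernel threads: standard processes that exist solely in kernel space. The key difference from a normal process is that a kernel thread has no address space of its own (the mm pointer in its task_struct, which points to the address space, is NULL). Kernel threads run only in kernel space and never switch to user space.

A kernel thread can be created only by another kernel thread.

The interface for creating a new kernel thread from an existing one is kthread_create(), declared in linux/kthread.h.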

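Process Termination

When a process terminates, the kernel must release all the resources it holds and notify its parent.

Termination happens through the exit() system call; the bulk of the work is done by do_exit(), defined in kernel/exit.c.

At this stage only the resources held by the process, such as its memory, are released; the process descriptor itself is kept.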

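Removing the Process Descriptor

After exit() has run, the resources held by the process have been released, but its process descriptor (task_struct) and thread_info have not been deleted. Releasing resources and removing the process descriptor are therefore two separate steps.

The reason is that once a process has become a zombie, the system can still obtain information about it in order to notify the parent. Only after the parent has learned that its child has died are the process descriptor and thread_info freed.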

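Orphaned Processes

If a parent exits before its children, there must be some mechanism to give those children a new parent. Otherwise the orphaned processes would remain zombies forever after they exit, wasting memory.

The solution is to reparent the children to another thread in the current thread group or, if that fails, to the init process.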