[Linux] [Operating Systems] Reading the kernel source: sched.c

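sched.c contains the kernel functions for process scheduling and management. The schedule function first scans every task (process) and wakes any task that has received a signal. Concretely, for each task in the task array it checks the alarm timer value, alarm. If the alarm has expired (alarm < jiffies), it sets SIGALRM in the task's signal bitmap and then clears alarm. jiffies is the number of clock ticks since boot (one tick is 10 ms) and is declared in sched.h. If the task's signal bitmap contains any signal other than the blocked ones and the task is in the interruptible sleep state (TASK_INTERRUPTIBLE), the task is put into the ready state (TASK_RUNNING).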

`schedule()`

void schedule(void)
{
    int i, next, c;
    struct task_struct ** p;

    /* check alarms, wake up any interruptible tasks that have received a signal */
    for (p = &LAST_TASK; p > &FIRST_TASK; --p)
        if (*p) {
            if ((*p)->alarm && (*p)->alarm < jiffies) {
                (*p)->signal |= (1 << (SIGALRM - 1));
                (*p)->alarm = 0;
            }
            if (((*p)->signal & ~(_BLOCKABLE & (*p)->blocked)) &&
                (*p)->state == TASK_INTERRUPTIBLE)
                (*p)->state = TASK_RUNNING;
        }

    /* this is the scheduler proper: */

    while (1) {
        c = -1;
        next = 0;
        i = NR_TASKS;
        p = &task[NR_TASKS];
        while (--i) {
            if (!*--p)
                continue;
            if ((*p)->state == TASK_RUNNING && (*p)->counter > c)
                c = (*p)->counter, next = i;
        }
        if (c) break;
        for (p = &LAST_TASK; p > &FIRST_TASK; --p)
            if (*p)
                (*p)->counter = ((*p)->counter >> 1) + (*p)->priority;
    }
    switch_to(next);
}

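Purpose: this function implements the core scheduling logic. It picks the next process to run based on each process's state and remaining time slice.
Process:
1. Signal handling: check whether any process needs to be woken because of a signal or an expired alarm.
2. Scheduling logic:
- Scan all processes for the runnable one with the largest counter (remaining time slice).
- If every runnable process has used up its time slice (counter is 0), recompute counter for all processes from their priority and scan again.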
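The schedule function in detail
Checking alarms and waking interruptible tasks

Checking alarms:
- The loop walks the task array (an array of struct task_struct pointers) backwards and skips task 0.
- If a task has an alarm set and the current time (jiffies) is past the alarm time, the SIGALRM bit is set in the task's signal bitmap.
- The alarm time is then reset to 0.
Waking interruptible tasks:
- If a task in interruptible sleep (TASK_INTERRUPTIBLE) has a pending signal that is not blocked, it is woken by setting its state to TASK_RUNNING. SIGKILL and SIGSTOP can never be blocked, which is what the _BLOCKABLE mask enforces.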
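
The alarm check and the wake-up test can be tried outside the kernel. The following is a minimal sketch, not the kernel's code: alarm_task, SIG_BIT, BLOCKABLE, and check_alarm_and_wake are hypothetical names, and the signal numbers and state values are assumed to match this kernel's signal.h and sched.h.

```c
/* Assumed values: SIGKILL = 9, SIGALRM = 14, SIGSTOP = 19. */
#define SIGKILL_NR 9
#define SIGALRM_NR 14
#define SIGSTOP_NR 19
#define SIG_BIT(nr) (1L << ((nr) - 1))
/* Every signal except SIGKILL and SIGSTOP may be blocked. */
#define BLOCKABLE (~(SIG_BIT(SIGKILL_NR) | SIG_BIT(SIGSTOP_NR)))

enum { RUNNING = 0, INTERRUPTIBLE = 1, UNINTERRUPTIBLE = 2 };

/* Hypothetical cut-down task structure with only the fields used here. */
struct alarm_task {
    long state;
    long alarm;    /* absolute expiry time in ticks, 0 = no alarm */
    long signal;   /* bitmap of pending signals */
    long blocked;  /* bitmap of blocked signals */
};

/* One pass of the first loop in schedule() for a single task. */
static void check_alarm_and_wake(struct alarm_task *t, long now)
{
    if (t->alarm && t->alarm < now) {
        t->signal |= SIG_BIT(SIGALRM_NR);  /* post SIGALRM */
        t->alarm = 0;                      /* one-shot: clear the alarm */
    }
    /* Wake only if some pending signal is not effectively blocked. */
    if ((t->signal & ~(BLOCKABLE & t->blocked)) && t->state == INTERRUPTIBLE)
        t->state = RUNNING;
}
```

A task that blocks every signal is still woken by SIGKILL, because the BLOCKABLE mask removes SIGKILL and SIGSTOP from the set of signals that blocking can hide.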

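Scheduling logic

Finding the next task to run:
- The loop searches for the runnable task with the largest counter. counter is the task's remaining time slice in ticks and acts as its dynamic priority.
- The loop walks the task array backwards from the last slot down to task 1, checking whether each task is in the TASK_RUNNING state and has a counter larger than the best value found so far.
- The index of the best task is stored in next.
Leaving the loop:
- If c is non-zero after the scan, the loop ends. Either a runnable task with a positive counter was found, or no task is runnable at all, in which case c is still -1, next is 0, and the idle task (task 0) runs.
- If c is 0, every runnable task has used up its time slice, and the function goes on to the next step.
Recomputing counters:
- Every task, runnable or sleeping, gets a new counter: the old counter is halved and the task's priority is added.
- A sleeping task keeps half of its unused slice, so it builds up a larger counter and is preferred when it wakes up. A runnable task whose counter has reached 0 simply gets counter = priority.
- The scan then repeats with the new counter values.
Switching to the next task:
- Once the loop ends, the function calls switch_to(next) to switch to the selected task.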
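
The selection loop is easy to experiment with in user space. The sketch below mirrors its logic under a few assumptions: sched_task, NR_SLOTS, and pick_next are hypothetical names, the task structure is cut down to three fields, and the function returns the chosen index instead of calling switch_to().

```c
#define NR_SLOTS 4  /* hypothetical table size; the kernel uses NR_TASKS */

enum { RUNNING = 0, INTERRUPTIBLE = 1 };

/* Hypothetical cut-down task structure. */
struct sched_task {
    long state;
    long counter;   /* remaining time slice in ticks */
    long priority;  /* refill value for counter */
};

/* Returns the slot that would be switched to. Slot 0 is the idle task. */
static int pick_next(struct sched_task *task[NR_SLOTS])
{
    int i, next;
    long c;

    for (;;) {
        c = -1;
        next = 0;
        /* Scan slots NR_SLOTS-1 .. 1 for the runnable task with the largest counter. */
        for (i = NR_SLOTS - 1; i > 0; i--) {
            if (!task[i])
                continue;
            if (task[i]->state == RUNNING && task[i]->counter > c) {
                c = task[i]->counter;
                next = i;
            }
        }
        if (c)      /* c > 0: task found; c == -1: nothing runnable, run slot 0 */
            break;
        /* c == 0: all runnable tasks are out of time, so refill every task. */
        for (i = NR_SLOTS - 1; i > 0; i--)
            if (task[i])
                task[i]->counter = (task[i]->counter >> 1) + task[i]->priority;
    }
    return next;
}
```

With two runnable tasks at counter 0 (priorities 15 and 20) and one sleeping task at counter 6 (priority 15), the first scan ends with c == 0, the refill gives counters 15, 20, and 18, and the second scan picks the task with counter 20.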

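Summary

This function does two main things:

It checks alarms and wakes any interruptible task that has received a signal.
It selects the next task to run by comparing counter values and switches to it with switch_to.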

`sys_pause()`

Purpose: suspend the current process until it receives a signal.
Process: set the current process's state to TASK_INTERRUPTIBLE and call schedule() to give up the CPU.

`sleep_on()`

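The main job of sleep_on() is to switch the current process (task) out for a while when a resource it has requested is busy or not in memory, and to place it on a wait queue. The process continues once it is switched back in. The wait queue is built implicitly: the tmp pointer inside each call links the tasks that are waiting on the same resource.

Purpose: put the current process into uninterruptible sleep.
Process:
- Save the previous head of the wait queue (*p) in tmp and make *p point to the current process.
- Set the current process's state to TASK_UNINTERRUPTIBLE.
- Call schedule() to give up the CPU.
- After being woken, wake the previous waiter saved in tmp.

This function puts the current process into uninterruptible sleep and then reschedules. Step by step:

Check the argument: if the pointer p is NULL, return immediately without doing anything.
Check the current process: if the current process is task 0 (init_task, the idle task), call panic(), because task 0 must never sleep.
Link into the wait queue: save *p in the local variable tmp, then set *p to current. The task that was previously at the head of the queue is now remembered in tmp, and *p points to the current process.
Set the state: set the current process's state to TASK_UNINTERRUPTIBLE. In this state the process is not woken by signals. It runs again only after wake_up(), or another waiter in the chain, sets its state back to TASK_RUNNING.
Call the scheduler: schedule() lets other ready processes run. The current process is suspended here and control passes to another process.
Wake the previous waiter: when the process resumes after schedule(), it checks tmp. If tmp is not NULL (an earlier task was waiting on the same queue), that task's state is set to 0, which is TASK_RUNNING. In this way a single wake_up() call propagates down the chain and eventually wakes every waiter.

In short, this function puts the current process into uninterruptible sleep on a wait queue and lets other processes run.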

`interruptible_sleep_on()`

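Purpose: put the current process into interruptible sleep.
Process:
- Works like sleep_on(), but the sleep can also be ended by a signal.
- After waking, the process checks whether it is still the head of the wait queue (*p). If a task that queued later is now the head, it wakes that task and goes back to sleep, repeating until it is the head itself. It then clears *p and wakes the previous waiter saved in tmp.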

`wake_up()`

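Purpose: wake a sleeping process.
Process:
- Set the state of the process that *p points to (the head of the wait queue) to 0, i.e. TASK_RUNNING, so that it is ready to run.
- Set *p to NULL. The remaining waiters are woken one after another through the tmp chain in sleep_on().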
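
The implicit wait chain is easier to follow with the two halves of sleep_on() pulled apart. The sketch below is a user-space model under stated assumptions, not kernel code: wait_task, saved_tmp, sleep_enter, sleep_resume, and wake_head are hypothetical names, and saved_tmp stands in for the local variable tmp that the kernel keeps on each sleeping task's kernel stack.

```c
#include <stddef.h>

enum { RUNNING = 0, UNINTERRUPTIBLE = 2 };

/* Hypothetical task model for the wait chain. */
struct wait_task {
    long state;
    struct wait_task *saved_tmp;  /* models sleep_on()'s local tmp */
};

/* First half of sleep_on(): link into the queue and go to sleep. */
static void sleep_enter(struct wait_task **queue, struct wait_task *self)
{
    self->saved_tmp = *queue;     /* remember the previous head */
    *queue = self;                /* become the new head */
    self->state = UNINTERRUPTIBLE;
}

/* Second half of sleep_on(): runs when the task is scheduled again. */
static void sleep_resume(struct wait_task *self)
{
    if (self->saved_tmp)
        self->saved_tmp->state = RUNNING;  /* pass the wake-up down the chain */
}

/* Same logic as wake_up(): wake the head and clear the queue pointer. */
static void wake_head(struct wait_task **queue)
{
    if (queue && *queue) {
        (*queue)->state = RUNNING;
        *queue = NULL;
    }
}
```

If tasks a, b, and c sleep on the same queue in that order, wake_head() marks only c as runnable. When c resumes it wakes b, and when b resumes it wakes a.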

`ticks_to_floppy_on()`

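Purpose: start the motor of the given floppy drive if necessary and return the number of ticks still to wait before the motor is up to speed.
Process:
- Build the new Digital Output Register (DOR) value from current_DOR and the requested drive number nr, and write it to the controller if it differs from the current value.
- If a motor bit changed, set mon_timer[nr] to HZ/2 (half a second of spin-up time). If only the drive selection changed, make sure mon_timer[nr] is at least 2 ticks.
- Return mon_timer[nr]. The function itself does not sleep; its caller does.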
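
The DOR value that ticks_to_floppy_on() computes can be checked in isolation. This is a small sketch of that bit arithmetic only, with no port I/O; dor_for_drive is a hypothetical helper, and the selected flag is passed in as a parameter rather than read from the floppy driver.

```c
/* Compute the DOR value for turning on drive nr (0..3).
 * Bits 4-7 are the motor-enable bits, bits 0-1 select the drive. */
static unsigned char dor_for_drive(unsigned char current_dor,
                                   unsigned int nr, int selected)
{
    unsigned char mask = (unsigned char)(0x10 << nr);  /* motor bit of drive nr */

    mask |= current_dor;            /* keep everything already switched on */
    if (!selected) {
        mask &= 0xFC;               /* clear the drive-select bits */
        mask |= (unsigned char)nr;  /* select drive nr */
    }
    return mask;
}
```

Starting from the initial current_DOR of 0x0C, turning on drive 1 gives 0x2D: the motor bit 0x20, the original 0x0C, and drive-select value 1.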

`floppy_on()`

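Purpose: turn on the motor of the given floppy drive and wait until it is up to speed.
Process: call ticks_to_floppy_on() in a loop. While it returns a non-zero tick count, sleep with sleep_on() on wait_motor[nr] until do_floppy_timer() wakes the process.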

`floppy_off()`

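Purpose: schedule the motor of the given floppy drive to be switched off.
Process: set moff_timer[nr] to 3*HZ, so that do_floppy_timer() switches the motor off about three seconds later.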

`do_floppy_timer()`

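Purpose: handle the floppy motor timers on every clock tick.
Process:
- Loop over the four floppy drives and skip those whose motor is off.
- If the spin-up timer mon_timer[i] is running, decrement it and wake the process waiting in wait_motor[i] when it reaches 0.
- Otherwise count down moff_timer[i]. Once it is 0, clear the drive's motor bit in the DOR to switch the motor off.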

`add_timer()`

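Purpose: add a timed event to the timer list.
Process:
- If the requested delay is not positive, call the callback immediately.
- Otherwise take a free slot from the timer_list array (at most TIME_REQUESTS = 64 entries), link it at the head of the list, and move it toward its sorted position. Each entry's jiffies field is meant to hold the delay relative to the entry before it, so do_timer() only has to decrement the first entry.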
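
The idea of storing each delay relative to the previous entry (a delta list) can be sketched as follows. This is an illustrative user-space version that uses pointer-based insertion rather than the kernel's swap loop; delta_timer, delta_add, delta_tick, MAX_TIMERS, and the fire_* callbacks are all hypothetical names.

```c
#include <stddef.h>

#define MAX_TIMERS 8  /* hypothetical; the kernel uses TIME_REQUESTS = 64 */

struct delta_timer {
    long delta;                 /* ticks after the previous entry expires */
    void (*fn)(void);           /* NULL marks a free slot */
    struct delta_timer *next;
};

static struct delta_timer slots[MAX_TIMERS];
static struct delta_timer *head = NULL;

/* Insert a timer that fires `ticks` ticks from now. Returns 0, or -1 on error. */
static int delta_add(long ticks, void (*fn)(void))
{
    struct delta_timer *t = NULL, **link;
    int i;

    if (!fn || ticks <= 0)
        return -1;
    for (i = 0; i < MAX_TIMERS; i++)
        if (!slots[i].fn) { t = &slots[i]; break; }
    if (!t)
        return -1;

    /* Skip entries that expire no later than the new one, consuming their deltas. */
    for (link = &head; *link && (*link)->delta <= ticks; link = &(*link)->next)
        ticks -= (*link)->delta;

    t->delta = ticks;
    t->fn = fn;
    t->next = *link;
    if (t->next)
        t->next->delta -= ticks;  /* keep the follower's absolute expiry unchanged */
    *link = t;
    return 0;
}

/* Advance time by one tick and run every timer that has expired. */
static void delta_tick(void)
{
    if (!head)
        return;
    head->delta--;
    while (head && head->delta <= 0) {
        void (*fn)(void) = head->fn;
        head->fn = NULL;          /* free the slot before calling back */
        head = head->next;
        fn();
    }
}

/* Demo callbacks that record the order in which timers fire. */
static char fired[8];
static int nfired = 0;
static void fire_a(void) { fired[nfired++] = 'a'; }
static void fire_b(void) { fired[nfired++] = 'b'; }
static void fire_c(void) { fired[nfired++] = 'c'; }
```

Adding fire_a at 5 ticks, fire_b at 2 ticks, and fire_c at 5 ticks leaves the list with deltas 2, 3, 0. Only the head is decremented on each tick, and the callbacks run in the order b, a, c.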

`do_timer()`

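Purpose: handle the system timer interrupt. It is called on every clock tick with cpl, the privilege level of the interrupted code.
Process:
- Count down beepcount and stop the speaker beep when it reaches 0.
- Add one tick to the current process's user time (cpl != 0) or system time (cpl == 0).
- Decrement the first entry of the timer list and run every timer that has expired.
- If any floppy motor is on, call do_floppy_timer().
- Decrement the current process's counter. When it reaches zero and the interrupt came from user mode, call schedule(). Code running in kernel mode is not preempted.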

`sys_alarm()`

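Purpose: set a new alarm for the current process and return the time left on the previous one.
Process:
- Convert the old alarm, if any, from an absolute jiffies value to the number of seconds remaining.
- If seconds is greater than 0, set alarm to jiffies + HZ * seconds; otherwise clear the alarm.
- Return the seconds that were left on the previous alarm (0 if there was none).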
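
The conversion between seconds and ticks is plain integer arithmetic. A minimal sketch, assuming HZ is 100 (one tick every 10 ms); set_alarm and TICKS_PER_SEC are hypothetical names, and the current time is passed in as a parameter instead of being read from jiffies.

```c
#define TICKS_PER_SEC 100  /* assumed value of HZ */

/* Stores the new absolute expiry in *alarm and returns the
 * whole seconds that were left on the previous alarm. */
static long set_alarm(long *alarm, long now, long seconds)
{
    long old = *alarm;

    if (old)
        old = (old - now) / TICKS_PER_SEC;  /* remaining ticks -> seconds */
    *alarm = (seconds > 0) ? now + TICKS_PER_SEC * seconds : 0;
    return old;
}
```

Setting a 5-second alarm at tick 1000 stores 1500. Cancelling it at tick 1250 returns 2, because the remaining 250 ticks are truncated to whole seconds.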

`sys_getpid()`, `sys_getppid()`, `sys_getuid()`, `sys_geteuid()`, `sys_getgid()`, `sys_getegid()`

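Purpose: return attributes of the current process.
Process:
- Each function returns one field of the current task structure: pid, father (the parent's PID), uid, euid, gid, or egid.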

`sys_nice()`

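Purpose: lower the priority of the current process.
Process:
- Subtract increment from the current process's priority, but only if the result stays greater than 0. A larger priority value means a longer time slice, so a positive increment makes the process "nicer" to others. The function always returns 0.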

`sched_init()`

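sched_init is called during kernel startup to initialize the scheduler's data structures and the timer hardware. The function is explained in detail below.

Initializing the descriptor tables

Checking the size of struct sigaction:
- Make sure struct sigaction is exactly 16 bytes. If it is not, the kernel panics.
Setting the TSS and LDT descriptors:
- set_tss_desc fills in the first TSS (Task State Segment) descriptor in the Global Descriptor Table (GDT) so that it points to the TSS of the initial task (init_task).
- set_ldt_desc fills in the first LDT (Local Descriptor Table) descriptor in the GDT so that it points to the initial task's LDT.
Clearing the remaining task slots:
- Starting just after task 0's TSS and LDT descriptors (gdt + 2 + FIRST_TSS_ENTRY), the loop handles tasks 1 to NR_TASKS - 1.
- For each task it sets task[i] to NULL and zeroes the fields a and b of both its TSS descriptor and its LDT descriptor.

Setting up the timer, the interrupt gate, and the system call gate

Clearing the NT flag:
- An inline assembly sequence clears the NT (Nested Task) bit in the flags register, which avoids trouble later on.
Loading the TSS and LDT:
- ltr(0) loads the TSS selector of task 0 into the task register.
- lldt(0) loads the LDT selector of task 0 into the LDT register.
Programming the 8253/8254 programmable interval timer (PIT):
- Writing 0x36 to port 0x43 selects channel 0, LSB/MSB access, mode 3, binary counting.
- The divisor LATCH (1193180 / HZ) is then written to port 0x40, low byte first, so that channel 0 raises IRQ0 HZ times per second.
Setting the interrupt gate and the system call gate:
- set_intr_gate installs timer_interrupt as the handler for interrupt 0x20 (IRQ0).
- The IRQ0 bit in the interrupt mask register of the 8259A PIC (port 0x21) is cleared to enable the timer interrupt.
- set_system_gate installs system_call as the handler for interrupt 0x80, the system call entry.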
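
The timer divisor written to port 0x40 follows directly from the LATCH macro in the source. A small sketch of that arithmetic, with no port I/O; pit_divisor, low_byte, and high_byte are hypothetical helpers.

```c
#define PIT_INPUT_HZ 1193180L  /* timer input clock, as used by the LATCH macro */

/* Divisor that makes channel 0 fire `hz` times per second. */
static unsigned int pit_divisor(unsigned int hz)
{
    return (unsigned int)(PIT_INPUT_HZ / hz);
}

/* The divisor is sent as two bytes, low byte first. */
static unsigned char low_byte(unsigned int d)  { return (unsigned char)(d & 0xff); }
static unsigned char high_byte(unsigned int d) { return (unsigned char)((d >> 8) & 0xff); }
```

With HZ = 100 the divisor is 11931 (0x2E9B), so 0x9B is written first and 0x2E second.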

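The Global Descriptor Table (GDT)

The Global Descriptor Table (GDT) is an important data structure in x86 protected mode. It holds the segment descriptors that the processor looks up through segment selectors.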

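GDT data structure

GDT descriptor format
Each descriptor in the GDT is 8 bytes (64 bits) and contains the following fields:

Limit:
- 20 bits, stored in two pieces, giving the size of the segment.
- The G (granularity) flag selects the unit: with G=0 the limit counts bytes (up to 1 MB), with G=1 it counts 4 KB pages (up to 4 GB).
Base:
- 32 bits, stored in three pieces, giving the linear address at which the segment starts.
- In 64-bit mode, system descriptors such as TSS and LDT descriptors grow to 16 bytes to hold a 64-bit base.
Access byte:
- 8 bits: P (Present), DPL (Descriptor Privilege Level, 2 bits), S (code/data or system descriptor), and a 4-bit Type field that encodes properties such as readable, writable, and executable.
Flags:
- 4 bits: G (Granularity), D/B (default operand size), L (64-bit code segment), and AVL (available for system software).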
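
The way base and limit are scattered across the 8 bytes is easiest to see in code. The following sketch packs the four fields into a 64-bit value using the standard x86 layout; make_descriptor, descriptor_base, and descriptor_limit are hypothetical helpers and are not taken from the kernel, which uses its own macros for this.

```c
#include <stdint.h>

/* Pack base, 20-bit limit, access byte, and 4-bit flags into a descriptor. */
static uint64_t make_descriptor(uint32_t base, uint32_t limit,
                                uint8_t access, uint8_t flags)
{
    uint64_t d = 0;

    d |= (uint64_t)(limit & 0xFFFF);             /* limit 15:0  -> bits 0-15  */
    d |= (uint64_t)(base & 0xFFFFFF) << 16;      /* base 23:0   -> bits 16-39 */
    d |= (uint64_t)access << 40;                 /* access byte -> bits 40-47 */
    d |= (uint64_t)((limit >> 16) & 0xF) << 48;  /* limit 19:16 -> bits 48-51 */
    d |= (uint64_t)(flags & 0xF) << 52;          /* flags       -> bits 52-55 */
    d |= (uint64_t)(base >> 24) << 56;           /* base 31:24  -> bits 56-63 */
    return d;
}

/* Reassemble the 32-bit base from its three pieces. */
static uint32_t descriptor_base(uint64_t d)
{
    return (uint32_t)((d >> 16) & 0xFFFFFF) | ((uint32_t)(d >> 56) << 24);
}

/* Reassemble the 20-bit limit from its two pieces. */
static uint32_t descriptor_limit(uint64_t d)
{
    return (uint32_t)(d & 0xFFFF) | ((uint32_t)((d >> 48) & 0xF) << 16);
}
```

A flat 4 GB code segment (base 0, limit 0xFFFFF, access byte 0x9A, flags 0xC) packs to 0x00CF9A000000FFFF.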

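What the GDT is for

Managing segment descriptors:
- The GDT stores a series of segment descriptors. Each one describes the attributes of a memory segment (for example a code segment or a data segment): base address, limit, access rights, and so on.
Binding segment selectors:
- The segment registers of the CPU (CS, DS, ES, SS, and so on) hold selectors that refer to descriptors in the GDT (or in the current LDT), which determines how the program accesses each segment.
Access control:
- The descriptors carry access rights such as readable, writable, and executable, together with a privilege level (ring), and these are used to control access to memory.
Address translation:
- The base address in the descriptor is added to the offset to form a linear address, which paging then translates to a physical address through the page tables.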
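
A selector is a 16-bit value with three fields: the descriptor index, a table indicator (0 for the GDT, 1 for the LDT), and the requested privilege level. The sketch below decodes them; selector_fields and decode_selector are hypothetical names used only for illustration.

```c
#include <stdint.h>

struct selector_fields {
    unsigned index;  /* descriptor index within the table */
    unsigned ti;     /* table indicator: 0 = GDT, 1 = LDT */
    unsigned rpl;    /* requested privilege level, 0..3 */
};

static struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;

    f.index = sel >> 3;       /* bits 3-15 */
    f.ti = (sel >> 2) & 1;    /* bit 2 */
    f.rpl = sel & 3;          /* bits 0-1 */
    return f;
}
```

For example, the value 0x10 used for stack_start in the source decodes to index 2 in the GDT with privilege level 0, while 0x17 decodes to index 2 in the LDT with privilege level 3.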

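Summary

Together, these functions schedule and run processes, handle signals and alarms, report process attributes, and drive the floppy motor timers. schedule() is the heart of the scheduler: it decides which process runs next based on each process's state and remaining time slice. The other functions support different aspects of process management and hardware control.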

Source code

#include <linux/sched.h>
#include <linux/kernel.h>
#include <linux/sys.h>
#include <linux/fdreg.h>
#include <asm/system.h>
#include <asm/io.h>
#include <asm/segment.h>

#include <signal.h>

#define _S(nr) (1<<((nr)-1))
#define _BLOCKABLE (~(_S(SIGKILL) | _S(SIGSTOP)))

void show_task(int nr,struct task_struct * p)
{
	int i,j = 4096-sizeof(struct task_struct);

	printk("%d: pid=%d, state=%d, ",nr,p->pid,p->state);
	i=0;
	while (i<j && !((char *)(p+1))[i])
		i++;
	printk("%d (of %d) chars free in kernel stack\n\r",i,j);
}

void show_stat(void)
{
	int i;

	for (i=0;i<NR_TASKS;i++)
		if (task[i])
			show_task(i,task[i]);
}

#define LATCH (1193180/HZ)

extern void mem_use(void);

extern int timer_interrupt(void);
extern int system_call(void);

union task_union {
	struct task_struct task;
	char stack[PAGE_SIZE];
};

static union task_union init_task = {INIT_TASK,};

long volatile jiffies=0;
long startup_time=0;
struct task_struct *current = &(init_task.task);
struct task_struct *last_task_used_math = NULL;

struct task_struct * task[NR_TASKS] = {&(init_task.task), };

long user_stack [ PAGE_SIZE>>2 ] ;

struct {
	long * a;
	short b;
	} stack_start = { & user_stack [PAGE_SIZE>>2] , 0x10 };
/*
 *  'math_state_restore()' saves the current math information in the
 * old math state array, and gets the new ones from the current task
 */
void math_state_restore()
{
	if (last_task_used_math == current)
		return;
	__asm__("fwait");
	if (last_task_used_math) {
		__asm__("fnsave %0"::"m" (last_task_used_math->tss.i387));
	}
	last_task_used_math=current;
	if (current->used_math) {
		__asm__("frstor %0"::"m" (current->tss.i387));
	} else {
		__asm__("fninit"::);
		current->used_math=1;
	}
}

/*
 *  'schedule()' is the scheduler function. This is GOOD CODE! There
 * probably won't be any reason to change this, as it should work well
 * in all circumstances (ie gives IO-bound processes good response etc).
 * The one thing you might take a look at is the signal-handler code here.
 *
 *   NOTE!!  Task 0 is the 'idle' task, which gets called when no other
 * tasks can run. It can not be killed, and it cannot sleep. The 'state'
 * information in task[0] is never used.
 */
void schedule(void)
{
	int i,next,c;
	struct task_struct ** p;

/* check alarm, wake up any interruptible tasks that have got a signal */

	for(p = &LAST_TASK ; p > &FIRST_TASK ; --p)
		if (*p) {
			if ((*p)->alarm && (*p)->alarm < jiffies) {
					(*p)->signal |= (1<<(SIGALRM-1));
					(*p)->alarm = 0;
				}
			if (((*p)->signal & ~(_BLOCKABLE & (*p)->blocked)) &&
			(*p)->state==TASK_INTERRUPTIBLE)
				(*p)->state=TASK_RUNNING;
		}

/* this is the scheduler proper: */

	while (1) {
		c = -1;
		next = 0;
		i = NR_TASKS;
		p = &task[NR_TASKS];
		while (--i) {
			if (!*--p)
				continue;
			if ((*p)->state == TASK_RUNNING && (*p)->counter > c)
				c = (*p)->counter, next = i;
		}
		if (c) break;
		for(p = &LAST_TASK ; p > &FIRST_TASK ; --p)
			if (*p)
				(*p)->counter = ((*p)->counter >> 1) +
						(*p)->priority;
	}
	switch_to(next);
}

int sys_pause(void)
{
	current->state = TASK_INTERRUPTIBLE;
	schedule();
	return 0;
}

void sleep_on(struct task_struct **p)
{
	struct task_struct *tmp;

	if (!p)
		return;
	if (current == &(init_task.task))
		panic("task[0] trying to sleep");
	tmp = *p;
	*p = current;
	current->state = TASK_UNINTERRUPTIBLE;
	schedule();
	if (tmp)
		tmp->state=0;
}

void interruptible_sleep_on(struct task_struct **p)
{
	struct task_struct *tmp;

	if (!p)
		return;
	if (current == &(init_task.task))
		panic("task[0] trying to sleep");
	tmp=*p;
	*p=current;
repeat:	current->state = TASK_INTERRUPTIBLE;
	schedule();
	if (*p && *p != current) {
		(**p).state=0;
		goto repeat;
	}
	*p=NULL;
	if (tmp)
		tmp->state=0;
}

void wake_up(struct task_struct **p)
{
	if (p && *p) {
		(**p).state=0;
		*p=NULL;
	}
}

/*
 * OK, here are some floppy things that shouldn't be in the kernel
 * proper. They are here because the floppy needs a timer, and this
 * was the easiest way of doing it.
 */
static struct task_struct * wait_motor[4] = {NULL,NULL,NULL,NULL};
static int  mon_timer[4]={0,0,0,0};
static int moff_timer[4]={0,0,0,0};
unsigned char current_DOR = 0x0C;

int ticks_to_floppy_on(unsigned int nr)
{
	extern unsigned char selected;
	unsigned char mask = 0x10 << nr;

	if (nr>3)
		panic("floppy_on: nr>3");
	moff_timer[nr]=10000;		/* 100 s = very big :-) */
	cli();				/* use floppy_off to turn it off */
	mask |= current_DOR;
	if (!selected) {
		mask &= 0xFC;
		mask |= nr;
	}
	if (mask != current_DOR) {
		outb(mask,FD_DOR);
		if ((mask ^ current_DOR) & 0xf0)
			mon_timer[nr] = HZ/2;
		else if (mon_timer[nr] < 2)
			mon_timer[nr] = 2;
		current_DOR = mask;
	}
	sti();
	return mon_timer[nr];
}

void floppy_on(unsigned int nr)
{
	cli();
	while (ticks_to_floppy_on(nr))
		sleep_on(nr+wait_motor);
	sti();
}

void floppy_off(unsigned int nr)
{
	moff_timer[nr]=3*HZ;
}

void do_floppy_timer(void)
{
	int i;
	unsigned char mask = 0x10;

	for (i=0 ; i<4 ; i++,mask <<= 1) {
		if (!(mask & current_DOR))
			continue;
		if (mon_timer[i]) {
			if (!--mon_timer[i])
				wake_up(i+wait_motor);
		} else if (!moff_timer[i]) {
			current_DOR &= ~mask;
			outb(current_DOR,FD_DOR);
		} else
			moff_timer[i]--;
	}
}

#define TIME_REQUESTS 64

static struct timer_list {
	long jiffies;
	void (*fn)();
	struct timer_list * next;
} timer_list[TIME_REQUESTS], * next_timer = NULL;

void add_timer(long jiffies, void (*fn)(void))
{
	struct timer_list * p;

	if (!fn)
		return;
	cli();
	if (jiffies <= 0)
		(fn)();
	else {
		for (p = timer_list ; p < timer_list + TIME_REQUESTS ; p++)
			if (!p->fn)
				break;
		if (p >= timer_list + TIME_REQUESTS)
			panic("No more time requests free");
		p->fn = fn;
		p->jiffies = jiffies;
		p->next = next_timer;
		next_timer = p;
		while (p->next && p->next->jiffies < p->jiffies) {
			p->jiffies -= p->next->jiffies;
			fn = p->fn;
			p->fn = p->next->fn;
			p->next->fn = fn;
			jiffies = p->jiffies;
			p->jiffies = p->next->jiffies;
			p->next->jiffies = jiffies;
			p = p->next;
		}
	}
	sti();
}

void do_timer(long cpl)
{
	extern int beepcount;
	extern void sysbeepstop(void);

	if (beepcount)
		if (!--beepcount)
			sysbeepstop();

	if (cpl)
		current->utime++;
	else
		current->stime++;

	if (next_timer) {
		next_timer->jiffies--;
		while (next_timer && next_timer->jiffies <= 0) {
			void (*fn)(void);
			
			fn = next_timer->fn;
			next_timer->fn = NULL;
			next_timer = next_timer->next;
			(fn)();
		}
	}
	if (current_DOR & 0xf0)
		do_floppy_timer();
	if ((--current->counter)>0) return;
	current->counter=0;
	if (!cpl) return;
	schedule();
}

int sys_alarm(long seconds)
{
	int old = current->alarm;

	if (old)
		old = (old - jiffies) / HZ;
	current->alarm = (seconds>0)?(jiffies+HZ*seconds):0;
	return (old);
}

int sys_getpid(void)
{
	return current->pid;
}

int sys_getppid(void)
{
	return current->father;
}

int sys_getuid(void)
{
	return current->uid;
}

int sys_geteuid(void)
{
	return current->euid;
}

int sys_getgid(void)
{
	return current->gid;
}

int sys_getegid(void)
{
	return current->egid;
}

int sys_nice(long increment)
{
	if (current->priority-increment>0)
		current->priority -= increment;
	return 0;
}

void sched_init(void)
{
	int i;
	struct desc_struct * p;

	if (sizeof(struct sigaction) != 16)
		panic("Struct sigaction MUST be 16 bytes");
	set_tss_desc(gdt+FIRST_TSS_ENTRY,&(init_task.task.tss));
	set_ldt_desc(gdt+FIRST_LDT_ENTRY,&(init_task.task.ldt));
	p = gdt+2+FIRST_TSS_ENTRY;
	for(i=1;i<NR_TASKS;i++) {
		task[i] = NULL;
		p->a=p->b=0;
		p++;
		p->a=p->b=0;
		p++;
	}
/* Clear NT, so that we won't have troubles with that later on */
	__asm__("pushfl ; andl $0xffffbfff,(%esp) ; popfl");
	ltr(0);
	lldt(0);
	outb_p(0x36,0x43);		/* binary, mode 3, LSB/MSB, ch 0 */
	outb_p(LATCH & 0xff , 0x40);	/* LSB */
	outb(LATCH >> 8 , 0x40);	/* MSB */
	set_intr_gate(0x20,&timer_interrupt);
	outb(inb_p(0x21)&~0x01,0x21);
	set_system_gate(0x80,&system_call);
}