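Process Program Replacement [1~5]
- 1. Program replacement interfaces (the loader)
- 2. What is program replacement?
- 3. How process replacement works
- 4. Bringing in multiple processes
- 5. The exec family of interfaces in detail (important!)
Writing a Minimal Custom Shell [6~8]
- 6. Printing the command-line prompt
- 7. Reading the command-line string
- 8. Complete code and test results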

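Process Program Replacement [1~5]

We create a child process for one of two purposes:
1. to have the child run part of the parent's code;
2. to have the child run a completely new program, which is process program replacement.

1. Program replacement interfaces (the loader)

The program replacement interfaces are: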

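(environ is a pointer to the environment variable table; it was covered in the earlier discussion of environment variables.)

Header file:
#include <unistd.h>
Functions:
int execl(const char *path, const char *arg, ...);
int execlp(const char *file, const char *arg, ...);
int execle(const char *path, const char *arg, ..., char *const envp[]);
int execv(const char *path, char *const argv[]);
int execvp(const char *file, char *const argv[]);

int execve(const char *path, char *const argv[], char *const envp[]);

Return value:
On success, these functions do not return;
On failure, they return -1 and set errno (for example, replacing with a program that does not exist fails).
So the exec functions have a return value only on failure, never on success.
In other words: if the call returns at all, it failed.

Using the exec functions:
If the replacement succeeds, the rest of our own program is replaced and never runs; the new program runs instead.
So we usually do not need to test the return value. Just call exit(1) right after the exec call: on success the exit is never reached, and on failure exit(1) ends the process with an abnormal exit code.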

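Supplement:
The "..." in the parameter list is a variable argument list, which lets a function accept any number of arguments.
For these functions, the two named parameters in front are required, while the ones after them are optional and can be as many as needed. (It works just like calling printf: pass as many arguments after the comma as you need.) Because the function cannot see how many arguments were passed, the list has to end with NULL.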

2. What is program replacement?

Let's explain it with a short piece of code:

//Test code
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main()
{
    printf("begin...\n");
    printf("I am a process now, my PID:%d\n", getpid());

    execl("/bin/ls", "ls", "-a", "-l", NULL);

    //only reached if execl failed
    printf("end...\n");
    return 0;
}

Result:
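As you can see, the first part of the output comes from our own program and the rest comes from the ls command.
ls is an executable program on disk, that is, a file. When our process runs, it has its own PCB plus code and data in memory, and at first it executes our own code and data. When execl is called, the ls executable on disk is loaded over the current process's code and data (all of the old code and data, including main, is replaced). From then on the process runs ls's code and data rather than ours, which is why the "end..." in our program is never printed.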

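3. How process replacement works

During program replacement, the process's code and data are replaced outright by those of the new program: the contents in physical memory are swapped, while the process-side structures that map to them stay in place.
No new child process is created during the replacement. Only the current process is changed, and the CPU simply keeps scheduling it. The process keeps its kernel PCB and its PID; what changes is the program image that its address space maps to.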

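4. Bringing in multiple processes

1. Program replacement replaces the whole program; it cannot replace just a part.
That is, once the current process calls an exec interface, all of its code and data are replaced with the new program's.

2. Program replacement affects only the calling process, because processes are independent.
Parent and child initially point to the same code and data through their page tables, but when the child performs the replacement it stops sharing them and gets its own copy (conceptually, copy-on-write for both the code region and the data region), so the parent is not affected.

For example, the code below replaces only the child's program:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main()
{
    pid_t id = fork();
    if(id == 0)
    {
        printf("I am the child process:%d\n", getpid());
        execl("/bin/ls", "ls", "-a", "-l", NULL);
        //1. Commented out: a replacement that fails; the parent then gets exit code 1
        //execl("/bin/lssssss", "ls", "-a", "-l", NULL);
        //2. Commented out: a replacement that succeeds but the new program itself fails; the parent then gets exit code 2
        //execl("/bin/ls", "ls", "-abcdefg", "-l", NULL);
        exit(1);
    }
    sleep(5);
    int status = 0;
    printf("I am the parent process:%d\n", getpid());
    waitpid(id, &status, 0);
    printf("child exit code:%d\n", WEXITSTATUS(status));
    return 0;
}

Results:
On success, parent and child both run normally and the child executes the replacement program;
On failure, parent and child both run normally and the child continues with its original code;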

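Analysis:
If the replacement succeeds, the parent's waitpid obtains the exit code of the new program, "0";
If the replacement fails, the parent's waitpid obtains the exit code of the original code, "1" (from exit(1));
If the replacement succeeds but the new program is given an invalid option, the error exit code is also returned by the new program, as in the commented-out case above where the replacement succeeds but the exit code is 2.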

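5. The exec family of interfaces in detail (important!)

All of these interfaces share the properties described above; they differ only slightly in usage.

Header file:
#include <unistd.h>
Functions:
int execl(const char *path, const char *arg, ...);
int execlp(const char *file, const char *arg, ...);
int execle(const char *path, const char *arg, ..., char *const envp[]);
int execv(const char *path, char *const argv[]);
int execvp(const char *file, char *const argv[]);

int execve(const char *path, char *const argv[], char *const envp[]);

int execl(const char *path, const char *arg, ...): "l" stands for list; the arguments are passed one by one as a list.
path: the path of the program to execute (for ls, pass "/bin/ls");
arg: how to run the program (pass the arguments exactly as typed on the command line; for ls -a -l pass "ls", "-a", "-l"). The argument list must end with NULL;
Example: execl("/bin/ls", "ls", "-a", "-l", NULL);

int execv(const char *path, char *const argv[]): "v" stands for vector; the arguments are passed all at once in an array.
path: the path of the program to execute (for ls, pass "/bin/ls");
argv[]: the arguments as an array of pointers (write the array the way the command is typed; for example, define char *myargv[] = {"ls", "-a", "-l", NULL};). The array must end with NULL;
Example:

char *myargv[] = {"ls", "-l", "-a", NULL};
execv("/bin/ls", myargv);

int execlp(const char *file, const char *arg, ...): "p" stands for the PATH environment variable; the program is searched for automatically in the directories listed in PATH.
file: the program to execute, no path needed (for ls, pass "ls");
arg: passed exactly as for execl (for ls -a -l pass "ls", "-a", "-l");
Example: execlp("ls", "ls", "-a", "-l", NULL);

int execvp(const char *file, char *const argv[]): "v" and "p" mean the same as above.
file: the program to execute, no path needed (for ls, pass "ls");
argv[]: the arguments as an array of pointers (same as execv);
Example: execvp("ls", myargv);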

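int execle(const char *path, const char *arg, ..., char *const envp[]): "e" stands for environment; the parent passes environment variables to the child by hand.
path: the path of the program to execute.
arg: how to run the program, passed one argument at a time and ended with NULL.
envp[]: the custom environment, ended with NULL, for example: char *const myenv[] = {"MYENV=YouCanSeeMe", NULL};
Analysis: these interfaces can also run other executables that we wrote ourselves. (They do not have to be C programs; programs written in other languages can be invoked too.)

For example: otherproc lives in the Otherproc subdirectory next to myproc (as the path in the code shows). We want myproc to invoke otherproc, and otherproc to read the environment variable MYENV that myproc passes to it.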

//myproc.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main()
{
    pid_t id = fork();
    if(id == 0)
    {
        printf("I am the child process:%d\n", getpid());

        char *const myenv[] = {"MYENV=YouCanSeeMe", NULL};//custom environment variables
        execle("./Otherproc/otherproc", "otherproc", NULL, myenv);//pass the custom environment

        exit(1);
    }
    sleep(5);
    int status = 0;
    printf("I am the parent process:%d\n", getpid());
    waitpid(id, &status, 0);
    printf("child exit code:%d\n", WEXITSTATUS(status));
    return 0;
}

//otherproc.cc
#include <iostream>
#include <unistd.h>
#include <stdlib.h>
#include <sys/types.h>
using namespace std;

int main()
{
    for(int i = 0; i < 5; i++)
    {
        cout << "This is the child process, PID : " << getpid() << " MYENV : " << (getenv("MYENV")==NULL? "NULL" : getenv("MYENV")) << " PATH : " << (getenv("PATH")==NULL? "NULL" : getenv("PATH")) << endl;
        sleep(1);
    }
    return 0;
}

Result: the child process receives the custom environment variable passed in by the parent.

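Note: a custom envp replaces the environment wholesale; the old variables are overwritten and not passed on!

If we want to hand the parent's environment to the child unchanged, pass environ (one of the ways of accessing environment variables covered earlier).
Usage: declare extern char **environ; in the parent, then pass it directly.
For example: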

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main()
{
    extern char **environ;//the parent's environment table, passed through execle below
    pid_t id = fork();
    if(id == 0)
    {
        printf("I am the child process:%d\n", getpid());

        execle("./Otherproc/otherproc", "otherproc", NULL, environ);

        exit(1);
    }
    sleep(5);
    int status = 0;
    printf("I am the parent process:%d\n", getpid());
    waitpid(id, &status, 0);
    printf("child exit code:%d\n", WEXITSTATUS(status));
    return 0;
}

Result: the child process now sees the parent's default environment variable PATH.

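What if we want to keep the parent's environment and add a custom variable on top of it?
Use int putenv(char *string); whichever process calls this function gets a new environment variable added to its own environment.
Header: #include <stdlib.h>.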

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main()
{
    extern char **environ;
    pid_t id = fork();
    if(id == 0)
    {
        printf("I am the child process:%d\n", getpid());

        putenv("MYENV=YouCanSeeMe");
        execle("./Otherproc/otherproc", "otherproc", NULL, environ);

        exit(1);
    }
    sleep(5);
    int status = 0;
    printf("I am the parent process:%d\n", getpid());
    waitpid(id, &status, 0);
    printf("child exit code:%d\n", WEXITSTATUS(status));
    return 0;
}

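Our child process otherproc inherits the environment of its parent myproc by default, so where does the parent's environment come from?
From bash. So instead of appending a variable with putenv, we can also run export MYENV=YouCanSeeMe on the command line to add the variable to bash; then, when environ is passed as usual, the child receives MYENV as well.

int execve(const char *path, char *const argv[], char *const envp[]);
This interface is listed separately in the man pages. How does it differ from the ones above?
Its usage rules are exactly the same as those of the others, so it is not analyzed again here.
The difference: this one is the actual system call; all the others are wrappers around it!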

Writing a Minimal Custom Shell [6~8]

bash is just a process:

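6. Printing the command-line prompt

bash receives the command-line strings we type and does not exit; it keeps printing the prompt for us.

A command-line prompt consists of:
[user name + @ + host name + name of the current directory]$

Each field can be obtained through the corresponding system call, but that matters little for our purpose, so the code below simply prints a fixed string.

//Print the command-line prompt:
#include <stdio.h>
#include <unistd.h>

int main()
{
    while(1)
    {
        printf("[YGH@MyMachina CurrentPath]#");//the command is typed on the same line, so no \n here
        fflush(stdout);//without \n, line buffering does not flush, so flush stdout explicitly
        sleep(100);//for observation only; remove this later
    }
}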

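7. Reading the command-line string

Use the fgets interface: it reads a line of input from a given stream, here standard input.

Function: char *fgets(char *s, int size, FILE *stream);
Header: stdio.h
Manual: see man fgets.

1. First we define a command-line buffer commandstr ourselves, then use fgets to read the typed command line into it: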

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <assert.h>

#define MAX 1024

int main()
{
    char commandstr[MAX] = {0};
    while(1)
    {
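        printf("[YGH@MyMachina CurrentPath]# ");//the command is typed on the same line, so no \n here
        fflush(stdout);//without \n, line buffering does not flush, so flush stdout explicitly
        char *s = fgets(commandstr, sizeof(commandstr), stdin);//the input ends with the \n typed by the user

        assert(s);//assert exists in debug builds and is compiled out in release builds
        (void)s;//"uses" s so that release builds, where assert is gone, do not warn about an unused variable

        commandstr[strlen(commandstr) - 1] = '\0';//drop the trailing \n from the buffer
        printf("%s\n", commandstr);//for observation only; remove this later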
    }
}

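2. Next, our shell must not execute the requested program itself: if it called exec, the shell would be replaced and disappear, which makes no sense, so a child process has to do the work.
That is, the parent hands the command to the child and then just waits for the result.
Also, our buffer holds the input as one continuous string, which has to be split into separate arguments; for example, "ls -a -l" must be split into "ls", "-a", "-l".
We could split it by hand, or use the string tokenizing function: char* strtok(char* str, const char* sep);
Splitting the string: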

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>

#define MAX 1024
#define ARGC 64
#define SEP " "//separator: a space

//String splitting function
int split(char* commandstr, char* argv[])
{
    assert(commandstr);
    assert(argv);

    argv[0] = strtok(commandstr, SEP);//strtok returns the address of the first token
    if(argv[0] == NULL)
    {
        return 1;
    }

    int i = 1;
    while((argv[i++] = strtok(NULL, SEP)));//this line is equivalent to the commented-out code below
    //while(1)
    //{
    //        argv[i] = strtok(NULL, SEP);//NULL means: keep splitting commandstr
    //        if(argv[i] == NULL)
    //        {
    //            break;
    //        }
    //        i++;
    //}
    return 0;
}

int main()
{
    while(1)
    {
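        char commandstr[MAX] = {0};//declared inside the loop so every command starts from a fresh, zeroed buffer
        char* argv[ARGC] = {NULL};//holds the tokens after splitting

        printf("[YGH@MyMachina CurrentPath]# ");//the command is typed on the same line, so no \n here
        fflush(stdout);//without \n, line buffering does not flush, so flush stdout explicitly
        char *s = fgets(commandstr, sizeof(commandstr), stdin);//the input ends with the \n typed by the user

        assert(s);//assert exists in debug builds and is compiled out in release builds
        (void)s;//"uses" s so that release builds, where assert is gone, do not warn about an unused variable

        commandstr[strlen(commandstr) - 1] = '\0';//drop the trailing \n from the buffer

        int n = split(commandstr, argv);//split the string
        if(n != 0)//nothing to run (empty input), so prompt again
        {
            continue;
        }

        pid_t id = fork();
        assert(id >= 0);
        (void)id;

        if(id == 0)
        {
            //child process
            exit(0);
        }
        //parent process
        int status = 0;//the child's exit status
        waitpid(id, &status, 0);//block until the child exits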
    }
}

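Once the string is split, perform the replacement with execvp: the typed command carries no path and has to be looked up through the PATH environment variable, hence the "p"; and the command line has been split into an array, hence the "v".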

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>

#define MAX 1024
#define ARGC 64
#define SEP " "//separator: a space

//String splitting function
int split(char* commandstr, char* argv[])
{
    assert(commandstr);
    assert(argv);

    argv[0] = strtok(commandstr, SEP);//strtok returns the address of the first token
    if(argv[0] == NULL)
    {
        return 1;
    }

    int i = 1;
    while((argv[i++] = strtok(NULL, SEP)));//this line is equivalent to the commented-out code below
    //while(1)
    //{
    //        argv[i] = strtok(NULL, SEP);//NULL means: keep splitting commandstr
    //        if(argv[i] == NULL)
    //        {
    //            break;
    //        }
    //        i++;
    //}
    return 0;
}

//Debug helper: prints the tokens after splitting
void debugPrintf(char* argv[])
{
    for(int i = 0; argv[i]; i++)
    {
        printf("%d: %s\n", i, argv[i]);
    }
}

int main()
{
    while(1)
    {
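        char commandstr[MAX] = {0};//declared inside the loop so every command starts from a fresh, zeroed buffer
        char* argv[ARGC] = {NULL};//holds the tokens after splitting

        printf("[YGH@MyMachina CurrentPath]# ");//the command is typed on the same line, so no \n here
        fflush(stdout);//without \n, line buffering does not flush, so flush stdout explicitly
        char *s = fgets(commandstr, sizeof(commandstr), stdin);//the input ends with the \n typed by the user

        assert(s);//assert exists in debug builds and is compiled out in release builds
        (void)s;//"uses" s so that release builds, where assert is gone, do not warn about an unused variable

        commandstr[strlen(commandstr) - 1] = '\0';//drop the trailing \n from the buffer

        int n = split(commandstr, argv);//split the string
        if(n != 0)//nothing to run (empty input), so prompt again
        {
            continue;
        }

        //debugPrintf(argv);//test output of the tokens after splitting
        pid_t id = fork();
        assert(id >= 0);
        (void)id;

        if(id == 0)
        {
            //child process
            execvp(argv[0], argv);
            exit(1);//replacement failed, exit right away
        }
        //parent process
        int status = 0;//the child's exit status
        waitpid(id, &status, 0);//block until the child exits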
    }
}

8. Complete code and test results

The above is a bare-bones, minimal shell command line. Below is the complete code with the unnecessary parts removed:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>

#define MAX 1024
#define ARGC 64
#define SEP " "//separator: a space

//String splitting function
int split(char* commandstr, char* argv[])
{
    assert(commandstr);
    assert(argv);

    argv[0] = strtok(commandstr, SEP);//strtok returns the address of the first token
    if(argv[0] == NULL)
    {
        return 1;
    }

    int i = 1;
    while((argv[i++] = strtok(NULL, SEP)));//keep splitting commandstr until strtok returns NULL
    return 0;
}

int main()
{
    while(1)
    {
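        char commandstr[MAX] = {0};//declared inside the loop so every command starts from a fresh, zeroed buffer
        char* argv[ARGC] = {NULL};//holds the tokens after splitting

        printf("[YGH@MyMachina CurrentPath]# ");//the command is typed on the same line, so no \n here
        fflush(stdout);//without \n, line buffering does not flush, so flush stdout explicitly
        char *s = fgets(commandstr, sizeof(commandstr), stdin);//the input ends with the \n typed by the user

        assert(s);//assert exists in debug builds and is compiled out in release builds
        (void)s;//"uses" s so that release builds, where assert is gone, do not warn about an unused variable

        commandstr[strlen(commandstr) - 1] = '\0';//drop the trailing \n from the buffer

        int n = split(commandstr, argv);//split the string
        if(n != 0)//nothing to run (empty input), so prompt again
        {
            continue;
        }

        pid_t id = fork();
        assert(id >= 0);
        (void)id;

        if(id == 0)
        {
            //child process
            execvp(argv[0], argv);
            exit(1);//replacement failed, exit right away
        }
        //parent process
        int status = 0;//the child's exit status
        waitpid(id, &status, 0);//block until the child exits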
    }
}