Linux 07: Basic I/O

stdin & stdout & stderr

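C opens three I/O streams by default: stdin, stdout and stderr.
On closer inspection, all three have type FILE*, the file pointer type that fopen returns.

File I/O functions (library functions):

fopen, fread, fwrite, fseek, ftell, rewind, fclose

File open modes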

r	Open text file for reading. The stream is positioned at the beginning of the file.
r+	Open for reading and writing. The stream is positioned at the beginning of the file.
w	Truncate file to zero length or create text file for writing. The stream is positioned at the beginning of the file.
w+	Open for reading and writing. The file is created if it does not exist, otherwise it is truncated. The stream is positioned at the beginning of the file.
a	Open for appending (writing at end of file). The file is created if it does not exist. The stream is positioned at the end of the file.
a+	Open for reading and appending (writing at end of file). The file is created if it does not exist. The initial file position for reading is at the beginning of the file, but output is always appended to the end of the file.

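System file I/O

Besides the C interfaces above (C++ and other languages have their own as well), files can also be accessed through the system interfaces.

open, read, write, lseek, close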

open

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode);
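pathname: the file to open or create
flags: options for opening the file; combine one or more of the constants below with bitwise OR to build flags.
Flags:
        O_RDONLY: open read-only
        O_WRONLY: open write-only
        O_RDWR  : open for reading and writing
                Exactly one of these three constants must be specified.

        O_CREAT : create the file if it does not exist; the mode argument is then required to give the new file's access permissions

        O_APPEND: append on every write

Return value:
        Success: the newly opened file descriptor
        Failure: -1

mode:

The permissions to give a newly created file.

Which form of open to use depends on the situation: if the target file may not exist and open has to create it, pass the third argument to set the new file's permissions (the kernel further masks it with the process umask); otherwise use the two-argument form.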

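Return value of open

Before looking at the return value, two concepts need introducing: system calls and library functions.

fopen, fclose, fread and fwrite above are all functions from the C standard library (libc); we call them library functions.
open, close, read, write and lseek are interfaces provided by the operating system; we call them system call interfaces.
Operating-system textbooks include a figure of this layering:

The relationship between system call interfaces and library functions is clear at a glance.
So the f* family of functions can be regarded as wrappers around system calls, provided to make application development easier.

File descriptor (fd)

From studying open, we know that a file descriptor is just a small integer.

0 & 1 & 2

By default a Linux process starts with three open file descriptors: standard input 0, standard output 1 and standard error 2.
The physical devices behind 0, 1 and 2 are usually the keyboard, the display and the display.
So input and output can also be done like this: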

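#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>   /* read, write */
int main()
{
    char buf[1024];
    /* Leave one byte free so buf[s] = 0 never writes past the array. */
    ssize_t s = read(0, buf, sizeof(buf) - 1);
    if(s > 0){
        buf[s] = 0;
        write(1, buf, strlen(buf));   /* echo to standard output */
        write(2, buf, strlen(buf));   /* echo to standard error */
    }
    return 0;
}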

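Recall that the operating system manages each process through its PCB (task_struct on Linux), and the PCB holds a pointer to the file descriptor table. See the figure below:

We now know that file descriptors are small integers starting from 0. When a file is opened, the operating system creates a data structure in memory to describe it: the file structure, which represents one open file object. Because it is the process that calls open, the process and the file have to be linked. Each process has a pointer, *files, to a files_struct table whose most important part is an array of pointers, each element pointing to an open file. In essence, a file descriptor is an index into that array, so holding the file descriptor is enough to find the corresponding file.

File descriptor allocation rule

First program: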

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>   /* close */
int main()
{
    int fd = open("myfile", O_RDONLY);
    if(fd < 0){
        perror("open");
        return 1;
    }
    printf("fd: %d\n", fd);
    close(fd);
    return 0;
}

Second program:

Close 0 or 2 first, then look again.

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>   /* close */
int main()
{
    close(0);
    //close(2);
    int fd = open("myfile", O_RDONLY);
    if(fd < 0){
        perror("open");
        return 1;
    }
    printf("fd: %d\n", fd);
    close(fd);
    return 0;
}

The result is fd: 0 or fd: 2. This shows the allocation rule: the new file descriptor is the smallest index in the files_struct array that is not currently in use.

Third program:

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>   /* close */
#include <stdlib.h>
int main()
{
    close(1);
    int fd = open("myfile", O_WRONLY|O_CREAT, 00644);
    if(fd < 0){
        perror("open");
        return 1;
    }
    printf("fd: %d\n", fd);
    fflush(stdout);
    close(fd);
    exit(0);
}

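Here the content that should have gone to the display ends up in the file myfile, and fd = 1. This is called output redirection. The common redirection operators are >, >> and <.

So what is the essence of redirection? See the figure below:

The dup2 system call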

#include <unistd.h>

int dup2(int oldfd, int newfd);

Example code:

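#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
int main() {
    /* O_CREAT needs a mode argument; 0644 is an assumed choice. */
    int fd = open("./log", O_CREAT | O_RDWR, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    /* dup2 closes fd 1 itself if it is open, so no separate close(1) is needed. */
    if (dup2(fd, 1) < 0) {
        perror("dup2");
        return 1;
    }
    for (;;) {
        char buf[1024] = {0};
        ssize_t read_size = read(0, buf, sizeof(buf) - 1);
        if (read_size < 0) {
            perror("read");
            break;
        }
        if (read_size == 0) {   /* end of input: stop instead of spinning */
            break;
        }
        printf("%s", buf);
        fflush(stdout);
    }
    return 0;
}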

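printf is an I/O function from the C library and normally writes to stdout, but underneath stdout still accesses the file through fd 1. At this point the entry at index 1 no longer holds the address of the display file; it holds the address of the file the program opened (./log in the example above). Everything that is printed therefore goes into that file, which is exactly output redirection. How are append and input redirection done? By following the same approach.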

FILE

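Because the I/O library functions correspond to system call interfaces, and library functions wrap system calls, every file access ultimately goes through an fd.
So the FILE structure in the C library must contain an fd.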

#include <stdio.h>
#include <string.h>
#include <unistd.h>   /* write, fork */
int main()
{
    const char *msg0="hello printf\n";
    const char *msg1="hello fwrite\n";
    const char *msg2="hello write\n";
    printf("%s", msg0);
    fwrite(msg1, strlen(msg1), 1, stdout);   /* length of msg1, not msg0 */
    write(1, msg2, strlen(msg2));
    fork();
    return 0;
}

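But what happens if we redirect the process's output with ./hello > file? Looking at the file, printf and fwrite (library functions) each appear twice, while write (a system call) appears only once. Why? It must have something to do with fork.

In general, C library functions are fully buffered when writing to a file and line buffered when writing to the display.
The printf and fwrite library functions have their own buffer, and once output is redirected to a regular file, the buffering mode changes from line buffered to fully buffered.
The data sitting in the buffer is therefore not flushed immediately, not even by the time fork is called.
It is flushed all at once when the process exits, and only then written to the file.
After fork, however, parent and child data is copy-on-write, so the child holds its own copy of the still-unflushed buffer. When parent and child each flush on exit, the same data is written twice.
The write output does not change, which shows that write has no such buffer.

So we can conclude:

The printf and fwrite library functions have their own buffer, while the write system call does not. The buffer discussed here is a user-level buffer. To improve overall system performance the OS also provides kernel-level buffers, but those are outside the scope of this discussion.
So who provides this buffer? printf and fwrite are library functions and write is a system call; library functions sit "above" system calls and "wrap" them. Since write has no buffer while printf and fwrite do, the buffer must have been added by the wrapping layer, and because this is C, it is provided by the C standard library.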