Linux System Programming, Day 04: File and Directory Operations

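1. File I/O
- 1.1 The open function
- 1.2 The close function
- 1.3 The read function
- 1.4 The write function
- 1.5 The lseek function
- 1.6 The errno variable
- 1.7 File example 1: reading and writing a file
- 1.8 File example 2: computing a file's size
- 1.9 File example 3: extending a file
- 1.10 File example 4: using perror
- 1.11 Testing blocking and non-blocking reads
2. Files and directories
- 2.1 File operation functions
- 2.2 Directory operation functions
- 2.3 The dup/dup2/fcntl functions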

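1. File I/O

In the C language course we studied a family of standard C library functions for file operations, such as fopen, fclose, fread, fwrite, fscanf, and fprintf, all of which begin with f. The file I/O functions in this section, by contrast, are Linux system calls. On Linux, fopen calls the open system call internally, and fclose calls the close system call.

Standard C functions and system calls are different things. A system call is a programming interface implemented by the operating system and exposed to applications, so a program that uses Linux system calls may no longer compile or run once it leaves Linux. In other words, it becomes less portable and is not cross-platform. A program that uses only standard C functions can be built on any platform and is not tied to one operating system.

Calling fopen returns a pointer of type FILE *. Behind this pointer sit three important pieces of state: the file descriptor, the file position indicator, and the file buffer. The buffer of a FILE stream is typically 8192 bytes by default (the value of BUFSIZ in glibc). The Linux system I/O functions have no user-space buffer by default. File descriptors were mentioned in the previous section; a file descriptor is simply an integer of type int.

When a process starts, three file descriptors are open by default: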

#define STDIN_FILENO 0
#define STDOUT_FILENO 1
#define STDERR_FILENO 2

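A newly opened file receives the lowest-numbered descriptor that is not currently in use in the process's file descriptor table. By default a process can usually have at most 1024 descriptors open (the common soft limit reported by ulimit -n). Calling open opens or creates a file and returns a file descriptor.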
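The "lowest unused descriptor" rule can be observed directly. The following is a minimal sketch, not part of the original lesson: the helper name lowest_fd_is_reused is made up for illustration, and it assumes that /dev/null exists and that no other thread opens files at the same time.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: opens /dev/null, closes it, and opens it again.
 * Returns 1 if the second open() reused the number that was just freed,
 * 0 if it did not, and -1 on error. */
int lowest_fd_is_reused(void)
{
    int first = open("/dev/null", O_RDONLY);
    if (first < 0)
        return -1;

    close(first);                 /* the number held by first is free again */

    int second = open("/dev/null", O_RDONLY);
    if (second < 0)
        return -1;

    int reused = (second == first);
    close(second);
    return reused;
}
```

Because every descriptor below first is still in use when the second open runs, the freed number is the lowest one available, so the function is expected to return 1.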

1.1 The open function

The key parts of the manual page are shown below:

SYNOPSIS
       #include <sys/types.h>
       #include <sys/stat.h>
       #include <fcntl.h>

       int open(const char *pathname, int flags);
       int open(const char *pathname, int flags, mode_t mode);

DESCRIPTION
       The open() system call opens the file specified by pathname.  If the specified file does not exist, it may optionally (if
       O_CREAT is specified in flags) be created by open().

       The return value of open() is a file descriptor, a small, nonnegative integer that is used  in  subsequent  system  calls
       (read(2),  write(2),  lseek(2),  fcntl(2), etc.) to refer to the open file.  The file descriptor returned by a successful
       call will be the lowest-numbered file descriptor not currently open for the process.

       A call to open() creates a new open file description, an entry in the system-wide table of open files.  The open file de‐
       scription  records  the  file  offset and the file status flags (see below).  A file descriptor is a reference to an open
       file description; this reference is unaffected if pathname is subsequently removed or modified to refer  to  a  different
       file.  For further details on open file descriptions, see NOTES.

       The  argument flags must include one of the following access modes: O_RDONLY, O_WRONLY, or O_RDWR.  These request opening
       the file read-only, write-only, or read/write, respectively.

       In addition, zero or more file creation flags and file status flags can be bitwise-or'd  in  flags.   The  file  creation
       flags  are  O_CLOEXEC, O_CREAT, O_DIRECTORY, O_EXCL, O_NOCTTY, O_NOFOLLOW, O_TMPFILE, and O_TRUNC.  The file status flags
       are all of the remaining flags listed below.  The distinction between these two groups of flags is that the file creation
       flags  affect  the semantics of the open operation itself, while the file status flags affect the semantics of subsequent
       I/O operations.  The file status flags can be retrieved and (in some cases) modified; see fcntl(2) for details.

       The full list of file creation flags and file status flags is as follows:

       O_APPEND
              The file is opened in append mode.  Before each write(2), the file offset is positioned at the end of the file, as
              if  with  lseek(2).   The modification of the file offset and the write operation are performed as a single atomic
              step.

              O_APPEND may lead to corrupted files on NFS filesystems if more than one process appends data to a file  at  once.
              This  is because NFS does not support appending to a file, so the client kernel has to simulate it, which can't be
              done without a race condition.

       O_CREAT
              If pathname does not exist, create it as a regular file.

              The owner (user ID) of the new file is set to the effective user ID of the process.

              The group ownership (group ID) of the new file is set either to the effective group ID of the  process  (System  V
              semantics)  or to the group ID of the parent directory (BSD semantics).  On Linux, the behavior depends on whether
              the set-group-ID mode bit is set on the parent directory: if that bit is set, then BSD semantics apply; otherwise,
              System  V  semantics apply.  For some filesystems, the behavior also depends on the bsdgroups and sysvgroups mount
              options described in mount(8)).

              The mode argument specifies the file mode bits be applied when a new file is created.  This argument must be  sup‐
              plied when O_CREAT or O_TMPFILE is specified in flags; if neither O_CREAT nor O_TMPFILE is specified, then mode is
              ignored.  The effective mode is modified by the process's umask in the usual way: in the absence of a default ACL,
              the mode of the created file is (mode & ~umask).  Note that this mode applies only to future accesses of the newly
              created file; the open() call that creates a read-only file may well return a read/write file descriptor.


              The following symbolic constants are provided for mode:

              S_IRWXU  00700 user (file owner) has read, write, and execute permission

              S_IRUSR  00400 user has read permission

              S_IWUSR  00200 user has write permission

              S_IXUSR  00100 user has execute permission

              S_IRWXG  00070 group has read, write, and execute permission

              S_IRGRP  00040 group has read permission

              S_IWGRP  00020 group has write permission

              S_IXGRP  00010 group has execute permission

              S_IRWXO  00007 others have read, write, and execute permission

              S_IROTH  00004 others have read permission

              S_IWOTH  00002 others have write permission

              S_IXOTH  00001 others have execute permission

              According to POSIX, the effect when other bits are set in mode is unspecified.  On Linux, the following  bits  are
              also honored in mode:

              S_ISUID  0004000 set-user-ID bit

              S_ISGID  0002000 set-group-ID bit (see inode(7)).

              S_ISVTX  0001000 sticky bit (see inode(7)).

       O_TRUNC
              If the file already exists and is a regular file and the access mode allows writing (i.e., is O_RDWR or  O_WRONLY)
              it  will  be  truncated  to length 0.  If the file is a FIFO or terminal device file, the O_TRUNC flag is ignored.
              Otherwise, the effect of O_TRUNC is unspecified.

RETURN VALUE
       open(), openat(), and creat() return the new file descriptor, or -1 if an error occurred (in which case, errno is set ap‐
       propriately).

The excerpt above outlines how open is used. It shows that using open requires three headers: sys/types.h, sys/stat.h, and fcntl.h. The function has two forms:

int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode);

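The function opens a file and returns its file descriptor. The first two parameters are the same in both forms: pathname is the path of the file, and flags is a set of flags. Some of the important flags are:

Flag	Meaning
O_RDWR	Open for reading and writing
O_RDONLY	Open read-only
O_WRONLY	Open write-only
O_APPEND	Append to the end of the file
O_CREAT	Create the file if it does not exist; requires the last argument, `mode`
O_TRUNC	If the file exists, truncate it to length 0

When O_CREAT is specified, the third argument mode must be supplied. It gives the permission bits of the new file:

mode	Permission
S_IRWXU	Owner can read, write, and execute
S_IRUSR	Owner can read
S_IWUSR	Owner can write
S_IXUSR	Owner can execute
S_IRWXG	Group can read, write, and execute
S_IRGRP	Group can read
S_IWGRP	Group can write
S_IXGRP	Group can execute
S_IRWXO	Others can read, write, and execute
S_IROTH	Others can read
S_IWOTH	Others can write
S_IXOTH	Others can execute

Several flag values, and likewise several mode values, can be combined with the bitwise OR operator |.

Finally, the return value: open returns a new file descriptor, or -1 if an error occurred, in which case errno is set accordingly.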
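The manual text above says that the mode of a newly created file is (mode & ~umask). A minimal sketch of that rule follows; the helper names create_file and expected_mode are made up for illustration, and the sketch assumes the target directory has no default ACL.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: creates (or truncates) path for writing with the
 * requested permission bits. Returns the descriptor, or -1 with errno set. */
int create_file(const char *path, mode_t requested)
{
    return open(path, O_WRONLY | O_CREAT | O_TRUNC, requested);
}

/* Hypothetical helper: returns the permission bits a new file is expected
 * to get, that is, the requested bits with the umask bits cleared. */
mode_t expected_mode(mode_t requested)
{
    mode_t mask = umask(0);   /* umask() can only be read by setting it ... */
    umask(mask);              /* ... so restore the old value right away */
    return requested & ~mask & 0777;
}
```

With the common umask of 022, requesting S_IRWXU | S_IRWXG | S_IRWXO (0777) therefore yields a file with mode 0755. Note that the mode only applies when the file is actually created; opening an existing file with O_CREAT leaves its permissions unchanged.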

1.2 The close function

SYNOPSIS
       #include <unistd.h>

       int close(int fd);

DESCRIPTION
       close()  closes  a  file descriptor, so that it no longer refers to any file and may be reused.  Any record locks (see fc‐
       ntl(2)) held on the file it was associated with, and owned by the process, are removed (regardless of the file  descriptor
       that was used to obtain the lock).

       If  fd  is the last file descriptor referring to the underlying open file description (see open(2)), the resources associ‐
       ated with the open file description are freed; if the file descriptor was the last reference to a file which has been  re‐
       moved using unlink(2), the file is deleted.

RETURN VALUE
       close() returns zero on success.  On error, -1 is returned, and errno is set appropriately.

The prototype is:

int close(int fd);

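The function closes an open file. The parameter fd is the descriptor of the open file. It returns 0 on success, or -1 on failure with errno set accordingly. Note that close does not use the same headers as open; it requires unistd.h.

1.3 The read function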

SYNOPSIS
       #include <unistd.h>

       ssize_t read(int fd, void *buf, size_t count);

DESCRIPTION
       read() attempts to read up to count bytes from file descriptor fd into the buffer starting at buf.

       On  files that support seeking, the read operation commences at the file offset, and the file offset is incremented by the
       number of bytes read.  If the file offset is at or past the end of file, no bytes are read, and read() returns zero.

       If count is zero, read() may detect the errors described below.  In the absence of any errors, or if read() does not check
       for errors, a read() with a count of 0 returns zero and has no other effects.

       According  to  POSIX.1,  if count is greater than SSIZE_MAX, the result is implementation-defined; see NOTES for the upper
       limit on Linux.

RETURN VALUE
       On success, the number of bytes read is returned (zero indicates end of file), and the file position is advanced  by  this
       number.   It is not an error if this number is smaller than the number of bytes requested; this may happen for example be‐
       cause fewer bytes are actually available right now (maybe because we were close to end-of-file, or because we are  reading
       from a pipe, or from a terminal), or because read() was interrupted by a signal.  See also NOTES.

       On  error, -1 is returned, and errno is set appropriately.  In this case, it is left unspecified whether the file position
       (if any) changes.

read also requires unistd.h. Its prototype is:

ssize_t read(int fd, void *buf, size_t count);

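The function reads up to count bytes from the file referred to by fd into buf. fd is the file descriptor, buf is the address of the buffer, and count is the maximum number of bytes to read. The return value is the number of bytes actually read, which may be smaller than count; 0 means end of file. On failure it returns -1 and sets errno accordingly.

1.4 The write function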

SYNOPSIS
       #include <unistd.h>

       ssize_t write(int fd, const void *buf, size_t count);

DESCRIPTION
       write() writes up to count bytes from the buffer starting at buf to the file referred to by the file descriptor fd.

       The number of bytes written may be less than count if, for example, there is insufficient space on the underlying phys‐
       ical medium, or the RLIMIT_FSIZE resource limit is encountered (see setrlimit(2)), or the call  was  interrupted  by  a
       signal handler after having written less than count bytes.  (See also pipe(7).)

       For  a  seekable  file (i.e., one to which lseek(2) may be applied, for example, a regular file) writing takes place at
       the file offset, and the file offset is incremented by the number of bytes actually written.  If the file was open(2)ed
       with  O_APPEND,  the file offset is first set to the end of the file before writing.  The adjustment of the file offset
       and the write operation are performed as an atomic step.

       POSIX requires that a read(2) that can be proved to occur after a write() has returned will return the new data.   Note
       that not all filesystems are POSIX conforming.

       According to POSIX.1, if count is greater than SSIZE_MAX, the result is implementation-defined; see NOTES for the upper
       limit on Linux.

RETURN VALUE
       On success, the number of bytes written is returned.  On error, -1 is returned, and errno is set to indicate the  cause
       of the error.

       Note that a successful write() may transfer fewer than count bytes.  Such partial writes can occur for various reasons;
       for example, because there was insufficient space on the disk device to write all of the requested bytes, or because  a
       blocked  write()  to  a socket, pipe, or similar was interrupted by a signal handler after it had transferred some, but
       before it had transferred all of the requested bytes.  In the event of a partial write, the  caller  can  make  another
       write()  call to transfer the remaining bytes.  The subsequent call will either transfer further bytes or may result in
       an error (e.g., if the disk is now full).

       If count is zero and fd refers to a regular file, then write() may return a failure status if one of the  errors  below
       is  detected.   If  no errors are detected, or error detection is not performed, 0 will be returned without causing any
       other effect.  If count is zero and fd refers to a file other than a regular file, the results are not specified.

write needs the same header as read. Its prototype is:

 ssize_t write(int fd, const void *buf, size_t count);

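The function writes the first count bytes of buf to the file referred to by fd. fd is the file descriptor, buf is the buffer holding the data to write, and count is the number of bytes to write. The return value is the number of bytes actually written, which may be less than count; on failure it returns -1 and sets errno accordingly.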
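Since the manual states that a successful write may transfer fewer than count bytes, callers that need everything written usually loop. The following is one possible sketch of such a loop; the name write_all is made up for illustration and is not a library function.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keeps calling write() until all count bytes are out.
 * Returns 0 on success, or -1 with errno set by the failing write(). */
int write_all(int fd, const void *buf, size_t count)
{
    const char *p = buf;

    while (count > 0) {
        ssize_t n = write(fd, p, count);
        if (n < 0) {
            if (errno == EINTR)
                continue;         /* interrupted before writing anything: retry */
            return -1;
        }
        p += n;                   /* skip the bytes that were written */
        count -= (size_t)n;
    }
    return 0;
}
```

A call such as write_all(fd, "hello world", 11) then either writes all 11 bytes or reports an error, which is simpler for the caller than checking the byte count after every write.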

1.5 The lseek function

SYNOPSIS
       #include <sys/types.h>
       #include <unistd.h>

       off_t lseek(int fd, off_t offset, int whence);

DESCRIPTION
       lseek() repositions the file offset of the open file description associated with the file descriptor fd to the argument
       offset according to the directive whence as follows:

       SEEK_SET
              The file offset is set to offset bytes.

       SEEK_CUR
              The file offset is set to its current location plus offset bytes.

       SEEK_END
              The file offset is set to the size of the file plus offset bytes.

       lseek() allows the file offset to be set beyond the end of the file (but this does not change the size  of  the  file).
       If data is later written at this point, subsequent reads of the data in the gap (a "hole") return null bytes ('\0') un‐
       til data is actually written into the gap.

RETURN VALUE
       Upon successful completion, lseek() returns the resulting offset location as measured in bytes from  the  beginning  of
       the file.  On error, the value (off_t) -1 is returned and errno is set to indicate the error.

The function requires the headers sys/types.h and unistd.h. Its prototype is:

 off_t lseek(int fd, off_t offset, int whence);

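The function changes the file offset: it moves the offset of the file referred to by fd to offset bytes relative to the position given by whence. fd is the file descriptor, offset is the displacement, and whence is the position the displacement is measured from. The return value is the resulting offset in bytes from the start of the file; on failure it returns (off_t) -1 and sets errno accordingly.

1.6 The errno variable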

ERRNO(3)                                           Linux Programmer's Manual                                          ERRNO(3)

NAME
       errno - number of last error

SYNOPSIS
       #include <errno.h>

DESCRIPTION
       The  <errno.h>  header file defines the integer variable errno, which is set by system calls and some library functions
       in the event of an error to indicate what went wrong.

   errno
       The value in errno is significant only when the return value of the call indicated an error (i.e., -1 from most  system
       calls;  -1 or NULL from most library functions); a function that succeeds is allowed to change errno.  The value of er‐
       rno is never set to zero by any system call or library function.

       For some system calls and library functions (e.g., getpriority(2)), -1 is a valid return on success.  In such cases,  a
       successful  return can be distinguished from an error return by setting errno to zero before the call, and then, if the
       call returns a status that indicates that an error may have occurred, checking to see if errno has a nonzero value.

       errno is defined by the ISO C standard to be a modifiable lvalue of type int, and must not be explicitly declared;  er‐
       rno may be a macro.  errno is thread-local; setting it in one thread does not affect its value in any other thread.

   Error numbers and names
       Valid  error numbers are all positive numbers.  The <errno.h> header file defines symbolic names for each of the possi‐
       ble error numbers that may appear in errno.

       All the error names specified by POSIX.1 must have distinct values, with the exception of EAGAIN and EWOULDBLOCK, which
       may be the same.  On Linux, these two have the same value on all architectures.

       The  error  numbers that correspond to each symbolic name vary across UNIX systems, and even across different architec‐
       tures on Linux.  Therefore, numeric values are not included as part of the list of error names  below.   The  perror(3)
       and strerror(3) functions can be used to convert these names to corresponding textual error messages.

       On  any  particular Linux system, one can obtain a list of all symbolic error names and the corresponding error numbers
       using the errno(1) command (part of the moreutils package):

           $ errno -l
           EPERM 1 Operation not permitted
           ENOENT 2 No such file or directory
           ESRCH 3 No such process
           EINTR 4 Interrupted system call
           EIO 5 Input/output error
           ...

       The errno(1) command can also be used to look up individual error numbers and names, and to  search  for  errors  using
       strings from the error description, as in the following examples:

           $ errno 2
           ENOENT 2 No such file or directory
           $ errno ESRCH
           ESRCH 3 No such process
           $ errno -s permission
           EACCES 13 Permission denied

   List of error names
       In the list of the symbolic error names below, various names are marked as follows:

       *  POSIX.1-2001:  The name is defined by POSIX.1-2001, and is defined in later POSIX.1 versions, unless otherwise indi‐
          cated.

       *  POSIX.1-2008: The name is defined in POSIX.1-2008, but was not present in earlier POSIX.1 standards.

       *  C99: The name is defined by C99.

       Below is a list of the symbolic error names that are defined on Linux:

       E2BIG           Argument list too long (POSIX.1-2001).

       EACCES          Permission denied (POSIX.1-2001).

       EADDRINUSE      Address already in use (POSIX.1-2001).

       EADDRNOTAVAIL   Address not available (POSIX.1-2001).

       EAFNOSUPPORT    Address family not supported (POSIX.1-2001).

       EAGAIN          Resource temporarily unavailable (may be the same value as EWOULDBLOCK) (POSIX.1-2001).

       EALREADY        Connection already in progress (POSIX.1-2001).

       EBADE           Invalid exchange.

       EBADF           Bad file descriptor (POSIX.1-2001).

       EBADFD          File descriptor in bad state.

       EBADMSG         Bad message (POSIX.1-2001).

       EBADR           Invalid request descriptor.

       EBADRQC         Invalid request code.

       EBADSLT         Invalid slot.

       EBUSY           Device or resource busy (POSIX.1-2001).

       ECANCELED       Operation canceled (POSIX.1-2001).

       ECHILD          No child processes (POSIX.1-2001).

       ECHRNG          Channel number out of range.

       ECOMM           Communication error on send.

       ECONNABORTED    Connection aborted (POSIX.1-2001).

       ECONNREFUSED    Connection refused (POSIX.1-2001).

       ECONNRESET      Connection reset (POSIX.1-2001).

       EDEADLK         Resource deadlock avoided (POSIX.1-2001).

       EDEADLOCK       On most architectures, a synonym for EDEADLK.   On  some  architectures  (e.g.,  Linux  MIPS,  PowerPC,
                       SPARC), it is a separate error code "File locking deadlock error".

       EDESTADDRREQ    Destination address required (POSIX.1-2001).

       EDOM            Mathematics argument out of domain of function (POSIX.1, C99).

       EDQUOT          Disk quota exceeded (POSIX.1-2001).

       EEXIST          File exists (POSIX.1-2001).

       EFAULT          Bad address (POSIX.1-2001).

       EFBIG           File too large (POSIX.1-2001).

       EHOSTDOWN       Host is down.

       EHOSTUNREACH    Host is unreachable (POSIX.1-2001).

       EHWPOISON       Memory page has hardware error.

       EIDRM           Identifier removed (POSIX.1-2001).

       EILSEQ          Invalid or incomplete multibyte or wide character (POSIX.1, C99).

                       The  text  shown  here  is the glibc error description; in POSIX.1, this error is described as "Illegal
                       byte sequence".

       EINPROGRESS     Operation in progress (POSIX.1-2001).

       EINTR           Interrupted function call (POSIX.1-2001); see signal(7).

       EINVAL          Invalid argument (POSIX.1-2001).

       EIO             Input/output error (POSIX.1-2001).

       EISCONN         Socket is connected (POSIX.1-2001).

       EISDIR          Is a directory (POSIX.1-2001).

       EISNAM          Is a named type file.

       EKEYEXPIRED     Key has expired.

       EKEYREJECTED    Key was rejected by service.

       EKEYREVOKED     Key has been revoked.

       EL2HLT          Level 2 halted.

       EL2NSYNC        Level 2 not synchronized.

       EL3HLT          Level 3 halted.

       EL3RST          Level 3 reset.

       ELIBACC         Cannot access a needed shared library.

       ELIBBAD         Accessing a corrupted shared library.

       ELIBMAX         Attempting to link in too many shared libraries.

       ELIBSCN         .lib section in a.out corrupted

       ELIBEXEC        Cannot exec a shared library directly.

       ELNRANGE        Link number out of range.

       ELOOP           Too many levels of symbolic links (POSIX.1-2001).

       EMEDIUMTYPE     Wrong medium type.

       EMFILE          Too many open files (POSIX.1-2001).  Commonly caused by exceeding the RLIMIT_NOFILE resource limit  de‐
                       scribed in getrlimit(2).

       EMLINK          Too many links (POSIX.1-2001).

       EMSGSIZE        Message too long (POSIX.1-2001).

       EMULTIHOP       Multihop attempted (POSIX.1-2001).

       ENAMETOOLONG    Filename too long (POSIX.1-2001).

       ENETDOWN        Network is down (POSIX.1-2001).

       ENETRESET       Connection aborted by network (POSIX.1-2001).

       ENETUNREACH     Network unreachable (POSIX.1-2001).

       ENFILE          Too  many open files in system (POSIX.1-2001).  On Linux, this is probably a result of encountering the
                       /proc/sys/fs/file-max limit (see proc(5)).

       ENOANO          No anode.

       ENOBUFS         No buffer space available (POSIX.1 (XSI STREAMS option)).

       ENODATA         No message is available on the STREAM head read queue (POSIX.1-2001).

       ENODEV          No such device (POSIX.1-2001).

       ENOENT          No such file or directory (POSIX.1-2001).

                       Typically, this error results when a specified pathname does not exist, or one of the components in the
                       directory prefix of a pathname does not exist, or the specified pathname is a dangling symbolic link.

       ENOEXEC         Exec format error (POSIX.1-2001).

       ENOKEY          Required key not available.

       ENOLCK          No locks available (POSIX.1-2001).

       ENOLINK         Link has been severed (POSIX.1-2001).

       ENOMEDIUM       No medium found.

       ENOMEM          Not enough space/cannot allocate memory (POSIX.1-2001).

       ENOMSG          No message of the desired type (POSIX.1-2001).

       ENONET          Machine is not on the network.

       ENOPKG          Package not installed.

       ENOPROTOOPT     Protocol not available (POSIX.1-2001).

       ENOSPC          No space left on device (POSIX.1-2001).

       ENOSR           No STREAM resources (POSIX.1 (XSI STREAMS option)).

       ENOSTR          Not a STREAM (POSIX.1 (XSI STREAMS option)).

       ENOSYS          Function not implemented (POSIX.1-2001).

       ENOTBLK         Block device required.

       ENOTCONN        The socket is not connected (POSIX.1-2001).

       ENOTDIR         Not a directory (POSIX.1-2001).

       ENOTEMPTY       Directory not empty (POSIX.1-2001).

       ENOTRECOVERABLE State not recoverable (POSIX.1-2008).

       ENOTSOCK        Not a socket (POSIX.1-2001).

       ENOTSUP         Operation not supported (POSIX.1-2001).

       ENOTTY          Inappropriate I/O control operation (POSIX.1-2001).

       ENOTUNIQ        Name not unique on network.

       ENXIO           No such device or address (POSIX.1-2001).

       EOPNOTSUPP      Operation not supported on socket (POSIX.1-2001).

                       (ENOTSUP  and  EOPNOTSUPP  have  the  same  value on Linux, but according to POSIX.1 these error values
                       should be distinct.)

       EOVERFLOW       Value too large to be stored in data type (POSIX.1-2001).

       EOWNERDEAD      Owner died (POSIX.1-2008).

       EPERM           Operation not permitted (POSIX.1-2001).

       EPFNOSUPPORT    Protocol family not supported.

       EPIPE           Broken pipe (POSIX.1-2001).

       EPROTO          Protocol error (POSIX.1-2001).

       EPROTONOSUPPORT Protocol not supported (POSIX.1-2001).

       EPROTOTYPE      Protocol wrong type for socket (POSIX.1-2001).

       ERANGE          Result too large (POSIX.1, C99).

       EREMCHG         Remote address changed.

       EREMOTE         Object is remote.

       EREMOTEIO       Remote I/O error.

       ERESTART        Interrupted system call should be restarted.

       ERFKILL         Operation not possible due to RF-kill.

       EROFS           Read-only filesystem (POSIX.1-2001).

       ESHUTDOWN       Cannot send after transport endpoint shutdown.

       ESPIPE          Invalid seek (POSIX.1-2001).

       ESOCKTNOSUPPORT Socket type not supported.

       ESRCH           No such process (POSIX.1-2001).

       ESTALE          Stale file handle (POSIX.1-2001).

                       This error can occur for NFS and for other filesystems.

       ESTRPIPE        Streams pipe error.

       ETIME           Timer expired (POSIX.1 (XSI STREAMS option)).

                       (POSIX.1 says "STREAM ioctl(2) timeout".)

       ETIMEDOUT       Connection timed out (POSIX.1-2001).

       ETOOMANYREFS    Too many references: cannot splice.

       ETXTBSY         Text file busy (POSIX.1-2001).

       EUCLEAN         Structure needs cleaning.

       EUNATCH         Protocol driver not attached.

       EUSERS          Too many users.

       EWOULDBLOCK     Operation would block (may be same value as EAGAIN) (POSIX.1-2001).

       EXDEV           Improper link (POSIX.1-2001).

       EXFULL          Exchange full.

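Note that using the errno variable requires the header errno.h. When an error occurs, perror prints the corresponding error message. To get the string that an error number stands for, use strerror, which is declared in string.h. Its prototype is: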

char *strerror(int errnum);

The argument is an errno value, and the return value is the message string for that error number.
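As a small illustration of errno and strerror working together, here is a sketch with a made-up helper name, report_open_error. It saves errno immediately after the failing call, because any later library call is allowed to change it.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: tries to open path read-only. On failure it prints a
 * message built with strerror() and returns the saved errno value; on
 * success it returns 0. */
int report_open_error(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        int saved = errno;        /* save it: later calls may overwrite errno */
        fprintf(stderr, "open %s: %s\n", path, strerror(saved));
        return saved;
    }
    close(fd);
    return 0;
}
```

For a path that does not exist, the returned value is ENOENT and the printed text is the same message that perror would append after its prefix.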

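1.7 File example 1: reading and writing a file

This example uses Linux system calls to open a file, write data to it with write, and then read the data back with read.

// Using open, write, lseek, and read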
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>


int main(int argc, char *argv[])
{
	if(argc < 2)
	{
		fprintf(stderr, "usage: %s filename\n", argv[0]);
		return -1;
	}
	printf("filename = [%s]\n", argv[1]);

	// Open the file; open returns its file descriptor
	//int open(const char *pathname, int flags);
	//int open(const char *pathname, int flags, mode_t mode);
	int fd = open(argv[1], O_RDWR | O_CREAT, S_IRWXU | S_IRWXG | S_IRWXO);
	
	// open returns -1 on failure
	if(fd < 0)
	{
		perror("file open error");
		return -1;
	}
	printf("fd = [%d]\n", fd);

	// Write to the file
	//ssize_t write(int fd, const void *buf, size_t count);
	ssize_t size = write(fd, "hello world", strlen("hello world"));
	printf("write size = [%zd]\n", size);

	// Move the file offset back to the start
	//off_t lseek(int fd, off_t offset, int whence);
	off_t offset = lseek(fd, 0, SEEK_SET);
	printf("offset = [%ld]\n", (long)offset);

	// Read the file; leave one byte for the terminating '\0'
	//ssize_t read(int fd, void *buf, size_t count);
	char buf[128];
	memset(buf, 0, sizeof buf);
	size = read(fd, buf, sizeof buf - 1);
	printf("read size = [%zd]\n", size);
	printf("read = [%s]\n", buf);

	close(fd);

	return 0;
}

1.8 File example 2: computing a file's size

Use lseek to compute the size of a file.

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>

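int main(int argc, char *argv[])
{
	if(argc < 2)
	{
		fprintf(stderr, "usage: %s filename\n", argv[0]);
		return -1;
	}

	// Read access is enough to query the size
	int fd = open(argv[1], O_RDONLY);
	if(fd < 0)
	{
		perror("file open error");
		return -1;
	}

	// Seeking to the end returns the offset of the end, which is the file size
	off_t size = lseek(fd, 0, SEEK_END);
	if(size < 0)
	{
		perror("lseek error");
		close(fd);
		return -1;
	}

	printf("[%s] size = [%ld]\n", argv[1], (long)size);
	
	close(fd);
	return 0;
}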

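1.9 File example 3: extending a file

Use lseek to turn a small file into a larger one. The method is to move the file offset to the position that marks the desired size and then perform one write.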

#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <fcntl.h>


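int main(int argc, char *argv[])
{
	if(argc < 2)
	{
		fprintf(stderr, "usage: %s filename\n", argv[0]);
		return -1;
	}

	int fd = open(argv[1], O_RDWR);
	if(fd < 0)
	{
		perror("file open error");
		return -1;
	}

	// Move the file offset to byte 200, past the end of a small file
	if(lseek(fd, 200, SEEK_SET) < 0)
	{
		perror("lseek error");
		close(fd);
		return -1;
	}
	
	// One write at that offset makes the extension real; the file is then 201 bytes long
	if(write(fd, "a", 1) != 1)
	{
		perror("write error");
	}

	close(fd);
	return 0;
}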

1.10 File example 4: using perror

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>

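int main(int argc, char *argv[])
{
	if(argc < 2)
	{
		fprintf(stderr, "usage: %s filename\n", argv[0]);
		return -1;
	}

	// Open the file
	int fd = open(argv[1], O_RDWR);
	if(fd < 0)
	{
		int saved = errno;	// perror may change errno, so save it first
		perror("open error");
		if(saved == ENOENT)
		{
			printf("same\n");
		}
		return -1;
	}

	// Print the message for each of the first 64 error numbers
	int n = 0;
	for(n = 0; n < 64; n ++)
	{
		errno = n;
		printf("[%d]:[%s]\n", errno, strerror(errno));
	}

	close(fd);

	return 0;
}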

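1.11 Testing blocking and non-blocking reads

When reading files on Linux we talk about blocking and non-blocking reads. How can we tell whether blocking is a property of the file or of the read function? Here we call read on files of different types. If read behaves the same way (always blocking, or never blocking) for every type of file, blocking is a property of read; if the behavior differs between file types, blocking is a property of the file, not of read.

Reading a regular file with read: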

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <string.h>
#include <fcntl.h>

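// Check whether read blocks on a regular file
int main(int argc, char *argv[])
{
	if(argc < 2)
	{
		fprintf(stderr, "usage: %s filename\n", argv[0]);
		return -1;
	}

	// Open the file
	int fd = open(argv[1], O_RDWR);
	if(fd < 0)
	{
		perror("open error");
		return -1;
	}

	// Read the file; keep one byte for the terminating '\0'
	char buf[1024];
	memset(buf, 0, sizeof(buf));
	int n = read(fd, buf, sizeof(buf) - 1);
	printf("first: n = [%d], buf = [%s]\n", n, buf);

	// Read again to check whether read blocks at end of file
	memset(buf, 0, sizeof(buf));
	n = read(fd, buf, sizeof(buf) - 1);
	printf("second: n = [%d], buf = [%s]\n", n, buf);
	
	// Close the file
	close(fd);
	return 0;
}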

Reading a device file with read:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <string.h>
#include <fcntl.h>

// Check that read blocks on a device file
int main()
{
	// Standard input; keep one byte for the terminating '\0'
	char buf[1024];
	memset(buf, 0, sizeof(buf));
	int n = read(STDIN_FILENO, buf, sizeof(buf) - 1);
	printf("n = [%d], buf = [%s]\n", n, buf);

	return 0;
}

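These two tests show that blocking versus non-blocking is a property of the file itself, not of the read function: the second read on the regular file returns 0 immediately at end of file, while the read on the terminal waits for input.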
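The same point can be checked from the other direction: the very same read call stops blocking once the open file is marked non-blocking. The sketch below is an illustration rather than part of the original tests; it uses a pipe instead of a terminal so it needs no user input, sets O_NONBLOCK with fcntl, and the helper name read_empty_nonblocking_pipe is made up.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: creates a pipe, marks its read end O_NONBLOCK, and
 * reads from it while it is still empty. Returns the errno left by read(),
 * 0 if read() unexpectedly succeeded, or -1 if the setup failed. */
int read_empty_nonblocking_pipe(void)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    int flags = fcntl(fds[0], F_GETFL);
    if (flags < 0 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    char c;
    ssize_t n = read(fds[0], &c, 1);   /* would block without O_NONBLOCK */
    int result = (n < 0) ? errno : 0;  /* capture errno before close() runs */

    close(fds[0]);
    close(fds[1]);
    return result;
}
```

With the flag set, read returns -1 right away and errno is EAGAIN (the same value as EWOULDBLOCK on Linux); without the flag, the same call would wait until data arrives or the write end is closed.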

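2. Files and directories

The sections above quoted a lot of English text. It comes from the manual pages that Linux provides for developers, which can be viewed with the man command in either of these forms:

man topic
man section topic

The manual has 9 sections. Without a section number, man shows the page from the first section in which the topic appears; if the same name exists in several sections, give the section number to pick one. For system programming the relevant sections are these: section 1 covers executable programs and shell commands, section 2 covers system calls, and section 3 covers C library functions. The remaining sections were covered in the Linux and Unix part of the C basics course.

This part of the course relies heavily on man to consult the documentation. Learning how to look things up in the manual pages and how to program from them is important. From here on, the manual text for each function is no longer reproduced; run man on a Linux system to read it.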

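2.1 File operation functions

The file-related functions are listed below:

Function	Prototype	Parameters	Return value	Purpose
stat	int stat(const char *pathname, struct stat *statbuf);	pathname: file path; statbuf: buffer that receives the file status	0 on success; -1 on failure with errno set	Stores the status of the file pathname in statbuf; symbolic links are followed
lstat	int lstat(const char *pathname, struct stat *statbuf);	pathname: file path; statbuf: buffer that receives the file status	0 on success; -1 on failure with errno set	Same as stat, except that if pathname is a symbolic link, the status of the link itself is stored

These functions require the headers sys/types.h, sys/stat.h, and unistd.h. struct stat is defined as follows: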

           struct stat {
               dev_t     st_dev;         /* ID of device containing file */
               ino_t     st_ino;         /* Inode number */
               mode_t    st_mode;        /* File type and mode */
               nlink_t   st_nlink;       /* Number of hard links */
               uid_t     st_uid;         /* User ID of owner */
               gid_t     st_gid;         /* Group ID of owner */
               dev_t     st_rdev;        /* Device ID (if special file) */
               off_t     st_size;        /* Total size, in bytes */
               blksize_t st_blksize;     /* Block size for filesystem I/O */
               blkcnt_t  st_blocks;      /* Number of 512B blocks allocated */

               /* Since Linux 2.6, the kernel supports nanosecond
                  precision for the following timestamp fields.
                  For the details before Linux 2.6, see NOTES. */

               struct timespec st_atim;  /* Time of last access */
               struct timespec st_mtim;  /* Time of last modification */
               struct timespec st_ctim;  /* Time of last status change */

           #define st_atime st_atim.tv_sec      /* Backward compatibility */
           #define st_mtime st_mtim.tv_sec
           #define st_ctime st_ctim.tv_sec

           };

As the comments above show, the st_mode member stores both the file type and the permissions, each encoded in its own group of bits. The file types are:

           S_IFMT     0170000   bit mask for the file type bit field

           S_IFSOCK   0140000   socket
           S_IFLNK    0120000   symbolic link
           S_IFREG    0100000   regular file
           S_IFBLK    0060000   block device
           S_IFDIR    0040000   directory
           S_IFCHR    0020000   character device
           S_IFIFO    0010000   FIFO

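S_IFMT is the bit mask for the file type. To find the actual type, compute st_mode & S_IFMT and compare the result with the constants above; for example, (st_mode & S_IFMT) == S_IFDIR tests whether a file is a directory. There is also a second way to test the file type, using the following macros: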

           S_ISREG(m)  is it a regular file?

           S_ISDIR(m)  directory?

           S_ISCHR(m)  character device?

           S_ISBLK(m)  block device?

           S_ISFIFO(m) FIFO (named pipe)?

           S_ISLNK(m)  symbolic link?  (Not in POSIX.1-1996.)

           S_ISSOCK(m) socket?  (Not in POSIX.1-1996.)

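The argument m is the st_mode value, and each macro evaluates to true or false for one specific file type. The first approach lends itself to a switch statement, whereas these macros are used in if tests. For example, S_ISBLK(st_mode) tests whether a file is a block device.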

st_mode also holds the permissions of the owner, the group and others:

           S_IRWXU     00700   owner has read, write, and execute permission
           S_IRUSR     00400   owner has read permission

           S_IWUSR     00200   owner has write permission
           S_IXUSR     00100   owner has execute permission

           S_IRWXG     00070   group has read, write, and execute permission
           S_IRGRP     00040   group has read permission
           S_IWGRP     00020   group has write permission
           S_IXGRP     00010   group has execute permission

           S_IRWXO     00007   others (not in group) have read,  write,  and
                               execute permission
           S_IROTH     00004   others have read permission
           S_IWOTH     00002   others have write permission
           S_IXOTH     00001   others have execute permission

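To test a permission, AND st_mode with the corresponding constant; a nonzero result means the permission is granted. For example, st_mode & S_IRUSR tests whether the owner has read permission.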

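For timestamps we usually use the macros st_atime, st_mtime and st_ctime defined at the end of the structure, which keep older code working. Each of them expands to the tv_sec member of the corresponding struct timespec field (st_atim, st_mtim, st_ctim), so its value is a number of whole seconds. The struct timespec fields can be used directly as well; they additionally carry the nanosecond part in tv_nsec. struct timespec is defined as follows: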

           struct timespec {
               time_t tv_sec;        /* seconds */
               long   tv_nsec;       /* nanoseconds */
           };

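Finally, although stat and lstat are called in the same way and mostly do the same thing, they differ for symbolic links. stat returns the attributes of the file the link points to, while lstat returns the attributes of the link itself. For ordinary files the two behave identically.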

Here is an example that uses stat:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

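// stat test: print the file's owner, group, size and inode number
int main(int argc, char *argv[])
{
	if(argc < 2)
	{
		fprintf(stderr, "usage: %s <path>\n", argv[0]);
		return -1;
	}

	// int stat(const char *pathname, struct stat *statbuf);
	struct stat st;
	if(stat(argv[1], &st) < 0)
	{
		perror("stat error");
		return -1;
	}
	printf("uid = %u\n", (unsigned int)st.st_uid);
	printf("gid = %u\n", (unsigned int)st.st_gid);
	printf("size = %ld\n", (long)st.st_size);
	printf("inode = %lu\n", (unsigned long)st.st_ino);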
	
	// Method 1: switch on the file-type bits
	// (stat follows symbolic links, so S_IFLNK only shows up with lstat)
	switch(st.st_mode & S_IFMT)
	{
		case S_IFSOCK:
			printf("socket\n");
			break;
		case S_IFREG:
			printf("regular file\n");
			break;
		case S_IFLNK:
			printf("symbolic link\n");
			break;
		case S_IFBLK:
			printf("block device\n");
			break;
		case S_IFDIR:
			printf("directory\n");
			break;
		case S_IFCHR:
			printf("character device\n");
			break;
		case S_IFIFO:
			printf("FIFO\n");
			break;
		default:
			printf("unknown file\n");
	}
	
	// Method 2: test the file type with the S_ISxxx macros
	if(S_ISREG(st.st_mode))
	{
		printf("regular file\n");
	}

	if(S_ISDIR(st.st_mode))
	{
		printf("directory\n");
	}

	if(S_ISCHR(st.st_mode))
	{
		printf("character device\n");
	}

	if(S_ISBLK(st.st_mode))
	{
		printf("block device\n");
	}

	if(S_ISFIFO(st.st_mode))
	{
		printf("FIFO\n");
	}

	if(S_ISLNK(st.st_mode))
	{
		printf("symbolic link\n");
	}

	if(S_ISSOCK(st.st_mode))
	{
		printf("socket\n");
	}

	// Permissions
	// Owner
	if(st.st_mode & S_IRUSR)
	{
		printf("r");
	}
	else
	{
		printf("-");
	}

	if(st.st_mode & S_IWUSR)
	{
		printf("w");
	}
	else
	{
		printf("-");
	}

	if(st.st_mode & S_IXUSR)
	{
		printf("x");
	}
	else
	{
		printf("-");
	}

	// Group
	if(st.st_mode & S_IRGRP)
	{
		printf("r");
	}
	else
	{
		printf("-");
	}

	if(st.st_mode & S_IWGRP)
	{
		printf("w");
	}
	else
	{
		printf("-");
	}

	if(st.st_mode & S_IXGRP)
	{
		printf("x");
	}
	else
	{
		printf("-");
	}

	// Others
	if(st.st_mode & S_IROTH)
	{
		printf("r");
	}
	else
	{
		printf("-");
	}

	if(st.st_mode & S_IWOTH)
	{
		printf("w");
	}
	else
	{
		printf("-");
	}

	if(st.st_mode & S_IXOTH)
	{
		printf("x\n");
	}
	else
	{
		printf("-\n");
	}

	return 0;
}

2.2 Directory-related functions

The directory-related functions are listed below:

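Function	Prototype	Parameters	Return value	Purpose
opendir	DIR *opendir(const char *name);	name: directory name	Pointer to the directory stream on success; NULL on failure, with errno set	Opens a directory
readdir	struct dirent *readdir(DIR *dirp);	dirp: directory stream pointer	Pointer to the next directory entry; NULL at the end of the directory (errno unchanged) or on failure (errno set)	Reads one entry from the directory stream
closedir	int closedir(DIR *dirp);	dirp: directory stream pointer	0 on success; -1 on failure, with errno set	Closes the directory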

These functions require the headers sys/types.h and dirent.h. struct dirent is defined as follows:

           struct dirent {
               ino_t          d_ino;       /* Inode number */
               off_t          d_off;       /* Not an offset; see below */
               unsigned short d_reclen;    /* Length of this record */
               unsigned char  d_type;      /* Type of file; not supported
                                              by all filesystem types */
               char           d_name[256]; /* Null-terminated filename */
           };

Here d_name is the name of the entry and d_type is its file type. The possible values of d_type are:

              DT_BLK      This is a block device.

              DT_CHR      This is a character device.

              DT_DIR      This is a directory.

              DT_FIFO     This is a named pipe (FIFO).

              DT_LNK      This is a symbolic link.

              DT_REG      This is a regular file.

              DT_SOCK     This is a UNIX domain socket.

              DT_UNKNOWN  The file type could not be determined.

The following example shows the directory functions in use.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <dirent.h>

int main(int argc, char *argv[])
{
	// Open the directory
	DIR *dir = opendir(argv[1]);
	if(dir == NULL)
	{
		perror("opendir error");
		return -1;
	}
	
	// Read the directory entries one by one
	struct dirent *ds = NULL;
	while((ds = readdir(dir)) != NULL)
	{
		printf("filename: [%s] ", ds->d_name);
		
		// Determine the file type
		if(ds->d_type == DT_BLK)
		{
			printf("This is a block device!\n");
		}
		else if(ds->d_type == DT_CHR)
		{
			printf("This is a character device!\n");
		}
		else if(ds->d_type == DT_DIR)
		{
			printf("This is a directory!\n");
		}
		else if(ds->d_type == DT_FIFO)
		{
			printf("This is a named pipe!\n");
		}
		else if(ds->d_type == DT_LNK)
		{
			printf("This is a symbolic link!\n");
		}
		else if(ds->d_type == DT_REG)
		{
			printf("This is a regular file!\n");
		}
		else if(ds->d_type == DT_SOCK)
		{
			printf("This is a UNIX domain socket!\n");
		}
		else
		{
			printf("The file type could not be determined!\n");
		}

	}

	// Close the directory stream
	closedir(dir);

	return 0;
}

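2.3 The dup/dup2/fcntl functions

dup and dup2 are used to duplicate a file descriptor. fcntl can duplicate a descriptor too, and it can also get and set the file's flags, that is, the flags passed as the second argument of open. The prototypes are as follows: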

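Function	Prototype	Parameters	Return value	Purpose
dup	int dup(int oldfd);	oldfd: the descriptor to duplicate	The new file descriptor; -1 on failure, with errno set	Duplicates a file descriptor
dup2	int dup2(int oldfd, int newfd);	oldfd: the existing descriptor; newfd: the descriptor number to use for the copy	The new descriptor (that is, newfd) on success; -1 on failure, with errno set	Duplicates a file descriptor onto the number newfd
fcntl	int fcntl(int fd, int cmd, ... /* arg */ );	fd: file descriptor; cmd: the operation to perform; ...: optional argument that depends on cmd	Depends on cmd	Duplicates a descriptor, gets or sets the file flags, and much more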

These functions require the header unistd.h; fcntl additionally requires fcntl.h. The commonly used cmd values of fcntl are listed below.

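cmd	Purpose	Return value
F_DUPFD	Duplicate the file descriptor, using the lowest free descriptor number greater than or equal to arg	The new file descriptor on success; -1 on failure, with errno set
F_GETFL	Get the file flags	The file flags on success; -1 on failure, with errno set
F_SETFL	Set the file flags (on Linux only O_APPEND, O_ASYNC, O_DIRECT, O_NOATIME and O_NONBLOCK can be changed this way)	0 on success; -1 on failure, with errno set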

fcntl has many more cmd values; see man fcntl for the full list.

Typical fcntl operations look like this:

// 1. Duplicate the file descriptor
int newfd = fcntl(fd, F_DUPFD, 0);
// 2. Get the file status flags
int flag = fcntl(fd, F_GETFL, 0);
// 3. Set the file status flags
flag = flag | O_APPEND;
fcntl(fd, F_SETFL, flag);

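Duplicating a file descriptor works as follows:

[Figure omitted: several file descriptors pointing to the same open file]

The duplicated descriptors all refer to the same open file. Calling close on one of them does not really close the file; the file is closed only after close has been called on every descriptor that refers to it. Because all of these descriptors operate on one open file, they also share a single file offset.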

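dup2 lets us choose the number of the new descriptor, which makes it possible to redirect a program's input or output.

Let us look at some examples of these functions.

Using dup: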

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
	// Open the file
	int fd = open(argv[1], O_RDWR);
	if(fd < 0)
	{
		perror("open error");
		return -1;
	}

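	// Duplicate the file descriptor
	int newfd = dup(fd);
	// Write to the file through fd
	write(fd, "helloworld", strlen("helloworld"));
	// Move the file offset back to the beginning of the file
	lseek(fd, 0, SEEK_SET);
	// Read the file through newfd, which shares the offset with fd
	char buf[1024];
	memset(buf, 0x00, sizeof buf);
	read(newfd, buf, sizeof(buf) - 1);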
	printf("%s\n", buf);

	close(fd);
	close(newfd);
	return 0;
}

Using dup2:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
	int fd = open(argv[1], O_RDWR);
	if(fd < 0)
	{
		perror("open error");
		return -1;
	}

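	// dup2(oldfd, newfd): make newfd refer to the same open file as fd.
	// 10 is an arbitrary descriptor number assumed to be unused here.
	int newfd = 10;
	if(dup2(fd, newfd) < 0)
	{
		perror("dup2 error");
		close(fd);
		return -1;
	}
	
	// Write data through fd
	write(fd, "nihaoya,damahou", strlen("nihaoya,damahou"));
	lseek(fd, 0, SEEK_SET);

	// Read the data back through newfd
	char buf[1024];
	memset(buf, 0x00, sizeof buf);
	read(newfd, buf, sizeof(buf) - 1);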
	
	printf("buf = %s\n", buf);

	close(fd);
	close(newfd);

	return 0;
}

Using dup2 for redirection:

// Redirect standard output to a file
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
	// Open the file, creating it if necessary
	int fd = open(argv[1], O_RDWR | O_CREAT, 0777);
	if(fd < 0)
	{
		perror("open error");
		return -1;
	}
	// Redirect standard output: descriptor 1 now refers to the file
	dup2(fd, STDOUT_FILENO);

	printf("老铁6666\n");
	printf("老铁NB Plus\n");
	printf("大马猴，奥利给\n");

	close(fd);

	return 0;
}

Using fcntl:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>

int main(int argc, char *argv[])
{
	// Open the file
	int fd = open(argv[1], O_RDWR);
	if(fd < 0)
	{
		perror("open error");
		return -1;
	}

	// Get the current flags and add O_APPEND to them
	int flags = fcntl(fd, F_GETFL, 0);
	flags = flags | O_APPEND;
	fcntl(fd, F_SETFL, flags);
	
	// Write to the file; with O_APPEND the data goes to the end
	write(fd, "hello world", strlen("hello world"));

	// Close the file
	close(fd);

	return 0;
}