read and write
The read and write methods perform similar tasks: copying data to and from application code. Their prototypes are therefore very similar, and it is worth introducing them together:
ssize_t read(struct file *filp, char __user *buff, size_t count, loff_t *offp);
ssize_t write(struct file *filp, const char __user *buff, size_t count, loff_t *offp);
For both methods, filp is the file pointer and count is the size of the requested data transfer. The buff argument points to the user buffer holding the data to be written, or to the empty buffer where newly read data should be placed. Finally, offp is a pointer to a "long offset type" object (loff_t) that indicates the file position the user is accessing. The return value is a "signed size type" (ssize_t); its use is discussed later.
Let us repeat that the buff argument to the read and write methods is a user-space pointer. Therefore, it cannot be directly dereferenced by kernel code. There are a few reasons for this restriction:
Depending on which architecture your driver is running on, and how the kernel was configured, the user-space pointer may not be valid at all while running in kernel mode. There may be no mapping for that address, or it could point to some other, random data.
Even if the pointer does mean the same thing in kernel space, user-space memory is paged, and the memory in question might not be resident in RAM when the system call is made. Attempting to reference the user-space memory directly could generate a page fault, which is something that kernel code is not allowed to do. The result would be an "oops," which would kill the process that made the system call.
The pointer in question has been supplied by a user program, which could be buggy or malicious. If your driver ever blindly dereferences a user-supplied pointer, it provides an open doorway allowing a user-space program to access or overwrite memory anywhere in the system. If you do not wish to be responsible for compromising the security of your users' systems, you must never dereference a user-space pointer directly.
Obviously, your driver must be able to access the user-space buffer in order to get its job done. For safety, however, this access must always be performed by special, kernel-supplied functions. We introduce some of those functions (which are defined in <asm/uaccess.h>) here, and the rest in Section 6.1.4; they use some special, architecture-dependent magic to ensure that data transfers between kernel and user space happen in a safe and correct way.
The read and write code in scull needs to copy a whole segment of data to or from the user address space. This capability is offered by the following kernel functions, which copy an arbitrary array of bytes and sit at the heart of most read and write implementations:
unsigned long copy_to_user(void __user *to, const void *from, unsigned long count);
unsigned long copy_from_user(void *to, const void __user *from, unsigned long count);
Although these functions behave like normal memcpy functions, a little extra care must be used when accessing user space from kernel code. The user pages being addressed might not be currently present in memory, and the virtual memory subsystem can put the process to sleep while the page is being transferred into place. This happens, for example, when the page must be retrieved from swap space. The net result for the driver writer is that any function that accesses user space must be reentrant, must be able to execute concurrently with other driver functions, and, in particular, must be in a position where it can legally sleep. We return to this subject in Chapter 5.
The role of the two functions is not limited to copying data to and from user space: they also check whether the user-space pointer is valid. If the pointer is invalid, no copy is performed; if, on the other hand, an invalid address is encountered during the copy, only part of the data is copied. In both cases, the return value is the number of bytes still to be copied, so 0 means complete success. The scull code checks for this error return and returns -EFAULT to the user if it is not 0.
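To make the return-value contract concrete, here is a user-space model, not kernel code. It assumes a hypothetical "user address space" represented by an array in which only the first mock_mapped bytes are valid, and it models user addresses as array indices, which real user pointers are not; mock_copy_to_user and copy_out are illustrative names. What it shows is the contract: the copy function returns the number of bytes it could not copy, and the caller turns any nonzero value into -EFAULT.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical model: "user memory" is this array, and only the first
 * mock_mapped bytes are mapped. Real user pointers are not indices. */
#define MOCK_USER_SIZE 64
static unsigned char mock_user_mem[MOCK_USER_SIZE];
static size_t mock_mapped = 32;

/* Mimics copy_to_user's contract: returns the bytes NOT copied (0 = success).
 * Copying stops at the first unmapped address, so a bad start copies nothing
 * and a bad address partway through yields a partial copy. */
static unsigned long mock_copy_to_user(size_t uaddr, const void *from,
                                       unsigned long count)
{
    const unsigned char *src = from;
    unsigned long done = 0;

    while (done < count && uaddr + done < mock_mapped) {
        mock_user_mem[uaddr + done] = src[done];
        done++;
    }
    return count - done;
}

/* The pattern scull follows: any uncopied bytes become -EFAULT. */
static ssize_t copy_out(size_t uaddr, const void *from, size_t count)
{
    if (mock_copy_to_user(uaddr, from, count))
        return -EFAULT;
    return (ssize_t)count;
}
```

For example, copying 10 bytes starting at index 28 copies only 4 before reaching the unmapped region and returns 6, so copy_out reports -EFAULT.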
The topic of user-space access and invalid user-space pointers is somewhat advanced and is discussed in Chapter 6. However, it's worth noting that if you don't need to check the user-space pointer, you can invoke __copy_to_user and __copy_from_user instead. This is useful, for example, if you know you have already checked the argument. Be careful, however: if you do not, in fact, check a user-space pointer that you pass to these functions, you can create kernel crashes and/or security holes.
As far as the actual device methods are concerned, the task of the read method is to copy data from the device to user space (using copy_to_user), while the write method must copy data from user space to the device (using copy_from_user). Each read or write system call requests transfer of a specific number of bytes, but the driver is free to transfer less data; the exact rules are slightly different for reading and writing and are described later in this chapter.
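The following is a minimal user-space sketch of the read and write bookkeeping over a fixed 16-byte in-memory "device". It is not a driver: memcpy stands in for copy_to_user and copy_from_user, filp is omitted, and dev_read, dev_write, DEV_SIZE and mock_loff_t are hypothetical names. It shows clamping count to what the device can supply or hold, advancing *offp, and returning the number of bytes actually moved. Returning -ENOSPC when the device is full is one plausible choice, not a rule stated in the text.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical user-space stand-ins; not kernel definitions. */
typedef long long mock_loff_t;
#define DEV_SIZE 16

static char dev_mem[DEV_SIZE];   /* the "device" storage */
static size_t dev_len;           /* bytes of valid data in dev_mem */

/* read method shape: device -> user; may transfer fewer bytes than asked. */
static ssize_t dev_read(char *buff, size_t count, mock_loff_t *offp)
{
    if (*offp < 0)
        return -EINVAL;
    if (*offp >= (mock_loff_t)dev_len)
        return 0;                                /* end of file */
    if (count > dev_len - (size_t)*offp)
        count = dev_len - (size_t)*offp;         /* short read */
    memcpy(buff, dev_mem + *offp, count);        /* kernel: copy_to_user */
    *offp += (mock_loff_t)count;                 /* advance the position */
    return (ssize_t)count;
}

/* write method shape: user -> device; may accept fewer bytes than offered. */
static ssize_t dev_write(const char *buff, size_t count, mock_loff_t *offp)
{
    if (*offp < 0)
        return -EINVAL;
    if (*offp >= DEV_SIZE)
        return -ENOSPC;                          /* device is full */
    if (count > DEV_SIZE - (size_t)*offp)
        count = DEV_SIZE - (size_t)*offp;        /* short write */
    memcpy(dev_mem + *offp, buff, count);        /* kernel: copy_from_user */
    *offp += (mock_loff_t)count;
    if ((size_t)*offp > dev_len)
        dev_len = (size_t)*offp;                 /* data now extends further */
    return (ssize_t)count;
}
```

Writing 10 bytes and then 10 more at the running position stores only 6 of the second batch, because the device holds 16 bytes; the caller sees a short write and can decide whether to retry.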
Whatever the amount of data the methods transfer, they should generally update the file position at *offp to represent the current file position after successful completion of the system call. The kernel then propagates the file position change back into the file structure when appropriate. The pread and pwrite system calls have different semantics, however; they operate from a given file offset and do not change the file position as seen by any other system calls. These calls pass in a pointer to the user-supplied position and discard the changes that your driver makes.
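The difference between the two kinds of call can be modeled in user space. This is a simplified sketch, not the kernel's VFS code: model_read starts from f_pos, lets the method update a local copy, and stores it back on success, while model_pread starts from a caller-supplied offset and throws the update away. struct mock_file, toy_read, model_read and model_pread are all hypothetical names.

```c
#include <assert.h>
#include <sys/types.h>

/* Hypothetical stand-ins; not the kernel's struct file or loff_t. */
typedef long long mock_loff_t;

struct mock_file {
    mock_loff_t f_pos;                /* the file position */
};

/* Toy read method: fills buff with the low 7 bits of each offset read. */
static ssize_t toy_read(struct mock_file *filp, char *buff,
                        size_t count, mock_loff_t *offp)
{
    (void)filp;                       /* unused in this toy */
    for (size_t i = 0; i < count; i++)
        buff[i] = (char)((*offp + (mock_loff_t)i) & 0x7f);
    *offp += (mock_loff_t)count;      /* the method updates *offp */
    return (ssize_t)count;
}

/* read(2)-like: start at f_pos and keep the method's update. */
static ssize_t model_read(struct mock_file *filp, char *buf, size_t count)
{
    mock_loff_t pos = filp->f_pos;
    ssize_t ret = toy_read(filp, buf, count, &pos);
    if (ret >= 0)
        filp->f_pos = pos;            /* propagate the new position */
    return ret;
}

/* pread(2)-like: start at the given offset and discard the update. */
static ssize_t model_pread(struct mock_file *filp, char *buf, size_t count,
                           mock_loff_t offset)
{
    return toy_read(filp, buf, count, &offset);  /* f_pos untouched */
}
```

After a 4-byte model_read, f_pos is 4; a model_pread at offset 100 reads from 100 but leaves f_pos at 4, so the next model_read continues from 4.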
Figure 3-2 represents how a typical read implementation uses its arguments.
Figure 3-2. The arguments to read
Both the read and write methods return a negative value if an error occurs. A return value greater than or equal to 0 instead tells the calling program how many bytes have been successfully transferred. If some data is transferred correctly and then an error happens, the return value must be the count of bytes successfully transferred, and the error does not get reported until the next time the function is called. Implementing this convention requires, of course, that your driver remember that the error has occurred so that it can return the error status in the future.
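One way to implement the deferred-error convention is to keep a pending error in the device state. In this hypothetical user-space sketch, struct toy_dev, toy_write and the simulated fault after fail_after bytes are illustrative assumptions; a real driver would keep this state in its own device structure and protect it against concurrent access.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

/* Hypothetical device state: a pending error is remembered here. */
struct toy_dev {
    int pending_err;          /* 0, or a negative errno saved for later */
    size_t fail_after;        /* simulate a hardware fault after N bytes */
};

/* Simulated per-byte transfer; fails once fail_after bytes are reached. */
static int toy_xfer_byte(const struct toy_dev *dev, size_t index)
{
    return index < dev->fail_after ? 0 : -EIO;
}

static ssize_t toy_write(struct toy_dev *dev, const char *buff, size_t count)
{
    size_t done;

    (void)buff;                     /* data contents unused in this toy */
    if (dev->pending_err) {         /* report the error saved last time */
        int err = dev->pending_err;
        dev->pending_err = 0;
        return err;
    }
    for (done = 0; done < count; done++) {
        int err = toy_xfer_byte(dev, done);
        if (err) {
            if (done == 0)
                return err;         /* nothing transferred: fail now */
            dev->pending_err = err; /* partial success: defer the error */
            break;
        }
    }
    return (ssize_t)done;           /* bytes successfully transferred */
}
```

With fail_after set to 3, a 6-byte write returns 3; the next call returns -EIO and clears the saved error.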
Although kernel functions return a negative number to signal an error, and the value of the number indicates the kind of error that occurred (as introduced in Chapter 2), programs that run in user space always see -1 as the error return value. They need to access the errno variable to find out what happened. The user-space behavior is dictated by the POSIX standard, but that standard does not make requirements on how the kernel operates internally.
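The user-space side of this convention can be observed with an ordinary program. Calling read(2) on a descriptor that is not open makes the kernel return -EBADF internally; the C library turns that into a return value of -1 and sets errno to EBADF. read_bad_fd is a hypothetical helper name used only for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Call read(2) on a descriptor that is not open and report what
 * user space sees: a return of -1, with the cause left in errno. */
static int read_bad_fd(ssize_t *ret_out)
{
    char buf[16];

    errno = 0;
    *ret_out = read(-1, buf, sizeof buf);
    return errno;
}
```

A caller would typically pass the errno value to strerror or perror to produce a readable message.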