Computer Systems: A Programmer's Perspective (CS:APP), Chapter 11: Network Programming

11.1 The Client-Server Programming Model
11.2 Networks
11.3 The Global IP Internet
- 11.3.1 IP Addresses
- 11.3.2 Internet Domain Names
- 11.3.3 Internet Connections
11.4 The Sockets Interface
- 11.4.1 Socket Address Structures
- 11.4.2 The socket Function
- 11.4.3 The connect Function
- 11.4.4 The bind Function
- 11.4.5 The listen Function
- 11.4.6 The accept Function
- 11.4.7 Host and Service Conversion
  - The getaddrinfo Function
  - The getnameinfo Function
11.5 Web Servers
- 11.5.1 Web Basics
- 11.5.2 Web Content
- 11.5.3 HTTP Transactions
- 11.5.4 Serving Dynamic Content

11.1 The Client-Server Programming Model

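Every network application is based on the client-server model: an application consists of one server process and one or more client processes.

A server manages some resource and provides a service for its clients by manipulating that resource.

The fundamental operation in the client-server model is the transaction, which consists of four steps:

When a client needs service, it initiates a transaction by sending a request to the server.
The server receives the request, interprets it, and manipulates its resources in the appropriate way.
The server sends a response to the client and then waits for the next request.
The client receives the response and processes it.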

fig 11.1

Note: clients and servers are processes, not machines.

11.2 Networks

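Clients and servers often run on separate hosts and communicate using the hardware and software resources of a computer network.

To a host, a network is just another I/O device that serves as a source and sink for data.

fig 11.2

As the figure above shows, an adapter plugged into an expansion slot on the I/O bus provides the physical interface to the network.
Data received from the network are copied from the adapter across the I/O and memory buses into main memory, typically by a DMA transfer (see Section 6.1.1).
Data can likewise be copied from memory to the network.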

Physically, a network is a hierarchical system organized by geographical proximity:

network

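Ethernet segment:

An Ethernet segment consists of some wires (usually twisted pairs of wires) and a small box called a hub.
An Ethernet segment typically spans a small area, such as a room or a floor of a building.
Each wire has the same maximum bandwidth, typically 100 Mb/s or 1 Gb/s.
One end of each wire is attached to an adapter on a host, and the other end is attached to a port on the hub.
A hub copies every bit it receives on each port to every other port, so every host attached to the hub sees all of the data passing through it.
Each Ethernet adapter has a globally unique 48-bit address (its MAC address) that is stored in nonvolatile memory on the adapter.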
A host can send a chunk of bits called a frame to any other host on the segment.
- Each frame includes some fixed number of header bits that identify the source and destination of the frame and the frame length, followed by a payload of data bits.
- Every host adapter sees the frame, but only the destination host actually reads it.

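Bridged Ethernet segments:
fig 11.4

Multiple Ethernet segments can be connected into larger LANs, called bridged Ethernets, using a set of wires and small boxes called bridges.
Bridged Ethernets can span larger areas, such as an entire building or a campus.
Bridges make better use of the available bandwidth than hubs. Instead of copying every frame to every port indiscriminately, a bridge learns over time which hosts are reachable from which ports and then copies frames selectively to the port that needs them. For example, if host A sends a frame to host B, which is on the same segment, bridge X throws the frame away when it arrives at its input port. If host A sends a frame to host C, bridge X copies the frame only to the port connected to bridge Y, and bridge Y copies it only to the port connected to host C's segment.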

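Internetwork (internet):
fig 11.6

Multiple incompatible LANs can be connected by specialized computers called routers to form an internet (interconnected network).
Each router has an adapter (port) for each network that it is connected to.
Routers can also connect high-speed point-to-point phone connections, which are examples of networks known as WANs (wide area networks).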

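Note:
Internet versus internet: lowercase internet denotes the general concept of an interconnected network, while uppercase Internet denotes one specific implementation, the global IP Internet.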

The crucial property of an internet is that it can consist of different LANs and WANs with radically different and incompatible technologies.

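Protocol software: software running on each host and router that smooths out the differences between the different networks by implementing a protocol.

Protocol: a set of rules that governs how hosts and routers cooperate in order to transfer data from network to network. It provides two basic capabilities: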

Provides a naming scheme
- An internet protocol defines a uniform format for host addresses.
- Each host (router) is then assigned at least one of these internet addresses that uniquely identifies it.
Provides a delivery mechanism
- An internet protocol defines a standard transfer unit (packet).
- Packet consists of header and payload.
  - Header: contains info such as packet size, source and destination address.
  - Payload: contains data bits sent from source host.

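Figure 11.7 shows how hosts and routers use the internet protocol to transfer data across incompatible LANs:
fig 11.7

A client running on host A, which is attached to LAN1, sends data to a server running on host B, which is attached to LAN2. The transfer takes the following steps:

The client on host A invokes a system call that copies the data from the client's virtual address space into a kernel buffer.
The protocol software on host A creates a LAN1 frame by prepending an internet header (PH) and a LAN1 frame header (FH1) to the data, then passes the frame to the LAN1 adapter.
- The internet header is addressed to host B (it carries host B's IP address), and the LAN1 frame header is addressed to the router (it carries the router's MAC address).
- The payload of the LAN1 frame is an internet packet, and the payload of that internet packet is the actual user data.
The LAN1 adapter copies the frame to the network.
When the frame reaches the router, the router's LAN1 adapter reads it from the wire and passes it to the protocol software.
The router fetches the destination internet address (host B's IP address) from the internet packet header (PH) and uses it as an index into a routing table to determine where to forward the packet, which in this case is LAN2. The router then strips off the old frame header FH1, prepends a new frame header FH2 addressed to host B (carrying host B's MAC address), and passes the resulting frame to the LAN2 adapter.
The router's LAN2 adapter copies the frame to the network.
When the frame reaches host B, its adapter reads the frame from the wire and passes it to the protocol software.
The protocol software on host B strips off the frame header FH2 and the packet header PH. When the server invokes a system call that reads the data, the protocol software copies the resulting data into the server's virtual address space.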
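
The encapsulation steps above can be sketched in a few lines of C. This is a toy model, not a real protocol implementation: the struct layouts, the field names, and the functions encapsulate and forward are all hypothetical, and the headers carry far less than real IP or Ethernet headers do. It only shows that the router replaces the frame header while the packet inside travels unchanged.

```c
#include <stdint.h>
#include <string.h>

#define MAX_DATA 64                 /* arbitrary payload limit for this sketch */

/* Hypothetical internet packet: header fields (PH) followed by the data */
struct packet {
    uint32_t src_ip;                /* PH: source internet address */
    uint32_t dst_ip;                /* PH: destination internet address */
    uint16_t len;                   /* PH: number of valid bytes in data */
    unsigned char data[MAX_DATA];
};

/* Hypothetical LAN frame: header (FH) followed by a packet as its payload */
struct frame {
    unsigned char dst_mac[6];       /* FH: next hop on this LAN */
    struct packet payload;
};

/* Sending host: wrap data in a packet for dst_ip, then in a frame for next_hop.
   Returns 0 on success, -1 if the data does not fit. */
int encapsulate(struct frame *f, const unsigned char next_hop[6],
                uint32_t src_ip, uint32_t dst_ip,
                const void *data, uint16_t len)
{
    if (len > MAX_DATA)
        return -1;
    memset(f, 0, sizeof(*f));
    memcpy(f->dst_mac, next_hop, 6);
    f->payload.src_ip = src_ip;
    f->payload.dst_ip = dst_ip;
    f->payload.len = len;
    memcpy(f->payload.data, data, len);
    return 0;
}

/* Router: drop the old frame header and write a new one for the next LAN.
   The packet inside is forwarded unchanged. */
void forward(struct frame *out, const struct frame *in,
             const unsigned char next_hop[6])
{
    memcpy(out->dst_mac, next_hop, 6);
    out->payload = in->payload;
}
```

In this model, host A would call encapsulate with the router's MAC address, and the router would call forward with host B's MAC address.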

11.3 The Global IP Internet

Figure 11.8 shows the basic hardware and software organization of an Internet client-server application.

fig 11.8

11.3.1 IP Addresses

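An IP address is an unsigned 32-bit integer.
TCP/IP defines a uniform network byte order (big-endian) for any integer data item, such as an IP address, that is carried across the network in a packet header.
IP addresses are usually presented in dotted-decimal notation, for example 128.2.194.242.
An IP address is stored in an IP address structure, struct in_addr, whose single field s_addr holds the address in network byte order.
The functions htonl and htons convert 32-bit and 16-bit integers from host byte order to network byte order; ntohl and ntohs convert in the opposite direction.
The functions inet_pton and inet_ntop convert between dotted-decimal strings and binary IP addresses in network byte order.

In these function names, n stands for network and p stands for presentation. Both functions can manipulate either 32-bit IPv4 addresses (AF_INET) or 128-bit IPv6 addresses (AF_INET6).
inet_pton converts a dotted-decimal string (src) to a binary IP address in network byte order (dst).
inet_ntop converts a binary IP address in network byte order (src) to the corresponding dotted-decimal string (dst).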
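
As a rough illustration of these conversions, the sketch below combines inet_pton and inet_ntop with ntohl and htonl. The helper names dotted_to_host and host_to_dotted are made up for this example; only the library calls themselves are standard.

```c
#include <arpa/inet.h>    /* htonl, ntohl, inet_pton, inet_ntop */
#include <stdint.h>
#include <sys/socket.h>   /* AF_INET, socklen_t */

/* Convert a dotted-decimal string to a 32-bit value in host byte order.
   Returns 0 on success, -1 if text is not a valid IPv4 address. */
int dotted_to_host(const char *text, uint32_t *out)
{
    struct in_addr addr;               /* filled in network byte order */

    if (inet_pton(AF_INET, text, &addr) != 1)
        return -1;
    *out = ntohl(addr.s_addr);         /* network (big-endian) to host order */
    return 0;
}

/* Convert a host-byte-order value back to dotted-decimal text.
   buf should hold at least INET_ADDRSTRLEN bytes.
   Returns buf on success, NULL on error. */
const char *host_to_dotted(uint32_t host_value, char *buf, socklen_t len)
{
    struct in_addr addr;

    addr.s_addr = htonl(host_value);   /* host order to network order */
    return inet_ntop(AF_INET, &addr, buf, len);
}
```

For the address used above, dotted_to_host("128.2.194.242", &v) stores 0x8002c2f2 in v, and host_to_dotted turns that value back into the same string.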

11.3.2 Internet Domain Names

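The Internet defines memorable domain names to stand in for hard-to-remember IP addresses, along with a mechanism that maps domain names to IP addresses. The mapping is maintained in a distributed database known as DNS (domain name system).

Domain names form a hierarchical tree and are not case-sensitive.

Each host has a locally defined domain name, localhost, which always maps to the loopback address 127.0.0.1.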

11.3.3 Internet Connections

Internet clients and servers communicate by sending and receiving streams of bytes over connections. Each connection is:

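Point-to-point
A connection links a pair of processes.
Full-duplex
Data can flow in both directions at the same time, so each side can send and receive simultaneously.
Reliable
Barring a catastrophic failure, the bytes sent by the source process are eventually received by the destination process in the same order in which they were sent.

A socket is an end point of a connection.

A socket address has the form IP address:port. (For ports, see Computer Networks, Xie Xiren, 7th edition, Chapter 5, Transport Layer, 5.09.)
A connection is uniquely identified by the socket addresses of its two end points. This pair of socket addresses is known as a socket pair and is written (cliaddr:cliport, servaddr:servport), where cliaddr is the client's IP address, cliport is the client's port, servaddr is the server's IP address, and servport is the server's port.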

11.4 The Sockets Interface

The sockets interface is a set of functions that are used in conjunction with the Unix I/O functions to build network applications.

fig 11.12

11.4.1 Socket Address Structures

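Sockets:

A socket can be viewed from two perspectives:

To the kernel, a socket is an end point for communication.
To an application, a socket is an open file with a corresponding descriptor.

Clients and servers communicate by reading from and writing to socket descriptors.
Difference between sockets and regular file I/O:
The main distinction between regular file I/O and socket I/O is how the application opens the socket descriptors.

Socket address structures:

Generic socket address (struct sockaddr)
16 bytes in total; the first 2 bytes specify the protocol family.
IP socket address structure (struct sockaddr_in)
16 bytes in total. The first 2 bytes hold the protocol family, which is AF_INET for IPv4. sin_port occupies 2 bytes and holds the port number in big-endian order. sin_addr occupies 4 bytes and holds the IP address. The last 8 bytes are padding.
sockaddr_in can be viewed as a subclass of sockaddr.
Both the IP address and the port number are always stored in network (big-endian) byte order.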
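
A minimal sketch of filling in an IP socket address structure, assuming an IPv4 address given as dotted-decimal text and a port given in host byte order. The helper name make_ipv4_addr is hypothetical.

```c
#include <arpa/inet.h>    /* htons, inet_pton */
#include <netinet/in.h>   /* struct sockaddr_in */
#include <string.h>
#include <sys/socket.h>   /* AF_INET */

/* Fill an IPv4 socket address from dotted-decimal text and a host-order port.
   Returns 0 on success, -1 if ip_text is not a valid IPv4 address. */
int make_ipv4_addr(const char *ip_text, unsigned short port,
                   struct sockaddr_in *out)
{
    memset(out, 0, sizeof(*out));       /* also clears the padding bytes */
    out->sin_family = AF_INET;          /* protocol family: IPv4 */
    out->sin_port = htons(port);        /* port in network byte order */
    if (inet_pton(AF_INET, ip_text, &out->sin_addr) != 1)
        return -1;                      /* sin_addr is left zeroed */
    return 0;
}
```

Functions such as connect, bind, and accept take a pointer to the generic structure, so the filled-in address is passed as (struct sockaddr *)&addr together with sizeof(addr).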

11.4.2 The socket Function

Clients and servers use the socket function to create a socket descriptor.

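int socket(int domain, int type, int protocol);

clientfd = Socket(AF_INET, SOCK_STREAM, 0);

Socket is the book's error-checking wrapper around socket.
AF_INET indicates that we are using 32-bit IP addresses.
SOCK_STREAM indicates that the socket will be an end point for a connection.
The clientfd descriptor returned by socket is only partially opened and cannot yet be used for reading and writing.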

11.4.3 The connect Function

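A client establishes a connection with a server by calling the connect function.

int connect(int clientfd, const struct sockaddr *addr, socklen_t addrlen);

addr is the socket address of the server.
addrlen is sizeof(sockaddr_in).
connect blocks until either the connection is successfully established or an error occurs.
If the connection is established, clientfd is ready for reading and writing.
On success, connect returns 0, and the resulting connection is characterized by the socket pair (x:y, addr.sin_addr:addr.sin_port), where x is the client's IP address and y is the ephemeral port that uniquely identifies the client process on the client host.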
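
The client side can be sketched as follows, assuming the server is identified by a dotted-decimal IPv4 address and a port number. The function name connect_ipv4 is hypothetical, and this is a simplified stand-in rather than the book's own client helper.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a TCP connection to ip_text:port.
   Returns a connected descriptor, or -1 on error. */
int connect_ipv4(const char *ip_text, unsigned short port)
{
    struct sockaddr_in addr;
    int clientfd;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip_text, &addr.sin_addr) != 1)
        return -1;                       /* not a valid dotted-decimal address */

    clientfd = socket(AF_INET, SOCK_STREAM, 0);
    if (clientfd < 0)
        return -1;

    /* Blocks until the connection is established or an error occurs */
    if (connect(clientfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(clientfd);                 /* avoid leaking the descriptor */
        return -1;
    }
    return clientfd;                     /* now usable with read and write */
}
```

The caller is responsible for closing the returned descriptor when it is done with the connection.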

11.4.4 The bind Function

The remaining sockets functions—bind, listen, and accept—are used by servers to establish connections with clients.

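int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

The bind function asks the kernel to associate the server's socket address in addr with the socket descriptor sockfd.
addrlen is sizeof(sockaddr_in).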

11.4.5 The listen Function

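By default, the kernel assumes that a descriptor created by the socket function is an active socket that will live on the client end of a connection.

A server calls the listen function to tell the kernel that the descriptor will be used by a server instead of a client.

int listen(int sockfd, int backlog);

The listen function converts sockfd from an active socket to a listening socket that can accept connection requests from clients.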
The backlog argument is a hint about the number of outstanding connection requests that the kernel should queue up before it starts to refuse requests.
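
Putting socket, bind, and listen together, a server's setup might look like the sketch below. The function name open_listener is hypothetical, and binding to INADDR_ANY (any local IP address) is a choice made for this example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a listening descriptor on the given port.
   Returns the listening descriptor, or -1 on error. */
int open_listener(unsigned short port, int backlog)
{
    struct sockaddr_in addr;
    int listenfd;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);  /* any local IP address */
    addr.sin_port = htons(port);               /* 0 lets the kernel pick a port */

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    if (listenfd < 0)
        return -1;

    /* bind attaches the address; listen turns the active socket into a listening one */
    if (bind(listenfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(listenfd, backlog) < 0) {
        close(listenfd);
        return -1;
    }
    return listenfd;
}
```

A production server would usually also set the SO_REUSEADDR socket option before bind so that it can be restarted immediately; that step is left out here to keep the sketch short.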

11.4.6 The accept Function

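A server waits for connection requests from clients by calling the accept function.

int accept(int listenfd, struct sockaddr *addr, socklen_t *addrlen);

The accept function waits for a connection request from a client to arrive on the listening descriptor listenfd, fills in the client's socket address in addr, and returns a connected descriptor that can be used to communicate with the client.

Differences between the listening descriptor and the connected descriptor:

The listening descriptor serves as an end point for client connection requests.
The listening descriptor is typically created once and exists for the lifetime of the server.
The connected descriptor is the end point of the connection that is established between the client and the server.
A connected descriptor is created each time the server accepts a connection request and exists only as long as it takes the server to service that client.

The listening and connected descriptors work together as follows:

The server calls accept, which waits for a connection request to arrive on the listening descriptor.
The client calls connect, which sends a connection request to listenfd.
The accept function opens a new connected descriptor connfd, forms the connection between clientfd and connfd, and returns connfd to the application. The client and server can now communicate.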
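
To see the three descriptors side by side, the sketch below plays both roles inside a single process over the loopback address. This is an illustration only: a real client and server are separate processes, and the function name loopback_roundtrip is hypothetical. It relies on the kernel completing the connection while the request waits in the listen queue, so connect can return before accept is called.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Close a descriptor if it was opened */
static void close_if_open(int fd)
{
    if (fd >= 0)
        close(fd);
}

/* Send the nonempty string msg from clientfd to connfd over 127.0.0.1.
   Stores what the server end read in out (NUL-terminated).
   Returns the number of bytes read, or -1 on error. */
ssize_t loopback_roundtrip(const char *msg, char *out, size_t outlen)
{
    struct sockaddr_in addr, peer;
    socklen_t addrlen = sizeof(addr), peerlen = sizeof(peer);
    int listenfd = -1, clientfd = -1, connfd = -1;
    ssize_t n = -1;

    if (outlen == 0 || msg[0] == '\0')
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* 127.0.0.1 */
    addr.sin_port = htons(0);                       /* kernel picks a free port */

    /* Server side: the listening descriptor is created once.
       getsockname reports which port the kernel chose. */
    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    if (listenfd < 0 ||
        bind(listenfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(listenfd, 1) < 0 ||
        getsockname(listenfd, (struct sockaddr *)&addr, &addrlen) < 0)
        goto done;

    /* Client side: connect sends a connection request to the listening socket */
    clientfd = socket(AF_INET, SOCK_STREAM, 0);
    if (clientfd < 0 ||
        connect(clientfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto done;

    /* Server side: accept returns a new connected descriptor for this client */
    connfd = accept(listenfd, (struct sockaddr *)&peer, &peerlen);
    if (connfd < 0)
        goto done;

    /* The two ends can now communicate with ordinary Unix I/O */
    if (write(clientfd, msg, strlen(msg)) < 0)
        goto done;
    n = read(connfd, out, outlen - 1);
    if (n >= 0)
        out[n] = '\0';

done:
    close_if_open(connfd);
    close_if_open(clientfd);
    close_if_open(listenfd);
    return n;
}
```

Note that read may return fewer bytes than were written (a short count), so robust code loops until it has everything it expects.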

11.4.7 Host and Service Conversion

Linux provides some powerful functions, called getaddrinfo and getnameinfo, for converting back and forth between binary socket address structures and the string representations of hostnames, host addresses, service names, and port numbers.

The getaddrinfo Function

The getaddrinfo function converts string representations of hostnames, host addresses, service names, and port numbers into socket address structures.

The host argument is either a domain name or a numeric address (for example, a dotted-decimal IP address). It may be NULL.
The service argument is either a decimal port number or a service name (for example, http). It may be NULL.
host and service cannot both be NULL.
The hints argument is optional and points to an addrinfo structure. In typical use, memset clears the entire structure first, and then a few selected fields are set.
Given host and service (the two components of a socket address), getaddrinfo returns a result that points to a linked list of addrinfo structures, each of which points to a socket address structure that corresponds to host and service.
Clients: walk this list, trying each socket address in turn, until the calls to socket and connect succeed.
Servers: walk the list until the calls to socket and bind succeed.
By default, getaddrinfo can return both IPv4 and IPv6 socket addresses. Setting ai_family to AF_INET restricts the list to IPv4 addresses. Setting it to AF_INET6 restricts the list to IPv6 addresses.
Helper functions:

freeaddrinfo frees the entire linked list.
gai_strerror converts error code to an error message.
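
A small sketch of the usual getaddrinfo pattern: clear hints, set a few fields, walk the list, then free it. To keep the example independent of DNS, it assumes a numeric IPv4 host string and sets AI_NUMERICHOST. The function name numeric_to_sockaddr is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* exposes getaddrinfo in strict ISO C mode */
#endif
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Convert a numeric IPv4 host string and a decimal port string into a
   socket address. Returns 0 on success, or a getaddrinfo error code. */
int numeric_to_sockaddr(const char *host, const char *service,
                        struct sockaddr_in *out)
{
    struct addrinfo hints, *list, *p;
    int rc, found = 0;

    memset(&hints, 0, sizeof(hints));   /* clear everything, then set a few fields */
    hints.ai_family = AF_INET;          /* IPv4 only */
    hints.ai_socktype = SOCK_STREAM;    /* connections only */
    hints.ai_flags = AI_NUMERICHOST;    /* host must be numeric, so no DNS lookup */

    rc = getaddrinfo(host, service, &hints, &list);
    if (rc != 0)
        return rc;                      /* gai_strerror(rc) describes the failure */

    /* Walk the list. A real client would try socket and connect on each entry. */
    for (p = list; p != NULL; p = p->ai_next) {
        if (p->ai_family == AF_INET && p->ai_addrlen == sizeof(*out)) {
            memcpy(out, p->ai_addr, sizeof(*out));
            found = 1;
            break;
        }
    }
    freeaddrinfo(list);                 /* release the whole list */
    return found ? 0 : EAI_FAIL;
}
```

Dropping AI_NUMERICHOST from hints.ai_flags would let the same code accept domain names, at the cost of a possible DNS lookup.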

The getnameinfo Function

The getnameinfo function is the inverse of getaddrinfo. It converts a socket address structure to the corresponding host and service name strings.

It is the modern replacement for the obsolete gethostbyaddr and getservbyport functions.
Unlike those functions, it is reentrant and protocol-independent.
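
As a hedged illustration of the reverse direction, the sketch below turns an IPv4 socket address back into strings. It passes NI_NUMERICHOST and NI_NUMERICSERV so that getnameinfo returns the numeric forms instead of looking up names. The wrapper name sockaddr_to_strings is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* exposes getnameinfo in strict ISO C mode */
#endif
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Convert an IPv4 socket address to numeric host and port strings.
   Returns 0 on success, or a nonzero error code usable with gai_strerror. */
int sockaddr_to_strings(const struct sockaddr_in *addr,
                        char *host, socklen_t hostlen,
                        char *serv, socklen_t servlen)
{
    /* Numeric flags: return "128.2.194.242" and "8000" rather than names */
    int flags = NI_NUMERICHOST | NI_NUMERICSERV;

    return getnameinfo((const struct sockaddr *)addr, sizeof(*addr),
                       host, hostlen, serv, servlen, flags);
}
```

Without the two flags, getnameinfo tries to return a domain name and a service name (for example, http for port 80) where they are known.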

Example:

fig 11.7

11.5 Web Servers

11.5.1 Web Basics

Web clients and servers interact using a text-based application-level protocol known as HTTP (hypertext transfer protocol).
Web content can be written in a language known as HTML (hypertext markup language).

11.5.2 Web Content

To Web clients and servers, content is a sequence of bytes with an associated MIME (multipurpose internet mail extensions) type.
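Web servers provide content to clients in two different ways:
- Static content: fetch a disk file and return its contents to the client.
- Dynamic content: run an executable file and return its output to the client.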
Every piece of content returned by a Web server is associated with some file that it manages. Each of these files has a unique name known as a URL (universal resource locator).
- Clients and servers use different parts of the URL during a transaction.
- URLs for executable files can include program arguments after the filename.
  A ‘?’ character separates the filename from the arguments, and each argument is separated by an ‘&’ character.
  For example, the URL http://bluefish.ics.cs.cmu.edu:8000/cgi-bin/adder?15000&213 identifies an executable called /cgi-bin/adder that will be called with two argument strings: 15000 and 213.
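
The '?' and '&' conventions can be sketched with a small helper that works on the path portion of a URL. The function name split_cgi_uri is hypothetical, and this is only one simple way a server might separate the filename from the arguments.

```c
#include <string.h>

/* Split uri at the first '?'. Copies the filename part into filename and
   the raw argument string into args (empty if there is no '?').
   Returns the number of '&'-separated arguments, or -1 if a buffer is too small. */
int split_cgi_uri(const char *uri, char *filename, size_t flen,
                  char *args, size_t alen)
{
    const char *q = strchr(uri, '?');
    size_t namelen = q ? (size_t)(q - uri) : strlen(uri);
    const char *argstr = q ? q + 1 : "";
    const char *p;
    int count;

    if (namelen >= flen || strlen(argstr) >= alen)
        return -1;                      /* output buffers too small */

    memcpy(filename, uri, namelen);
    filename[namelen] = '\0';
    strcpy(args, argstr);               /* safe: length checked above */

    if (args[0] == '\0')
        return 0;                       /* no arguments at all */
    count = 1;
    for (p = args; *p != '\0'; p++)
        if (*p == '&')
            count++;                    /* each '&' starts another argument */
    return count;
}
```

For the path /cgi-bin/adder?15000&213, the helper yields the filename /cgi-bin/adder, the argument string 15000&213, and a count of 2.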

11.5.3 HTTP Transactions

HTTP Requests
An HTTP request consists of a request line, followed by zero or more request headers, followed by an empty text line that terminates the list of headers.

Request line
- Method
- URI
  A URL is one type of URI (uniform resource identifier).
- Version
  The HTTP version, either HTTP/1.0 or HTTP/1.1.
Request headers
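
As a rough sketch of the request format, the helper below assembles a minimal GET request: a request line, one request header, and the empty line that ends the headers. The function name build_get_request is hypothetical. The CRLF line endings and the Host header reflect general HTTP/1.1 conventions rather than anything stated in these notes.

```c
#include <stdio.h>

/* Write a minimal HTTP/1.1 GET request for uri on host into buf.
   Returns the request length, or -1 if buf is too small. */
int build_get_request(char *buf, size_t buflen,
                      const char *host, const char *uri)
{
    int n = snprintf(buf, buflen,
                     "GET %s HTTP/1.1\r\n"   /* request line: method URI version */
                     "Host: %s\r\n"          /* one request header */
                     "\r\n",                 /* empty line terminates the headers */
                     uri, host);

    if (n < 0 || (size_t)n >= buflen)
        return -1;                           /* error or truncated output */
    return n;
}
```

A client would write the resulting bytes to its connected descriptor and then read the response from the same descriptor.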

Reference: Computer Networks (计算机网络), Xie Xiren, 7th edition.

HTTP Responses
An HTTP response consists of a response line, followed by zero or more response headers, followed by an empty line that terminates the headers, followed by the response body.
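
On the receiving side, a client typically starts by picking apart the response line. The sketch below assumes the usual form of that line, a version, a numeric status code, and a status message (for example, HTTP/1.0 200 OK). The function name parse_response_line is hypothetical.

```c
#include <stdio.h>

/* Parse a response line such as "HTTP/1.0 200 OK".
   Stores the version string and the numeric status code.
   Returns 0 on success, -1 if the line is malformed. */
int parse_response_line(const char *line, char version[16], int *status)
{
    /* %15s reads the version token without overflowing the 16-byte buffer */
    if (sscanf(line, "%15s %d", version, status) != 2)
        return -1;
    return 0;
}
```

The rest of the line is the human-readable status message, which this sketch ignores.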