Table of Contents
1 Description
2 Structures
2.1 block_device_operations
2.2 gendisk
2.3 block_device
2.4 request_queue
2.5 request
2.6 bio
2.7 blk_mq_tag_set
2.8 blk_mq_ops
3 Related Functions
3.1 Registering and unregistering a block device
3.1.1 register_blkdev
3.1.2 unregister_blkdev
3.2 gendisk operations
3.2.1 alloc_disk
3.2.2 set_capacity
3.2.3 add_disk
3.2.4 del_gendisk
3.3 Block device I/O requests
3.3.1 blk_mq_alloc_tag_set
3.3.2 blk_mq_free_tag_set
3.3.3 blk_mq_init_queue
3.3.4 blk_cleanup_queue
3.4 Requests
3.4.1 blk_mq_start_request
3.4.2 blk_mq_end_request
3.5 Data access
3.5.1 bio_data
3.5.2 rq_data_dir
4 Experiment
4.1 Code
4.2 Procedure
1 Description
Block devices are storage devices such as SD cards, eMMC, NAND flash, NOR flash, SPI flash, mechanical hard disks and SSDs, so a block device driver is essentially a storage device driver. Compared with character device drivers, the main differences are:
① A block device is read and written in units of blocks; the block is the basic data-transfer unit of the Linux virtual file system (VFS). A character device transfers data byte by byte and needs no buffering.
② A block device can be accessed randomly, but every access still happens in whole blocks. Block devices use a buffer to hold data temporarily and write it back to the device in one go once conditions are right. One motivation is device lifetime: many hard disks and NAND flash parts specify an erase-cycle limit (flash must be erased before it can be rewritten), for example 100,000 cycles. Buffering the data and writing it out in batches reduces the number of erase operations and therefore extends the device's life.
A character device, by contrast, is a sequential data stream accessed byte by byte; accesses are immediate, need no buffer, and are not constrained to a fixed block size.
Different device structures also call for different I/O algorithms. Devices with no moving parts, such as eMMC, SD cards and NAND flash, can read or write any sector (the physical storage unit of a block device) at the same cost. A mechanical hard disk, however, has to move its head to reach a different platter or track, so reordering scattered accesses into a sensible sequence noticeably improves performance. Linux therefore implements several I/O schedulers for different kinds of storage.
The Linux MMC/SD card is a typical block device; its implementation lives in drivers/mmc, which is split into the card, core and host subdirectories. card interfaces with the Linux block subsystem: it implements the block driver and completes requests, but the actual protocol goes through the core layer's interfaces and is finally carried out by host, giving the MMC subsystem the layered structure shown in the figure. Besides standard MMC/SD storage cards, the card directory also contains drivers for some SDIO peripherals, such as drivers/mmc/card/sdio_uart.c. The core directory not only provides the interfaces used by card, it also defines the framework that host drivers plug into.
2 Structures
2.1 block_device_operations
struct block_device_operations defines a set of function pointers for operating on a block device: opening and closing it, page-level read/write, ioctl handling and so on. It gives different block devices a uniform interface to the kernel and is one of the core pieces of a block device driver.
1983 struct block_device_operations {
1984 int (*open) (struct block_device *, fmode_t);
1985 void (*release) (struct gendisk *, fmode_t);
1986 int (*rw_page)(struct block_device *, sector_t, struct page *, unsigned int);
1987 int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
1988 int (*compat_ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
1989 unsigned int (*check_events) (struct gendisk *disk,
1990 unsigned int clearing);
1991 /* ->media_changed() is DEPRECATED, use ->check_events() instead */
1992 int (*media_changed) (struct gendisk *);
1993 void (*unlock_native_capacity) (struct gendisk *);
1994 int (*revalidate_disk) (struct gendisk *);
1995 int (*getgeo)(struct block_device *, struct hd_geometry *);
1996 /* this callback is with swap_lock and sometimes page table lock held */
1997 void (*swap_slot_free_notify) (struct block_device *, unsigned long);
1998 struct module *owner;
1999 const struct pr_ops *pr_ops;
2000 };
int (*open) (struct block_device *, fmode_t);
Callback invoked when the block device is opened. It receives the block device and the open mode and returns 0 on success or a negative error code.
void (*release) (struct gendisk *, fmode_t);
Callback invoked when the device is closed (the last reference is dropped); used to clean up related resources.
int (*rw_page)(struct block_device *, sector_t, struct page *, unsigned int);
Callback for reading or writing a single page, supporting page-level I/O (used, for example, by the swap code).
int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
Handles device-specific control requests, allowing user-space programs to interact with the device.
int (*compat_ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
Handles ioctl requests issued by 32-bit user space on a 64-bit kernel (compatibility ioctls).
unsigned int (*check_events) (struct gendisk *disk, unsigned int clearing);
Checks for device events such as a media change; it replaces media_changed.
int (*media_changed) (struct gendisk *);
Deprecated callback for detecting a media change; superseded by check_events.
void (*unlock_native_capacity) (struct gendisk *);
Unlocks the device's native capacity so that the full size can be used when a capacity change is detected.
int (*revalidate_disk) (struct gendisk *);
Revalidates the block device, typically called when its state (for example the medium) may have changed.
int (*getgeo)(struct block_device *, struct hd_geometry *);
Reports the device geometry (heads, cylinders, sectors per track); used by partitioning tools such as fdisk.
void (*swap_slot_free_notify) (struct block_device *, unsigned long);
Notification that a swap slot has been freed; called with swap_lock (and sometimes the page table lock) held.
struct module *owner;
Pointer to the module that owns these operations, used for module reference counting.
const struct pr_ops *pr_ops;
Pointer to the persistent reservation operations, used to implement SCSI-style persistent reservations on the device.
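As a concrete illustration, a driver usually implements only a handful of these callbacks and leaves the rest NULL. The following is a minimal sketch, not taken from any real driver (my_open, my_release, my_getgeo and my_fops are made-up names); it mirrors what the experiment driver in section 4.1 does:
static int my_open(struct block_device *bdev, fmode_t mode)
{
    /* nothing to prepare for this simple device */
    return 0;
}
static void my_release(struct gendisk *disk, fmode_t mode)
{
    /* nothing to release */
}
static int my_getgeo(struct block_device *bdev, struct hd_geometry *geo)
{
    /* report a fake CHS geometry so tools like fdisk can partition the device */
    geo->heads = 2;       /* pretend 2 heads ... */
    geo->cylinders = 32;  /* ... 32 cylinders ... */
    geo->sectors = 64;    /* ... 64 sectors per track: 2*32*64*512 bytes = 2 MiB */
    return 0;
}
static const struct block_device_operations my_fops = {
    .owner   = THIS_MODULE,
    .open    = my_open,
    .release = my_release,
    .getgeo  = my_getgeo,
};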
2.2 gendisk
In the Linux kernel, the gendisk (generic disk) structure represents an individual disk device (or partition).
183 struct gendisk {
184 /* major, first_minor and minors are input parameters only,
185 * don't use directly. Use disk_devt() and disk_max_parts().
186 */
187 int major; /* major number of driver */
188 int first_minor;
189 int minors; /* maximum number of minors, =1 for
190 * disks that can't be partitioned. */
191
192 char disk_name[DISK_NAME_LEN]; /* name of major driver */
193 char *(*devnode)(struct gendisk *gd, umode_t *mode);
194
195 unsigned int events; /* supported events */
196 unsigned int async_events; /* async events, subset of all */
197
198 /* Array of pointers to partitions indexed by partno.
199 * Protected with matching bdev lock but stat and other
200 * non-critical accesses use RCU. Always access through
201 * helpers.
202 */
203 struct disk_part_tbl __rcu *part_tbl;
204 struct hd_struct part0;
205
206 const struct block_device_operations *fops;
207 struct request_queue *queue;
208 void *private_data;
209
210 int flags;
211 struct rw_semaphore lookup_sem;
212 struct kobject *slave_dir;
213
214 struct timer_rand_state *random;
215 atomic_t sync_io; /* RAID */
216 struct disk_events *ev;
217 #ifdef CONFIG_BLK_DEV_INTEGRITY
218 struct kobject integrity_kobj;
219 #endif /* CONFIG_BLK_DEV_INTEGRITY */
220 int node_id;
221 struct badblocks *bb;
222 struct lockdep_map lockdep_map;
223
224 ANDROID_KABI_RESERVE(1);
225 ANDROID_KABI_RESERVE(2);
226 ANDROID_KABI_RESERVE(3);
227 ANDROID_KABI_RESERVE(4);
228
229 };
major: major number of the driver.
first_minor: first minor number of the disk.
minors: maximum number of minor numbers (1 for disks that cannot be partitioned).
disk_name: name of the disk (e.g. "sda").
devnode: optional callback that returns a custom device node name/mode.
events: bitmask of events the disk supports.
async_events: subset of events that can be reported asynchronously.
part_tbl: pointer to the partition table, protected by the matching bdev lock (non-critical reads use RCU).
part0: hd_struct for partition 0, i.e. the whole device.
fops: pointer to the device's block_device_operations (open, ioctl, getgeo, ...).
queue: request queue used to process I/O for this disk.
private_data: driver-private data pointer.
flags: GENHD_FL_* state flags.
lookup_sem: read-write semaphore that serializes lookups against disk deletion.
slave_dir: kobject for the disk's sysfs "slaves" directory.
random: state used to feed the kernel entropy pool from disk timings.
sync_io: atomic counter of synchronous I/O (used by RAID).
ev: pointer to the disk events structure.
integrity_kobj: (optional) kobject used when block-layer data integrity is enabled.
node_id: NUMA node the disk was allocated on.
bb: pointer to the bad-block list.
lockdep_map: structure used for lock dependency (lockdep) checking.
2.3 block_device
The kernel uses block_device to represent a concrete block device object, such as a whole disk or a single partition.
454 struct block_device {
455 dev_t bd_dev; /* not a kdev_t - it's a search key */
456 int bd_openers;
457 struct inode * bd_inode; /* will die */
458 struct super_block * bd_super;
459 struct mutex bd_mutex; /* open/close mutex */
460 void * bd_claiming;
461 void * bd_holder;
462 int bd_holders;
463 bool bd_write_holder;
464 #ifdef CONFIG_SYSFS
465 struct list_head bd_holder_disks;
466 #endif
467 struct block_device * bd_contains;
468 unsigned bd_block_size;
469 u8 bd_partno;
470 struct hd_struct * bd_part;
471 /* number of times partitions within this device have been opened. */
472 unsigned bd_part_count;
473 int bd_invalidated;
474 struct gendisk * bd_disk;
475 struct request_queue * bd_queue;
476 struct backing_dev_info *bd_bdi;
477 struct list_head bd_list;
478 /*
479 * Private data. You must have bd_claim'ed the block_device
480 * to use this. NOTE: bd_claim allows an owner to claim
481 * the same device multiple times, the owner must take special
482 * care to not mess up bd_private for that case.
483 */
484 unsigned long bd_private;
485
486 /* The counter of freeze processes */
487 int bd_fsfreeze_count;
488 /* Mutex for freeze */
489 struct mutex bd_fsfreeze_mutex;
490
491 ANDROID_KABI_RESERVE(1);
492 ANDROID_KABI_RESERVE(2);
493 ANDROID_KABI_RESERVE(3);
494 ANDROID_KABI_RESERVE(4);
495 } __randomize_layout;
2.4 request_queue
The kernel sends every read or write of a block device to the device's request queue (request_queue). A request_queue holds a large number of request structures, and each request in turn contains bios; the bio carries the data and the location to be read or written.
434 struct request_queue {
435 /*
436 * Together with queue_head for cacheline sharing
437 */
438 struct list_head queue_head;
439 struct request *last_merge;
440 struct elevator_queue *elevator;
441 int nr_rqs[2]; /* # allocated [a]sync rqs */
442 int nr_rqs_elvpriv; /* # allocated rqs w/ elvpriv */
443
444 struct blk_queue_stats *stats;
445 struct rq_qos *rq_qos;
446
447 /*
448 * If blkcg is not used, @q->root_rl serves all requests. If blkcg
449 * is used, root blkg allocates from @q->root_rl and all other
450 * blkgs from their own blkg->rl. Which one to use should be
451 * determined using bio_request_list().
452 */
453 struct request_list root_rl;
454
455 request_fn_proc *request_fn;
456 make_request_fn *make_request_fn;
457 poll_q_fn *poll_fn;
458 prep_rq_fn *prep_rq_fn;
459 unprep_rq_fn *unprep_rq_fn;
460 softirq_done_fn *softirq_done_fn;
461 rq_timed_out_fn *rq_timed_out_fn;
462 dma_drain_needed_fn *dma_drain_needed;
463 lld_busy_fn *lld_busy_fn;
464 /* Called just after a request is allocated */
465 init_rq_fn *init_rq_fn;
466 /* Called just before a request is freed */
467 exit_rq_fn *exit_rq_fn;
468 /* Called from inside blk_get_request() */
469 void (*initialize_rq_fn)(struct request *rq);
470
471 const struct blk_mq_ops *mq_ops;
472
473 unsigned int *mq_map;
474
475 /* sw queues */
476 struct blk_mq_ctx __percpu *queue_ctx;
477 unsigned int nr_queues;
478
479 unsigned int queue_depth;
480
481 /* hw dispatch queues */
482 struct blk_mq_hw_ctx **queue_hw_ctx;
483 unsigned int nr_hw_queues;
484
485 /*
486 * Dispatch queue sorting
487 */
488 sector_t end_sector;
489 struct request *boundary_rq;
490
491 /*
492 * Delayed queue handling
493 */
494 struct delayed_work delay_work;
495
496 struct backing_dev_info *backing_dev_info;
497
498 /*
499 * The queue owner gets to use this for whatever they like.
500 * ll_rw_blk doesn't touch it.
501 */
502 void *queuedata;
503
504 /*
505 * various queue flags, see QUEUE_* below
506 */
507 unsigned long queue_flags;
508 /*
509 * Number of contexts that have called blk_set_pm_only(). If this
510 * counter is above zero then only RQF_PM and RQF_PREEMPT requests are
511 * processed.
512 */
513 atomic_t pm_only;
514
515 /*
516 * ida allocated id for this queue. Used to index queues from
517 * ioctx.
518 */
519 int id;
520
521 /*
522 * queue needs bounce pages for pages above this limit
523 */
524 gfp_t bounce_gfp;
525
526 /*
527 * protects queue structures from reentrancy. ->__queue_lock should
528 * _never_ be used directly, it is queue private. always use
529 * ->queue_lock.
530 */
531 spinlock_t __queue_lock;
532 spinlock_t *queue_lock;
533
534 /*
535 * queue kobject
536 */
537 struct kobject kobj;
538
539 /*
540 * mq queue kobject
541 */
542 struct kobject mq_kobj;
543
544 #ifdef CONFIG_BLK_DEV_INTEGRITY
545 struct blk_integrity integrity;
546 #endif /* CONFIG_BLK_DEV_INTEGRITY */
547
548 #ifdef CONFIG_PM
549 struct device *dev;
550 int rpm_status;
551 unsigned int nr_pending;
552 #endif
553
554 /*
555 * queue settings
556 */
557 unsigned long nr_requests; /* Max # of requests */
558 unsigned int nr_congestion_on;
559 unsigned int nr_congestion_off;
560 unsigned int nr_batching;
561
562 unsigned int dma_drain_size;
563 void *dma_drain_buffer;
564 unsigned int dma_pad_mask;
565 unsigned int dma_alignment;
566
567 struct blk_queue_tag *queue_tags;
568
569 unsigned int nr_sorted;
570 unsigned int in_flight[2];
571
572 /*
573 * Number of active block driver functions for which blk_drain_queue()
574 * must wait. Must be incremented around functions that unlock the
575 * queue_lock internally, e.g. scsi_request_fn().
576 */
577 unsigned int request_fn_active;
578 #ifdef CONFIG_BLK_INLINE_ENCRYPTION
579 /* Inline crypto capabilities */
580 struct keyslot_manager *ksm;
581 #endif
582
583 unsigned int rq_timeout;
584 int poll_nsec;
585
586 struct blk_stat_callback *poll_cb;
587 struct blk_rq_stat poll_stat[BLK_MQ_POLL_STATS_BKTS];
588
589 struct timer_list timeout;
590 struct work_struct timeout_work;
591 struct list_head timeout_list;
592
593 struct list_head icq_list;
594 #ifdef CONFIG_BLK_CGROUP
595 DECLARE_BITMAP (blkcg_pols, BLKCG_MAX_POLS);
596 struct blkcg_gq *root_blkg;
597 struct list_head blkg_list;
598 #endif
599
600 struct queue_limits limits;
601
602 #ifdef CONFIG_BLK_DEV_ZONED
603 /*
604 * Zoned block device information for request dispatch control.
605 * nr_zones is the total number of zones of the device. This is always
606 * 0 for regular block devices. seq_zones_bitmap is a bitmap of nr_zones
607 * bits which indicates if a zone is conventional (bit clear) or
608 * sequential (bit set). seq_zones_wlock is a bitmap of nr_zones
609 * bits which indicates if a zone is write locked, that is, if a write
610 * request targeting the zone was dispatched. All three fields are
611 * initialized by the low level device driver (e.g. scsi/sd.c).
612 * Stacking drivers (device mappers) may or may not initialize
613 * these fields.
614 *
615 * Reads of this information must be protected with blk_queue_enter() /
616 * blk_queue_exit(). Modifying this information is only allowed while
617 * no requests are being processed. See also blk_mq_freeze_queue() and
618 * blk_mq_unfreeze_queue().
619 */
620 unsigned int nr_zones;
621 unsigned long *seq_zones_bitmap;
622 unsigned long *seq_zones_wlock;
623 #endif /* CONFIG_BLK_DEV_ZONED */
624
625 /*
626 * sg stuff
627 */
628 unsigned int sg_timeout;
629 unsigned int sg_reserved_size;
630 int node;
631 #ifdef CONFIG_BLK_DEV_IO_TRACE
632 struct blk_trace __rcu *blk_trace;
633 struct mutex blk_trace_mutex;
634 #endif
635 /*
636 * for flush operations
637 */
638 struct blk_flush_queue *fq;
639
640 struct list_head requeue_list;
641 spinlock_t requeue_lock;
642 struct delayed_work requeue_work;
643
644 struct mutex sysfs_lock;
645
646 int bypass_depth;
647 atomic_t mq_freeze_depth;
648
649 bsg_job_fn *bsg_job_fn;
650 struct bsg_class_device bsg_dev;
651
652 #ifdef CONFIG_BLK_DEV_THROTTLING
653 /* Throttle data */
654 struct throtl_data *td;
655 #endif
656 struct rcu_head rcu_head;
657 wait_queue_head_t mq_freeze_wq;
658 struct percpu_ref q_usage_counter;
659 struct list_head all_q_node;
660
661 struct blk_mq_tag_set *tag_set;
662 struct list_head tag_set_list;
663 struct bio_set bio_split;
664
665 #ifdef CONFIG_BLK_DEBUG_FS
666 struct dentry *debugfs_dir;
667 struct dentry *sched_debugfs_dir;
668 #endif
669
670 bool mq_sysfs_init_done;
671
672 size_t cmd_size;
673 void *rq_alloc_data;
674
675 struct work_struct release_work;
676
677 #define BLK_MAX_WRITE_HINTS 5
678 u64 write_hints[BLK_MAX_WRITE_HINTS];
679 };
2.5 request
A request describes one block I/O operation as seen by the driver; it groups one or more bios that target adjacent sectors and is the unit the driver actually dispatches to the hardware.
151 struct request {
152 struct request_queue *q;
153 struct blk_mq_ctx *mq_ctx;
154
155 int cpu;
156 unsigned int cmd_flags; /* op and common flags */
157 req_flags_t rq_flags;
158
159 int internal_tag;
160
161 /* the following two fields are internal, NEVER access directly */
162 unsigned int __data_len; /* total data len */
163 int tag;
164 sector_t __sector; /* sector cursor */
165
166 struct bio *bio;
167 struct bio *biotail;
168
169 struct list_head queuelist;
170
171 /*
172 * The hash is used inside the scheduler, and killed once the
173 * request reaches the dispatch list. The ipi_list is only used
174 * to queue the request for softirq completion, which is long
175 * after the request has been unhashed (and even removed from
176 * the dispatch list).
177 */
178 union {
179 struct hlist_node hash; /* merge hash */
180 struct list_head ipi_list;
181 };
182
183 /*
184 * The rb_node is only used inside the io scheduler, requests
185 * are pruned when moved to the dispatch queue. So let the
186 * completion_data share space with the rb_node.
187 */
188 union {
189 struct rb_node rb_node; /* sort/lookup */
190 struct bio_vec special_vec;
191 void *completion_data;
192 int error_count; /* for legacy drivers, don't use */
193 };
194
195 /*
196 * Three pointers are available for the IO schedulers, if they need
197 * more they have to dynamically allocate it. Flush requests are
198 * never put on the IO scheduler. So let the flush fields share
199 * space with the elevator data.
200 */
201 union {
202 struct {
203 struct io_cq *icq;
204 void *priv[2];
205 } elv;
206
207 struct {
208 unsigned int seq;
209 struct list_head list;
210 rq_end_io_fn *saved_end_io;
211 } flush;
212 };
213
214 struct gendisk *rq_disk;
215 struct hd_struct *part;
216 /* Time that I/O was submitted to the kernel. */
217 u64 start_time_ns;
218 /* Time that I/O was submitted to the device. */
219 u64 io_start_time_ns;
220
221 #ifdef CONFIG_BLK_WBT
222 unsigned short wbt_flags;
223 #endif
224 #ifdef CONFIG_BLK_DEV_THROTTLING_LOW
225 unsigned short throtl_size;
226 #endif
227
228 /*
229 * Number of scatter-gather DMA addr+len pairs after
230 * physical address coalescing is performed.
231 */
232 unsigned short nr_phys_segments;
233
234 #if defined(CONFIG_BLK_DEV_INTEGRITY)
235 unsigned short nr_integrity_segments;
236 #endif
237
238 unsigned short write_hint;
239 unsigned short ioprio;
240
241 void *special; /* opaque pointer available for LLD use */
242
243 unsigned int extra_len; /* length of alignment and padding */
244
245 enum mq_rq_state state;
246 refcount_t ref;
247
248 unsigned int timeout;
249
250 /* access through blk_rq_set_deadline, blk_rq_deadline */
251 unsigned long __deadline;
252
253 struct list_head timeout_list;
254
255 union {
256 struct __call_single_data csd;
257 u64 fifo_time;
258 };
259
260 /*
261 * completion callback.
262 */
263 rq_end_io_fn *end_io;
264 void *end_io_data;
265
266 /* for bidi */
267 struct request *next_rq;
268
269 #ifdef CONFIG_BLK_CGROUP
270 struct request_list *rl; /* rl this rq is alloced from */
271 #endif
272 };
2.6 bio
Each request contains one or more bios, and the bio holds the data to transfer, the target location and related information. When an application reads or writes a block device, the upper layers build one or more bio structures describing the starting sector, the number of sectors, whether it is a read or a write, the page offset, the data length and so on. The bios are submitted to the I/O scheduler, which assembles them into request structures; the request_queue then holds this ordered series of requests. A newly submitted bio may be merged into an existing request already in the request_queue, or it may produce a new request that is inserted at a suitable position in the queue; all of this is done by the I/O scheduler.
The relationship between request_queue, request and bio is shown in the figure.
146 struct bio {
147 struct bio *bi_next; /* request queue link */
148 struct gendisk *bi_disk;
149 unsigned int bi_opf; /* bottom bits req flags,
150 * top bits REQ_OP. Use
151 * accessors.
152 */
153 unsigned short bi_flags; /* status, etc and bvec pool number */
154 unsigned short bi_ioprio;
155 unsigned short bi_write_hint;
156 blk_status_t bi_status;
157 u8 bi_partno;
158
159 /* Number of segments in this BIO after
160 * physical address coalescing is performed.
161 */
162 unsigned int bi_phys_segments;
163
164 /*
165 * To keep track of the max segment size, we account for the
166 * sizes of the first and last mergeable segments in this bio.
167 */
168 unsigned int bi_seg_front_size;
169 unsigned int bi_seg_back_size;
170
171 struct bvec_iter bi_iter;
172
173 atomic_t __bi_remaining;
174 bio_end_io_t *bi_end_io;
175
176 void *bi_private;
177 #ifdef CONFIG_BLK_CGROUP
178 /*
179 * Optional ioc and css associated with this bio. Put on bio
180 * release. Read comment on top of bio_associate_current().
181 */
182 struct io_context *bi_ioc;
183 struct cgroup_subsys_state *bi_css;
184 struct blkcg_gq *bi_blkg;
185 struct bio_issue bi_issue;
186 #endif
187
188 #ifdef CONFIG_BLK_INLINE_ENCRYPTION
189 struct bio_crypt_ctx *bi_crypt_context;
190 #if IS_ENABLED(CONFIG_DM_DEFAULT_KEY)
191 bool bi_skip_dm_default_key;
192 #endif
193 #endif
194
195 union {
196 #if defined(CONFIG_BLK_DEV_INTEGRITY)
197 struct bio_integrity_payload *bi_integrity; /* data integrity */
198 #endif
199 };
200
201 unsigned short bi_vcnt; /* how many bio_vec's */
202
203 /*
204 * Everything starting with bi_max_vecs will be preserved by bio_reset()
205 */
206
207 unsigned short bi_max_vecs; /* max bvl_vecs we can hold */
208
209 atomic_t __bi_cnt; /* pin count */
210
211 struct bio_vec *bi_io_vec; /* the actual vec list */
212
213 struct bio_set *bi_pool;
214
215 ktime_t bi_alloc_ts; /* for mm_event */
216
217 ANDROID_KABI_RESERVE(1);
218 ANDROID_KABI_RESERVE(2);
219
220 /*
221 * We can inline a number of vecs at the end of the bio, to avoid
222 * double allocations for a small number of bio_vecs. This member
223 * MUST obviously be kept at the very end of the bio.
224 */
225         struct bio_vec          bi_inline_vecs[0];
226 };
The bvec_iter structure describes the target device sector and the iteration state of the operation:
36 struct bvec_iter {
37 sector_t bi_sector; /* device address in 512 byte
38 sectors */
39 unsigned int bi_size; /* residual I/O count */
40
41 unsigned int bi_idx; /* current index into bvl_vec */
42
43 unsigned int bi_done; /* number of bytes completed */
44
45 unsigned int bi_bvec_done; /* number of bytes completed in
46 current bvec */
47 };
The bio_vec structure describes the memory backing this bio request. The data may not all live in a single page, so a vector is needed; each element of the vector is effectively a [page, offset, len] tuple, usually called a segment.
30 struct bio_vec {
31 struct page *bv_page;
32 unsigned int bv_len;
33 unsigned int bv_offset;
34 };
The relationship between bio, bvec_iter and bio_vec is shown in the figure.
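To make the relationship concrete, the sketch below walks every segment of a bio with the standard bio_for_each_segment() iterator (handle_bio and devmem are made-up names for illustration, not kernel APIs): bvec.bv_page/bv_offset/bv_len locate the data in memory, while iter.bi_sector tracks the device sector.
static void handle_bio(u8 *devmem, struct bio *bio)
{
    struct bio_vec bvec;
    struct bvec_iter iter;
    /* visit every [page, offset, len] segment of the bio in order */
    bio_for_each_segment(bvec, bio, iter) {
        unsigned long pos = iter.bi_sector << 9;                  /* device sector -> byte offset */
        void *buf = page_address(bvec.bv_page) + bvec.bv_offset;  /* kernel address of the segment */
        if (bio_data_dir(bio) == READ)
            memcpy(buf, devmem + pos, bvec.bv_len);
        else
            memcpy(devmem + pos, buf, bvec.bv_len);
    }
}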
2.7 blk_mq_tag_set
This structure is at the heart of multi-queue (blk-mq) support for block devices: it carries the information needed to manage request tags so that the device can process concurrent I/O efficiently.
77 struct blk_mq_tag_set {
78 unsigned int *mq_map;
79 const struct blk_mq_ops *ops;
80 unsigned int nr_hw_queues;
81 unsigned int queue_depth; /* max hw supported */
82 unsigned int reserved_tags;
83 unsigned int cmd_size; /* per-request extra data */
84 int numa_node;
85 unsigned int timeout;
86 unsigned int flags; /* BLK_MQ_F_* */
87 void *driver_data;
88
89 struct blk_mq_tags **tags;
90
91 struct mutex tag_list_lock;
92 struct list_head tag_list;
93 };
unsigned int *mq_map: per-CPU map from CPU number to hardware queue index, used when dispatching requests.
const struct blk_mq_ops *ops: pointer to the driver's blk_mq_ops, i.e. the callbacks (queue_rq and friends) used to process requests.
unsigned int nr_hw_queues: number of hardware queues the driver supports, used to configure and scale concurrent dispatch.
unsigned int queue_depth: maximum depth the hardware supports, i.e. how many requests each queue may have outstanding at once.
unsigned int reserved_tags: number of tags reserved for special purposes or particular request types.
unsigned int cmd_size: size of the per-request extra data allocated for the driver alongside each request.
int numa_node: NUMA node to allocate from, which helps keep memory accesses local for better performance.
unsigned int timeout: request timeout, i.e. how long a request may remain outstanding before the timeout handler runs.
unsigned int flags: behaviour flags, built from the BLK_MQ_F_* macros.
void *driver_data: pointer to driver-specific data, available as extra context.
struct blk_mq_tags **tags: array of tag structures, one per hardware queue, holding the actual request tags.
struct mutex tag_list_lock: mutex protecting concurrent access to tag_list.
struct list_head tag_list: list head linking the request queues that share this tag set.
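A driver normally zeroes the structure, fills in these fields and then hands it to blk_mq_alloc_tag_set()/blk_mq_init_queue() (both described in section 3.3). A hedged sketch with made-up names (my_mq_ops, my_dev) and values matching the experiment in section 4:
struct blk_mq_tag_set *set = &my_dev->tag_set;
memset(set, 0, sizeof(*set));
set->ops          = &my_mq_ops;            /* driver callbacks, see 2.8 */
set->nr_hw_queues = 2;                     /* number of hardware queues */
set->queue_depth  = 2;                     /* outstanding requests per queue */
set->numa_node    = NUMA_NO_NODE;          /* no NUMA preference */
set->cmd_size     = 0;                     /* no per-request driver data */
set->flags        = BLK_MQ_F_SHOULD_MERGE; /* allow bios to be merged */
/* set->driver_data = my_dev;                 optional driver context */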
2.8 blk_mq_ops
struct blk_mq_ops is the set of callbacks a blk-mq block device driver provides for managing and processing its request queues.
120 struct blk_mq_ops {
121 /*
122 * Queue request
123 */
124 queue_rq_fn *queue_rq;
125
126 /*
127 * Reserve budget before queue request, once .queue_rq is
128 * run, it is driver's responsibility to release the
129 * reserved budget. Also we have to handle failure case
130 * of .get_budget for avoiding I/O deadlock.
131 */
132 get_budget_fn *get_budget;
133 put_budget_fn *put_budget;
134
135 /*
136 * Called on request timeout
137 */
138 timeout_fn *timeout;
139
140 /*
141 * Called to poll for completion of a specific tag.
142 */
143 poll_fn *poll;
144
145 softirq_done_fn *complete;
146
147 /*
148 * Called when the block layer side of a hardware queue has been
149 * set up, allowing the driver to allocate/init matching structures.
150 * Ditto for exit/teardown.
151 */
152 init_hctx_fn *init_hctx;
153 exit_hctx_fn *exit_hctx;
154
155 /*
156 * Called for every command allocated by the block layer to allow
157 * the driver to set up driver specific data.
158 *
159 * Tag greater than or equal to queue_depth is for setting up
160 * flush request.
161 *
162 * Ditto for exit/teardown.
163 */
164 init_request_fn *init_request;
165 exit_request_fn *exit_request;
166 /* Called from inside blk_get_request() */
167 void (*initialize_rq_fn)(struct request *rq);
168
169 /*
170 * Called before freeing one request which isn't completed yet,
171 * and usually for freeing the driver private data
172 */
173 cleanup_rq_fn *cleanup_rq;
174
175 map_queues_fn *map_queues;
176
177 #ifdef CONFIG_BLK_DEBUG_FS
178 /*
179 * Used by the debugfs implementation to show driver-specific
180 * information about a request.
181 */
182 void (*show_rq)(struct seq_file *m, struct request *rq);
183 #endif
184 };
queue_rq_fn *queue_rq:
Description: queues a request for execution; this is the core callback through which the block device handles I/O requests.
get_budget_fn *get_budget:
Description: reserves "budget" before a request is queued, letting the driver set resources aside before processing.
put_budget_fn *put_budget:
Description: releases a previously reserved budget so that resources are managed correctly.
timeout_fn *timeout:
Description: called when a request times out; typically performs cleanup or error handling.
poll_fn *poll:
Description: polls for completion of the request with a specific tag.
softirq_done_fn *complete:
Description: called when request processing is complete, to notify the block layer.
init_hctx_fn *init_hctx:
Description: called after the block layer has set up a hardware context, so the driver can allocate and initialize matching structures.
exit_hctx_fn *exit_hctx:
Description: called when a hardware context is torn down; cleans up and releases the corresponding resources.
init_request_fn *init_request:
Description: called for every request allocated by the block layer, allowing the driver to set up its per-request data.
exit_request_fn *exit_request:
Description: counterpart of init_request, called when a request is torn down, to release the per-request resources.
void (*initialize_rq_fn)(struct request *rq):
Description: called from inside blk_get_request(), usually to initialize the request's state or data.
cleanup_rq_fn *cleanup_rq:
Description: called before freeing a request that has not completed, usually to free driver-private data.
map_queues_fn *map_queues:
Description: maps software queues to hardware queues; useful in more complex multi-queue scenarios.
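For a simple driver only queue_rq is mandatory; the remaining callbacks may be left NULL. A minimal, hypothetical wiring (the body of my_queue_rq is sketched in section 3.4 below):
static blk_status_t my_queue_rq(struct blk_mq_hw_ctx *hctx,
                                const struct blk_mq_queue_data *bd); /* body shown in 3.4 */
static const struct blk_mq_ops my_mq_ops = {
    .queue_rq = my_queue_rq,
};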
3 Related Functions
3.1 Registering and unregistering a block device
3.1.1 register_blkdev
Prototype | int register_blkdev(unsigned int major, const char *name) | |
Parameters | unsigned int major | The major number to register; pass 0 to have the kernel allocate a free major dynamically. |
| const char *name | The device name; it appears next to the major number in /proc/devices. |
Return value | int | Success: 0, or the newly allocated major number when major was 0. Failure: a negative error code. |
Function | Registers a block device major number with the kernel and associates it with the given name, so that disks created by the driver can use that major and become accessible from user space. |
343 int register_blkdev(unsigned int major, const char *name)
344 {
345 struct blk_major_name **n, *p;
346 int index, ret = 0;
347
348 mutex_lock(&block_class_lock);
349
350 /* temporary */
351 if (major == 0) {
352 for (index = ARRAY_SIZE(major_names)-1; index > 0; index--) {
353 if (major_names[index] == NULL)
354 break;
355 }
356
357 if (index == 0) {
358 printk("register_blkdev: failed to get major for %s\n",
359 name);
360 ret = -EBUSY;
361 goto out;
362 }
363 major = index;
364 ret = major;
365 }
366
367 if (major >= BLKDEV_MAJOR_MAX) {
368 pr_err("register_blkdev: major requested (%u) is greater than the maximum (%u) for %s\n",
369 major, BLKDEV_MAJOR_MAX-1, name);
370
371 ret = -EINVAL;
372 goto out;
373 }
374
375 p = kmalloc(sizeof(struct blk_major_name), GFP_KERNEL);
376 if (p == NULL) {
377 ret = -ENOMEM;
378 goto out;
379 }
380
381 p->major = major;
382 strlcpy(p->name, name, sizeof(p->name));
383 p->next = NULL;
384 index = major_to_index(major);
385
386 for (n = &major_names[index]; *n; n = &(*n)->next) {
387 if ((*n)->major == major)
388 break;
389 }
390 if (!*n)
391 *n = p;
392 else
393 ret = -EBUSY;
394
395 if (ret < 0) {
396 printk("register_blkdev: cannot get major %u for %s\n",
397 major, name);
398 kfree(p);
399 }
400 out:
401 mutex_unlock(&block_class_lock);
402 return ret;
403 }
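In practice most drivers pass 0 and let the kernel pick the major. A minimal usage sketch (my_major, my_register and my_unregister are made-up names; "ramdisk" matches the experiment in section 4):
static int my_major;
static int my_register(void)
{
    my_major = register_blkdev(0, "ramdisk"); /* 0 = allocate a major dynamically */
    if (my_major < 0)
        return my_major;                      /* negative error code on failure */
    return 0;
}
static void my_unregister(void)
{
    unregister_blkdev(my_major, "ramdisk");   /* release the major on exit */
}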
3.1.2 unregister_blkdev
Prototype | void unregister_blkdev(unsigned int major, const char *name) | |
Parameters | unsigned int major | The major number that was registered. |
| const char *name | The name that was passed to register_blkdev(). |
Return value | none | |
Function | Unregisters a previously registered block device: the (major, name) pair is looked up and removed from the kernel's list. |
407 void unregister_blkdev(unsigned int major, const char *name)
408 {
409 struct blk_major_name **n;
410 struct blk_major_name *p = NULL;
411 int index = major_to_index(major);
412
413 mutex_lock(&block_class_lock);
414 for (n = &major_names[index]; *n; n = &(*n)->next)
415 if ((*n)->major == major)
416 break;
417 if (!*n || strcmp((*n)->name, name)) {
418 WARN_ON(1);
419 } else {
420 p = *n;
421 *n = p->next;
422 }
423 mutex_unlock(&block_class_lock);
424 kfree(p);
425 }
3.2 gendisk operations
3.2.1 alloc_disk
Prototype | #define alloc_disk(minors) alloc_disk_node(minors, NUMA_NO_NODE) | |
Parameters | int minors | Number of minor numbers reserved for this gendisk, i.e. the whole disk plus its partitions. |
Return value | struct gendisk * | Success: the newly allocated gendisk. Failure: NULL. |
Function | Allocates and initializes a new gendisk structure for a block device. |
#define alloc_disk(minors) alloc_disk_node(minors, NUMA_NO_NODE)
655 #define alloc_disk_node(minors, node_id) \
656 ({ \
657 static struct lock_class_key __key; \
658 const char *__name; \
659 struct gendisk *__disk; \
660 \
661 __name = "(gendisk_completion)"#minors"("#node_id")"; \
662 \
663 __disk = __alloc_disk_node(minors, node_id); \
664 \
665 if (__disk) \
666 lockdep_init_map(&__disk->lockdep_map, __name, &__key, 0); \
667 \
668 __disk; \
669 })
1438 struct gendisk *__alloc_disk_node(int minors, int node_id)
1439 {
1440 struct gendisk *disk;
1441 struct disk_part_tbl *ptbl;
1442
1443 if (minors > DISK_MAX_PARTS) {
1444 printk(KERN_ERR
1445 "block: can't allocate more than %d partitions\n",
1446 DISK_MAX_PARTS);
1447 minors = DISK_MAX_PARTS;
1448 }
1449
1450 disk = kzalloc_node(sizeof(struct gendisk), GFP_KERNEL, node_id);
1451 if (disk) {
1452 if (!init_part_stats(&disk->part0)) {
1453 kfree(disk);
1454 return NULL;
1455 }
1456 init_rwsem(&disk->lookup_sem);
1457 disk->node_id = node_id;
1458 if (disk_expand_part_tbl(disk, 0)) {
1459 free_part_stats(&disk->part0);
1460 kfree(disk);
1461 return NULL;
1462 }
1463 ptbl = rcu_dereference_protected(disk->part_tbl, 1);
1464 rcu_assign_pointer(ptbl->part[0], &disk->part0);
1465
1466 /*
1467 * set_capacity() and get_capacity() currently don't use
1468 * seqcounter to read/update the part0->nr_sects. Still init
1469 * the counter as we can read the sectors in IO submission
1470 * patch using seqence counters.
1471 *
1472 * TODO: Ideally set_capacity() and get_capacity() should be
1473 * converted to make use of bd_mutex and sequence counters.
1474 */
1475 seqcount_init(&disk->part0.nr_sects_seq);
1476 if (hd_ref_init(&disk->part0)) {
1477 hd_free_part(&disk->part0);
1478 kfree(disk);
1479 return NULL;
1480 }
1481
1482 disk->minors = minors;
1483 rand_initialize_disk(disk);
1484 disk_to_dev(disk)->class = &block_class;
1485 disk_to_dev(disk)->type = &disk_type;
1486 device_initialize(disk_to_dev(disk));
1487 }
1488 return disk;
1489 }
3.2.2 set_capacity
Prototype | void set_capacity(struct gendisk *disk, sector_t size) | |
Parameters | struct gendisk *disk | The gendisk whose capacity is being set. |
| sector_t size | The total capacity in sectors. The sector is the smallest addressable unit of a block device; it is normally 512 bytes, and even when the physical sector size differs, the sectors exchanged between the kernel and the block driver are always 512 bytes. |
Return value | none | |
Function | Sets the capacity of a block device (gendisk). |
460 static inline void set_capacity(struct gendisk *disk, sector_t size)
461 {
462 disk->part0.nr_sects = size;
463 }
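Because the size is always expressed in 512-byte sectors, a driver converts its byte capacity before the call. A one-line sketch, assuming disk is the gendisk returned by alloc_disk() and MY_SIZE_BYTES is a made-up constant:
set_capacity(disk, MY_SIZE_BYTES >> 9); /* bytes -> 512-byte sectors */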
3.2.3 add_disk
Prototype | void add_disk(struct gendisk *disk) | |
Parameters | struct gendisk *disk | The gendisk to add. |
Return value | none | |
Function | Adds the given block device (gendisk) to the device model, making it visible to the system. |
421 static inline void add_disk(struct gendisk *disk)
422 {
423 device_add_disk(NULL, disk);
424 }
730 void device_add_disk(struct device *parent, struct gendisk *disk)
731 {
732 __device_add_disk(parent, disk, true);
733 }
672 static void __device_add_disk(struct device *parent, struct gendisk *disk,
673 bool register_queue)
674 {
675 dev_t devt;
676 int retval;
677
678 /* minors == 0 indicates to use ext devt from part0 and should
679 * be accompanied with EXT_DEVT flag. Make sure all
680 * parameters make sense.
681 */
682 WARN_ON(disk->minors && !(disk->major || disk->first_minor));
683 WARN_ON(!disk->minors &&
684 !(disk->flags & (GENHD_FL_EXT_DEVT | GENHD_FL_HIDDEN)));
685
686 disk->flags |= GENHD_FL_UP;
687
688 retval = blk_alloc_devt(&disk->part0, &devt);
689 if (retval) {
690 WARN_ON(1);
691 return;
692 }
693 disk->major = MAJOR(devt);
694 disk->first_minor = MINOR(devt);
695
696 disk_alloc_events(disk);
697
698 if (disk->flags & GENHD_FL_HIDDEN) {
699 /*
700 * Don't let hidden disks show up in /proc/partitions,
701 * and don't bother scanning for partitions either.
702 */
703 disk->flags |= GENHD_FL_SUPPRESS_PARTITION_INFO;
704 disk->flags |= GENHD_FL_NO_PART_SCAN;
705 } else {
706 int ret;
707
708 /* Register BDI before referencing it from bdev */
709 disk_to_dev(disk)->devt = devt;
710 ret = bdi_register_owner(disk->queue->backing_dev_info,
711 disk_to_dev(disk));
712 WARN_ON(ret);
713 blk_register_region(disk_devt(disk), disk->minors, NULL,
714 exact_match, exact_lock, disk);
715 }
716 register_disk(parent, disk);
717 if (register_queue)
718 blk_register_queue(disk);
719
720 /*
721 * Take an extra ref on queue which will be put on disk_release()
722 * so that it sticks around as long as @disk is there.
723 */
724 WARN_ON_ONCE(!blk_get_queue(disk->queue));
725
726 disk_add_events(disk);
727 blk_integrity_add(disk);
728 }
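Putting the gendisk helpers together, a driver normally allocates the disk, fills in its fields, sets the capacity and only then calls add_disk(); once add_disk() returns, the disk is live and may receive I/O, so the request queue and fops must already be in place. A hedged sketch with made-up names (my_major, my_fops, my_queue, my_dev, MY_SIZE_BYTES):
struct gendisk *disk;
disk = alloc_disk(1);                             /* 1 minor: no extra partitions */
if (!disk)
    return -ENOMEM;
disk->major        = my_major;                    /* from register_blkdev(), see 3.1.1 */
disk->first_minor  = 0;
disk->fops         = &my_fops;                    /* block_device_operations, see 2.1 */
disk->queue        = my_queue;                    /* request queue, see 3.3 */
disk->private_data = my_dev;                      /* driver context */
snprintf(disk->disk_name, DISK_NAME_LEN, "myblk");
set_capacity(disk, MY_SIZE_BYTES >> 9);           /* capacity in 512-byte sectors */
add_disk(disk);                                   /* the disk is usable from here on */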
3.2.4 del_gendisk
Prototype | void del_gendisk(struct gendisk *disk) | |
Parameters | struct gendisk *disk | The gendisk to remove. |
Return value | none | |
Function | Removes a block device (gendisk) from the system: its partitions are invalidated and deleted, the disk is unregistered and its sysfs entries are removed. |
742 void del_gendisk(struct gendisk *disk)
743 {
744 struct disk_part_iter piter;
745 struct hd_struct *part;
746
747 blk_integrity_del(disk);
748 disk_del_events(disk);
749
750 /*
751 * Block lookups of the disk until all bdevs are unhashed and the
752 * disk is marked as dead (GENHD_FL_UP cleared).
753 */
754 down_write(&disk->lookup_sem);
755 /* invalidate stuff */
756 disk_part_iter_init(&piter, disk,
757 DISK_PITER_INCL_EMPTY | DISK_PITER_REVERSE);
758 while ((part = disk_part_iter_next(&piter))) {
759 invalidate_partition(disk, part->partno);
760 bdev_unhash_inode(part_devt(part));
761 delete_partition(disk, part->partno);
762 }
763 disk_part_iter_exit(&piter);
764
765 invalidate_partition(disk, 0);
766 bdev_unhash_inode(disk_devt(disk));
767 set_capacity(disk, 0);
768 disk->flags &= ~GENHD_FL_UP;
769 up_write(&disk->lookup_sem);
770
771 if (!(disk->flags & GENHD_FL_HIDDEN))
772 sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
773 if (disk->queue) {
774 /*
775 * Unregister bdi before releasing device numbers (as they can
776 * get reused and we'd get clashes in sysfs).
777 */
778 if (!(disk->flags & GENHD_FL_HIDDEN))
779 bdi_unregister(disk->queue->backing_dev_info);
780 blk_unregister_queue(disk);
781 } else {
782 WARN_ON(1);
783 }
784
785 if (!(disk->flags & GENHD_FL_HIDDEN))
786 blk_unregister_region(disk_devt(disk), disk->minors);
787 /*
788 * Remove gendisk pointer from idr so that it cannot be looked up
789 * while RCU period before freeing gendisk is running to prevent
790 * use-after-free issues. Note that the device number stays
791 * "in-use" until we really free the gendisk.
792 */
793 blk_invalidate_devt(disk_devt(disk));
794
795 kobject_put(disk->part0.holder_dir);
796 kobject_put(disk->slave_dir);
797
798 part_stat_set_all(&disk->part0, 0);
799 disk->part0.stamp = 0;
800 if (!sysfs_deprecated)
801 sysfs_remove_link(block_depr, dev_name(disk_to_dev(disk)));
802 pm_runtime_set_memalloc_noio(disk_to_dev(disk), false);
803 device_del(disk_to_dev(disk));
804 }
3.3 Block device I/O requests
3.3.1 blk_mq_alloc_tag_set
Prototype | int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set) | |
Parameters | struct blk_mq_tag_set *set | Pointer to the tag set, filled in beforehand with the driver's blk_mq_ops, the number of hardware queues, the queue depth and so on. The tag set describes how concurrent requests are tagged, allocated and recycled. |
Return value | int | Success: 0. Failure: a negative error code. |
Function | Allocates and initializes a tag set, i.e. the pool of request tags (identifiers) used to manage concurrent requests in the multi-queue block layer. |
2785 int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
2786 {
2787 int ret;
2788
2789 BUILD_BUG_ON(BLK_MQ_MAX_DEPTH > 1 << BLK_MQ_UNIQUE_TAG_BITS);
2790
2791 if (!set->nr_hw_queues)
2792 return -EINVAL;
2793 if (!set->queue_depth)
2794 return -EINVAL;
2795 if (set->queue_depth < set->reserved_tags + BLK_MQ_TAG_MIN)
2796 return -EINVAL;
2797
2798 if (!set->ops->queue_rq)
2799 return -EINVAL;
2800
2801 if (!set->ops->get_budget ^ !set->ops->put_budget)
2802 return -EINVAL;
2803
2804 if (set->queue_depth > BLK_MQ_MAX_DEPTH) {
2805 pr_info("blk-mq: reduced tag depth to %u\n",
2806 BLK_MQ_MAX_DEPTH);
2807 set->queue_depth = BLK_MQ_MAX_DEPTH;
2808 }
2809
2810 /*
2811 * If a crashdump is active, then we are potentially in a very
2812 * memory constrained environment. Limit us to 1 queue and
2813 * 64 tags to prevent using too much memory.
2814 */
2815 if (is_kdump_kernel()) {
2816 set->nr_hw_queues = 1;
2817 set->queue_depth = min(64U, set->queue_depth);
2818 }
2819 /*
2820 * There is no use for more h/w queues than cpus.
2821 */
2822 if (set->nr_hw_queues > nr_cpu_ids)
2823 set->nr_hw_queues = nr_cpu_ids;
2824
2825 set->tags = kcalloc_node(nr_cpu_ids, sizeof(struct blk_mq_tags *),
2826 GFP_KERNEL, set->numa_node);
2827 if (!set->tags)
2828 return -ENOMEM;
2829
2830 ret = -ENOMEM;
2831 set->mq_map = kcalloc_node(nr_cpu_ids, sizeof(*set->mq_map),
2832 GFP_KERNEL, set->numa_node);
2833 if (!set->mq_map)
2834 goto out_free_tags;
2835
2836 ret = blk_mq_update_queue_map(set);
2837 if (ret)
2838 goto out_free_mq_map;
2839
2840 ret = blk_mq_alloc_rq_maps(set);
2841 if (ret)
2842 goto out_free_mq_map;
2843
2844 mutex_init(&set->tag_list_lock);
2845 INIT_LIST_HEAD(&set->tag_list);
2846
2847 return 0;
2848
2849 out_free_mq_map:
2850 kfree(set->mq_map);
2851 set->mq_map = NULL;
2852 out_free_tags:
2853 kfree(set->tags);
2854 set->tags = NULL;
2855 return ret;
2856 }
3.3.2 blk_mq_free_tag_set
Prototype | void blk_mq_free_tag_set(struct blk_mq_tag_set *set) | |
Parameters | struct blk_mq_tag_set *set | The tag set previously set up with blk_mq_alloc_tag_set(). |
Return value | none | |
Function | Frees a block device tag set: all resources associated with the blk_mq_tag_set (per-queue tags, the CPU-to-queue map) are released to avoid leaks. |
2859 void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
2860 {
2861 int i;
2862
2863 for (i = 0; i < nr_cpu_ids; i++)
2864 blk_mq_free_map_and_requests(set, i);
2865
2866 kfree(set->mq_map);
2867 set->mq_map = NULL;
2868
2869 kfree(set->tags);
2870 set->tags = NULL;
2871 }
3.3.3 blk_mq_init_queue
Prototype | struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set) | |
Parameters | struct blk_mq_tag_set *set | The tag set previously set up with blk_mq_alloc_tag_set(). |
Return value | struct request_queue * | Pointer to the initialized request queue; an ERR_PTR()-encoded error on failure. |
Function | Initializes the request queue of a multi-queue block device, so that the queue is ready to receive and process I/O requests from the upper layers. |
2496 struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
2497 {
2498 struct request_queue *uninit_q, *q;
2499
2500 uninit_q = blk_alloc_queue_node(GFP_KERNEL, set->numa_node, NULL);
2501 if (!uninit_q)
2502 return ERR_PTR(-ENOMEM);
2503
2504 q = blk_mq_init_allocated_queue(set, uninit_q);
2505 if (IS_ERR(q))
2506 blk_cleanup_queue(uninit_q);
2507
2508 return q;
2509 }
3.3.4 blk_cleanup_queue
Prototype | void blk_cleanup_queue(struct request_queue *q) | |
Parameters | struct request_queue *q | The request queue used to manage requests for the block device. |
Return value | none | |
Function | Cleans up a request queue: the queue is marked dying, drained, and the resources associated with it are released once it is no longer needed. |
757 void blk_cleanup_queue(struct request_queue *q)
758 {
759 spinlock_t *lock = q->queue_lock;
760
761 /* mark @q DYING, no new request or merges will be allowed afterwards */
762 mutex_lock(&q->sysfs_lock);
763 blk_set_queue_dying(q);
764 spin_lock_irq(lock);
765
766 /*
767 * A dying queue is permanently in bypass mode till released. Note
768 * that, unlike blk_queue_bypass_start(), we aren't performing
769 * synchronize_rcu() after entering bypass mode to avoid the delay
770 * as some drivers create and destroy a lot of queues while
771 * probing. This is still safe because blk_release_queue() will be
772 * called only after the queue refcnt drops to zero and nothing,
773 * RCU or not, would be traversing the queue by then.
774 */
775 q->bypass_depth++;
776 queue_flag_set(QUEUE_FLAG_BYPASS, q);
777
778 queue_flag_set(QUEUE_FLAG_NOMERGES, q);
779 queue_flag_set(QUEUE_FLAG_NOXMERGES, q);
780 queue_flag_set(QUEUE_FLAG_DYING, q);
781 spin_unlock_irq(lock);
782 mutex_unlock(&q->sysfs_lock);
783
784 /*
785 * Drain all requests queued before DYING marking. Set DEAD flag to
786 * prevent that q->request_fn() gets invoked after draining finished.
787 */
788 blk_freeze_queue(q);
789
790 rq_qos_exit(q);
791
792 spin_lock_irq(lock);
793 queue_flag_set(QUEUE_FLAG_DEAD, q);
794 spin_unlock_irq(lock);
795
796 /*
797 * make sure all in-progress dispatch are completed because
798 * blk_freeze_queue() can only complete all requests, and
799 * dispatch may still be in-progress since we dispatch requests
800 * from more than one contexts.
801 *
802 * We rely on driver to deal with the race in case that queue
803 * initialization isn't done.
804 */
805 if (q->mq_ops && blk_queue_init_done(q))
806 blk_mq_quiesce_queue(q);
807
808 /* for synchronous bio-based driver finish in-flight integrity i/o */
809 blk_flush_integrity();
810
811 /* @q won't process any more request, flush async actions */
812 del_timer_sync(&q->backing_dev_info->laptop_mode_wb_timer);
813 blk_sync_queue(q);
814
815 /*
816 * I/O scheduler exit is only safe after the sysfs scheduler attribute
817 * has been removed.
818 */
819 WARN_ON_ONCE(q->kobj.state_in_sysfs);
820
821 blk_exit_queue(q);
822
823 if (q->mq_ops)
824 blk_mq_exit_queue(q);
825
826 percpu_ref_exit(&q->q_usage_counter);
827
828 spin_lock_irq(lock);
829 if (q->queue_lock != &q->__queue_lock)
830 q->queue_lock = &q->__queue_lock;
831 spin_unlock_irq(lock);
832
833 /* @q is and will stay empty, shutdown and put */
834 blk_put_queue(q);
835 }
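Taken together, the functions in this section give the usual setup and teardown order for a blk-mq queue: allocate the tag set, create the queue from it, and on failure (or on module exit) destroy the queue before freeing the tag set. A hedged sketch with made-up names (my_mq_ops, my_dev):
struct blk_mq_tag_set *set = &my_dev->tag_set;
struct request_queue *q;
int ret;
ret = blk_mq_alloc_tag_set(set);  /* set->ops, nr_hw_queues, queue_depth filled in earlier */
if (ret)
    return ret;
q = blk_mq_init_queue(set);       /* create the request queue from the tag set */
if (IS_ERR(q)) {
    blk_mq_free_tag_set(set);     /* undo the allocation on failure */
    return PTR_ERR(q);
}
my_dev->queue = q;
/* ... on exit: blk_cleanup_queue(my_dev->queue); blk_mq_free_tag_set(set); */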
3.4 Requests
3.4.1 blk_mq_start_request
Prototype | void blk_mq_start_request(struct request *rq) | |
Parameters | struct request *rq | The request about to be processed; it carries the information about the I/O operation (operation type, target device, data buffers, ...). |
Return value | none | |
Function | Marks a request as in flight and starts the associated bookkeeping (statistics, timeout timer), so the request is tracked correctly while the driver executes it. |
630 void blk_mq_start_request(struct request *rq)
631 {
632 struct request_queue *q = rq->q;
633
634 blk_mq_sched_started_request(rq);
635
636 trace_block_rq_issue(q, rq);
637
638 if (test_bit(QUEUE_FLAG_STATS, &q->queue_flags)) {
639 rq->io_start_time_ns = ktime_get_ns();
640 #ifdef CONFIG_BLK_DEV_THROTTLING_LOW
641 rq->throtl_size = blk_rq_sectors(rq);
642 #endif
643 rq->rq_flags |= RQF_STATS;
644 rq_qos_issue(q, rq);
645 }
646
647 WARN_ON_ONCE(blk_mq_rq_state(rq) != MQ_RQ_IDLE);
648
649 blk_add_timer(rq);
650 WRITE_ONCE(rq->state, MQ_RQ_IN_FLIGHT);
651
652 if (q->dma_drain_size && blk_rq_bytes(rq)) {
653 /*
654 * Make sure space for the drain appears. We know we can do
655 * this because max_hw_segments has been adjusted to be one
656 * fewer than the device can handle.
657 */
658 rq->nr_phys_segments++;
659 }
660 }
3.4.2 blk_mq_end_request
Prototype | void blk_mq_end_request(struct request *rq, blk_status_t error) | |
Parameters | struct request *rq | The request that has been processed. |
| blk_status_t error | The completion status: BLK_STS_OK on success or an error status otherwise. |
Return value | none | |
Function | Completes a block device request: called after the driver has finished processing it, to update the request's state and perform the necessary cleanup. |
542 void blk_mq_end_request(struct request *rq, blk_status_t error)
543 {
544 if (blk_update_request(rq, error, blk_rq_bytes(rq)))
545 BUG();
546 __blk_mq_end_request(rq, error);
547 }
520 inline void __blk_mq_end_request(struct request *rq, blk_status_t error)
521 {
522 u64 now = ktime_get_ns();
523
524 if (rq->rq_flags & RQF_STATS) {
525 blk_mq_poll_stats_start(rq->q);
526 blk_stat_add(rq, now);
527 }
528
529 blk_account_io_done(rq, now);
530
531 if (rq->end_io) {
532 rq_qos_done(rq->q, rq);
533 rq->end_io(rq, error);
534 } else {
535 if (unlikely(blk_bidi_rq(rq)))
536 blk_mq_free_request(rq->next_rq);
537 blk_mq_free_request(rq);
538 }
539 }
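In a blk-mq driver both functions are called from the queue_rq callback: blk_mq_start_request() marks the request as in flight before the driver touches its data, and blk_mq_end_request() completes it with a status. A hedged sketch (my_transfer() is a hypothetical helper, defined in section 3.5; the experiment's _queue_rq() in section 4.1 follows the same pattern):
static blk_status_t my_queue_rq(struct blk_mq_hw_ctx *hctx,
                                const struct blk_mq_queue_data *bd)
{
    struct request *rq = bd->rq;        /* the request to process */
    blk_status_t status = BLK_STS_OK;
    blk_mq_start_request(rq);           /* mark the request as in flight */
    if (my_transfer(rq))                /* do the actual data transfer */
        status = BLK_STS_IOERR;
    blk_mq_end_request(rq, status);     /* complete the request */
    return status;
}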
3.5 Data access
3.5.1 bio_data
Prototype | void *bio_data(struct bio *bio) | |
Parameters | struct bio *bio | The bio describing the block I/O operation: data location, length, buffers and so on. |
Return value | void * | Kernel virtual address of the data in the bio's current segment, or NULL if the bio carries no data. |
Function | Returns a pointer to the data buffer of a bio, giving convenient access to the memory associated with the I/O operation. |
124 static inline void *bio_data(struct bio *bio)
125 {
126 if (bio_has_data(bio))
127 return page_address(bio_page(bio)) + bio_offset(bio);
128
129 return NULL;
130 }
3.5.2 rq_data_dir
Prototype | #define rq_data_dir(rq) (op_is_write(req_op(rq)) ? WRITE : READ) | |
Parameters | struct request *rq | The request being processed. |
Return value | WRITE or READ | |
Function | Tells whether a given request is a read or a write; very useful when handling block device I/O. |
771 #define rq_data_dir(rq) (op_is_write(req_op(rq)) ? WRITE : READ)
386 #define req_op(req) ((req)->cmd_flags & REQ_OP_MASK)
396 static inline bool op_is_write(unsigned int op)
397 {
398 return (op & 1);
399 }
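Combining the two helpers, a simple driver that only handles the current segment of each request can copy data like this (a sketch of what ramdisk_transfer() in section 4.1 does; ramdisk_dev is the device structure defined there, and my_transfer is the hypothetical helper referenced in section 3.4):
static int my_transfer(struct request *rq)
{
    struct ramdisk_dev *dev = rq->rq_disk->private_data; /* driver data set on the gendisk */
    unsigned long pos = blk_rq_pos(rq) << 9;             /* start sector -> byte offset */
    unsigned long len = blk_rq_cur_bytes(rq);            /* bytes in the current segment */
    void *buf = bio_data(rq->bio);                       /* kernel address of that data */
    if (rq_data_dir(rq) == READ)
        memcpy(buf, dev->ramdiskbuf + pos, len);         /* device -> buffer */
    else
        memcpy(dev->ramdiskbuf + pos, buf, len);         /* buffer -> device */
    return 0;
}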
4 Experiment
4.1 Code
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/init.h>
#include <linux/sched.h>
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/fs.h>
#include <linux/errno.h>
#include <linux/types.h>
#include <linux/fcntl.h>
#include <linux/hdreg.h>
#include <linux/kdev_t.h>
#include <linux/vmalloc.h>
#include <linux/genhd.h>
#include <linux/blk-mq.h>
#include <linux/buffer_head.h>
#include <linux/bio.h>
#define RAMDISK_SIZE (2 * 1024 * 1024) /* capacity: 2 MB */
#define RAMDISK_NAME "ramdisk" /* device name */
#define RADMISK_MINOR 3 /* three minor numbers (room for partitions), NOT minor number 3! */
/* ramdisk device structure */
struct ramdisk_dev{
    int major; /* major number */
    unsigned char *ramdiskbuf; /* memory backing the ramdisk, simulating the block device */
    struct gendisk *gendisk; /* gendisk */
    struct request_queue *queue; /* request queue */
    struct blk_mq_tag_set tag_set; /* blk_mq_tag_set */
    spinlock_t lock; /* spinlock */
};
struct ramdisk_dev *ramdisk = NULL; /* pointer to the ramdisk device */
/*
 * @description : handle the data transfer for one request
 * @param - req : the request
 * @return      : 0 on success; anything else means failure
 */
static int ramdisk_transfer(struct request *req)
{
    unsigned long start = blk_rq_pos(req) << 9; /* blk_rq_pos() returns the sector; shift left by 9 to get the byte address */
    unsigned long len = blk_rq_cur_bytes(req); /* length of the current segment */
    /* data buffer of the bio:
     * read : data read from the "disk" is copied into buffer
     * write: buffer holds the data to be written to the "disk"
     */
    void *buffer = bio_data(req->bio);
    if(rq_data_dir(req) == READ) /* read */
        memcpy(buffer, ramdisk->ramdiskbuf + start, len);
    else if(rq_data_dir(req) == WRITE) /* write */
        memcpy(ramdisk->ramdiskbuf + start, buffer, len);
    return 0;
}
/*
 * @description : queue_rq callback, processes one request from the queue
 * @hctx        : hardware queue context
 * @bd          : queue data carrying the request
 * @return      : BLK_STS_OK on success; another blk_status_t value on failure
 */
static blk_status_t _queue_rq(struct blk_mq_hw_ctx *hctx, const struct blk_mq_queue_data* bd)
{
    struct request *req = bd->rq; /* get the request from bd */
    struct ramdisk_dev *dev = req->rq_disk->private_data;
    int ret;
    blk_mq_start_request(req); /* mark the request as started */
    spin_lock(&dev->lock);
    ret = ramdisk_transfer(req); /* do the data transfer */
    blk_mq_end_request(req, ret); /* complete the request */
    spin_unlock(&dev->lock);
    return BLK_STS_OK;
}
/*
 * queue operations
 */
static struct blk_mq_ops mq_ops = {
    .queue_rq = _queue_rq,
};
int ramdisk_open(struct block_device *dev, fmode_t mode)
{
    printk("ramdisk open\r\n");
    return 0;
}
void ramdisk_release(struct gendisk *disk, fmode_t mode)
{
    printk("ramdisk release\r\n");
}
/*
 * @description : report the disk geometry
 * @param - dev : block device
 * @param - geo : geometry structure to fill in
 * @return      : 0 on success; anything else means failure
 */
int ramdisk_getgeo(struct block_device *dev, struct hd_geometry *geo)
{
    /* these values only make sense for mechanical disks, so fake them */
    geo->heads = 2; /* heads */
    geo->cylinders = 32; /* cylinders */
    geo->sectors = RAMDISK_SIZE / (2 * 32 *512); /* sectors per track */
    return 0;
}
/*
 * block device operations
 */
static struct block_device_operations ramdisk_fops =
{
    .owner = THIS_MODULE,
    .open = ramdisk_open,
    .release = ramdisk_release,
    .getgeo = ramdisk_getgeo,
};
/*
 * @description : set up the request queue and its tag set
 * @set         : the blk_mq_tag_set object
 * @return      : address of the request_queue (ERR_PTR on failure)
 */
static struct request_queue * create_req_queue(struct blk_mq_tag_set *set)
{
    struct request_queue *q;
#if 0
    /*
     * Alternative: let blk_mq_init_sq_queue() do the
     * whole initialization in a single call.
     */
    q = blk_mq_init_sq_queue(set, &mq_ops, 2, BLK_MQ_F_SHOULD_MERGE);
#else
    int ret;
    memset(set, 0, sizeof(*set));
    set->ops = &mq_ops;                 /* queue operations */
    set->nr_hw_queues = 2;              /* number of hardware queues */
    set->queue_depth = 2;               /* queue depth */
    set->numa_node = NUMA_NO_NODE;      /* NUMA node */
    set->flags = BLK_MQ_F_SHOULD_MERGE; /* merge bios when they are submitted */
    ret = blk_mq_alloc_tag_set(set);    /* finish initializing the tag set */
    if (ret) {
        printk(KERN_WARNING "sblkdev: unable to allocate tag set\n");
        return ERR_PTR(ret);
    }
    q = blk_mq_init_queue(set);         /* allocate the request queue */
    if(IS_ERR(q)) {
        blk_mq_free_tag_set(set);
        return q;
    }
#endif
    return q;
}
/*
 * @description : create the block device and expose it to user space
 * @set         : the ramdisk_dev object
 * @return      : 0 on success; anything else means failure
 */
static int create_req_gendisk(struct ramdisk_dev *set)
{
    struct ramdisk_dev *dev = set;
    /* 1. allocate and initialize the gendisk */
    dev->gendisk = alloc_disk(RADMISK_MINOR);
    if(dev->gendisk == NULL)
        return -ENOMEM;
    /* 2. set up and register the disk */
    dev->gendisk->major = ramdisk->major; /* major number */
    dev->gendisk->first_minor = 0; /* first minor number */
    dev->gendisk->fops = &ramdisk_fops; /* operations */
    dev->gendisk->private_data = set; /* private data */
    dev->gendisk->queue = dev->queue; /* request queue */
    sprintf(dev->gendisk->disk_name, RAMDISK_NAME); /* name */
    set_capacity(dev->gendisk, RAMDISK_SIZE/512); /* capacity in sectors */
    add_disk(dev->gendisk);
    return 0;
}
static int __init ramdisk_init(void)
{
    int ret = 0;
    struct ramdisk_dev * dev;
    printk("ramdisk init\r\n");
    /* 1. allocate memory */
    dev = kzalloc(sizeof(*dev), GFP_KERNEL);
    if(dev == NULL) {
        return -ENOMEM;
    }
    dev->ramdiskbuf = kmalloc(RAMDISK_SIZE, GFP_KERNEL);
    if(dev->ramdiskbuf == NULL) {
        printk(KERN_WARNING "dev->ramdiskbuf: kmalloc failure.\n");
        kfree(dev);
        return -ENOMEM;
    }
    ramdisk = dev;
    /* 2. initialize the spinlock */
    spin_lock_init(&dev->lock);
    /* 3. register the block device */
    dev->major = register_blkdev(0, RAMDISK_NAME); /* let the kernel allocate the major number */
    if(dev->major < 0) {
        goto register_blkdev_fail;
    }
    /* 4. create the multi-queue request queue */
    dev->queue = create_req_queue(&dev->tag_set);
    if(IS_ERR(dev->queue)) {
        goto create_queue_fail;
    }
    /* 5. create the block device (gendisk) */
    ret = create_req_gendisk(dev);
    if(ret < 0)
        goto create_gendisk_fail;
    return 0;
create_gendisk_fail:
    blk_cleanup_queue(dev->queue);
    blk_mq_free_tag_set(&dev->tag_set);
create_queue_fail:
    unregister_blkdev(dev->major, RAMDISK_NAME);
register_blkdev_fail:
    kfree(dev->ramdiskbuf);
    kfree(dev);
    return -ENOMEM;
}
static void __exit ramdisk_exit(void)
{
    printk("ramdisk exit\r\n");
    /* release the gendisk */
    del_gendisk(ramdisk->gendisk);
    put_disk(ramdisk->gendisk);
    /* clean up the request queue */
    blk_cleanup_queue(ramdisk->queue);
    /* free the blk_mq_tag_set */
    blk_mq_free_tag_set(&ramdisk->tag_set);
    /* unregister the block device */
    unregister_blkdev(ramdisk->major, RAMDISK_NAME);
    /* free the memory */
    kfree(ramdisk->ramdiskbuf);
    kfree(ramdisk);
}
module_init(ramdisk_init);
module_exit(ramdisk_exit);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("ALIENTEK");
MODULE_INFO(intree, "Y");
4.2 Procedure
Load the driver: insmod ramdisk_withrequest_test.ko
console:/data # insmod ramdisk_withrequest_test.ko
[ 916.610832] ramdisk init
[ 916.613432] ramdisk open
[ 916.613718] ramdisk release
[ 916.613758] type=1400 audit(1727152399.923:65): avc: denied { create } for comm="kdevtmpfs" name="ramdisk" scontext=u:r:kernel:s0 tcontext=u:object_r:device:s0 tclass=blk_file permissive=1
[ 916.614331] type=1400 audit(1727152399.923:66): avc: denied { setattr } for comm="kdevtmpfs" name="ramdisk" dev="devtmpfs" ino=38456 scontext=u:r:kernel:s0 tcontext=u:object_r:device:s0 tclass=blk_file permissive=1
console:/data #
cat /proc/partitions shows the ramdisk entry:
console:/dev/block # cat /proc/partitions
major minor #blocks name
1 0 8192 ram0
1 1 8192 ram1
1 2 8192 ram2
1 3 8192 ram3
1 4 8192 ram4
1 5 8192 ram5
1 6 8192 ram6
1 7 8192 ram7
1 8 8192 ram8
1 9 8192 ram9
1 10 8192 ram10
1 11 8192 ram11
1 12 8192 ram12
1 13 8192 ram13
1 14 8192 ram14
1 15 8192 ram15
254 0 1952508 zram0
179 0 30535680 mmcblk2
179 1 4096 mmcblk2p1
179 2 4096 mmcblk2p2
179 3 4096 mmcblk2p3
179 4 4096 mmcblk2p4
179 5 4096 mmcblk2p5
179 6 1024 mmcblk2p6
179 7 73728 mmcblk2p7
179 8 98304 mmcblk2p8
179 9 393216 mmcblk2p9
179 10 393216 mmcblk2p10
179 11 16384 mmcblk2p11
179 12 1024 mmcblk2p12
179 13 3186688 mmcblk2p13
179 14 26347488 mmcblk2p14
253 0 1589636 dm-0
253 1 120540 dm-1
253 2 201604 dm-2
253 3 208704 dm-3
253 4 608 dm-4
253 5 26347488 dm-5
8 0 30720000 sda
8 4 30719872 sda4
252 0 2048 ramdisk
console:/dev/block #
ls /dev/block
console:/data # ls /dev/block/
by-name loop10 loop5 mmcblk2p1 mmcblk2p5 ram11 ram6
dm-0 loop11 loop6 mmcblk2p10 mmcblk2p6 ram12 ram7
dm-1 loop12 loop7 mmcblk2p11 mmcblk2p7 ram13 ram8
dm-2 loop13 loop8 mmcblk2p12 mmcblk2p8 ram14 ram9
dm-3 loop14 loop9 mmcblk2p13 mmcblk2p9 ram15 ramdisk
dm-4 loop15 mapper mmcblk2p14 platform ram2 sda
dm-5 loop2 mmcblk2 mmcblk2p2 ram0 ram3 sda4
loop0 loop3 mmcblk2boot0 mmcblk2p3 ram1 ram4 vold
loop1 loop4 mmcblk2boot1 mmcblk2p4 ram10 ram5 zram0
console:/data #
console:/data # ls /dev/block/ramdisk -l
brw------- 1 root root 252, 0 2024-09-23 09:24 /dev/block/ramdisk
console:/data #
Mount: mount /dev/block/ramdisk disk/ (this first attempt fails with "need -t" and F2FS magic-mismatch messages because the device does not yet contain a valid filesystem)
1|console:/data # mount /dev/block/ramdisk disk/
[224737.460103] ramdisk open
[224737.460238] ramdisk release
[224737.460275] ramdisk open
[224737.460316] ramdisk release
[224737.460348] ramdisk open
[224737.460389] ramdisk release
[224737.460420] ramdisk open
[224737.460480] ramdisk relemount: /dev/block/ramdisk: need -ta
se
[224737.460552] ramdisk open
[224737.460599] ramdisk release
[221|console:/data # 4737.460637] ramdisk open
[224737.461348] ramdisk release
[224737.461398] ramdisk open
[224737.461424] ramdisk release
[224737.461456] ramdisk open
[224737.461589] ramdisk release
[224737.461627] ramdisk open
[224737.461689] F2FS-fs (ramdisk): Magic Mismatch, valid(0xf2f52010) - read(0xffffffff)
[224737.461704] F2FS-fs (ramdisk): Can't find valid F2FS filesystem in 1th superblock
[224737.461724] F2FS-fs (ramdisk): Magic Mismatch, valid(0xf2f52010) - read(0xffffffff)
[224737.461738] F2FS-fs (ramdisk): Can't find valid F2FS filesystem in 2th superblock
[224737.461758] ramdisk release
1|console:/data #
Format: mke2fs /dev/block/ramdisk
127|console:/data # mke2fs /dev/block/ramdisk
mke2fs 1.45.4 (23-Sep-2019)
[225588.883514] ramdisk open
[225588.883552] ramdisk release
[225588.883644] ramdisk open
/dev/block/ramdisk contains a ext4 file system
[225588.884440] ramdisk release
[225588.884578] ramdisk open
created on Mon Sep 23 09:38:55 2024
Proceed anyway? (y,N) [225588.885256] ramdisk release
y
[225596.336292] ramdisk open
[225596.336323] ramdisk release
[225596.336338] ramdisk open
[225596.336353] ramdisk releaCreating filesystem with 2048 1k blsocks and 256 inodes
s: done
[2255W9r6i.t3ing inode tables: 36415] ramdisk open
[225done 5
96.336426] ramdisk release
[225596.336438] ramdisk open
[225596.336447] ramdisk release
[225596.3365Writing superblocks and filesystem accounting information: 19] ramdisk open
[225596.341289] ramdisk release
done
console:/data #