sqlite mmap

news2025/3/20 21:37:24

https://www.sqlite.org/mmap.html


1. 内存映射 I/O 的基本原理

  • 默认机制(传统 I/O)
    SQLite 默认通过 xRead()xWrite() 方法(对应 read()/write() 系统调用)访问数据库文件。这些方法需要将数据从内核缓冲区复制到用户空间,反之亦然。

  • 内存映射 I/O
    从 SQLite 3.7.17 开始,新增了 xFetch()xUnfetch() 方法。内存映射允许 SQLite 直接将数据库文件的部分或全部内容映射到进程的虚拟内存空间中,通过指针直接访问数据,避免复制操作。


2. 内存映射 I/O 的优缺点

优点
  1. 性能提升

    • 减少数据在内核缓冲区用户空间之间的复制次数,特别适合频繁随机读取的场景(如大型数据库查询)。
    • 对 I/O 密集型操作(如复杂查询)有显著优化。
  2. 减少内存占用

    • SQLite 共享操作系统的页缓存(Page Cache),无需单独维护数据副本。
缺点
  1. I/O 错误处理问题

    • 内存映射文件的 I/O 错误(如磁盘故障)会触发信号(Signal)。若应用程序未捕获这些信号,可能导致程序崩溃。
  2. 操作系统依赖

    • 要求操作系统支持统一缓冲区缓存(Unified Buffer Cache),否则可能引发数据损坏(尤其在多进程混合使用内存映射与传统 I/O 时)。
    • 某些操作系统的统一缓冲区缓存实现存在 Bug。
  3. 性能不确定性

    • 并非所有场景都能提升性能。极端情况下(如小文件频繁访问),内存映射可能比传统 I/O 更慢。
  4. Windows 的局限性

    • Windows 无法截断(Truncate)内存映射文件。当执行 VACUUMauto_vacuum 缩减数据库时,文件末尾的未使用空间不会被释放,但后续操作可复用该空间。
    • 旧版 SQLite(< 3.7.0)可能误报此类文件为损坏。

3. 内存映射 I/O 的工作流程

读取数据
  1. 传统方式(xRead()

    • SQLite 分配堆内存 → 调用 xRead() → 从内核缓冲区复制数据到用户空间。
  2. 内存映射方式(xFetch()

    • SQLite 调用 xFetch() 获取指向内存页的指针。
    • 若成功,直接通过指针访问数据,无需复制。
    • 若失败(返回 NULL),回退到 xRead()
写入数据
  • 无论是否使用内存映射,SQLite 始终在修改数据前将页面复制到堆内存:
    1. 确保其他进程看不到未提交的更改(事务隔离)。
    2. 防止应用程序误操作指针导致数据库损坏。
  • 修改完成后,通过 xWrite() 将数据写回磁盘。

结论:内存映射主要优化读取性能,对写入性能影响有限。


4. 配置内存映射 I/O

核心参数:PRAGMA mmap_size
  • 设置内存映射大小
    PRAGMA mmap_size = 268435456; -- 启用内存映射(例如 256MB)
    PRAGMA mmap_size = 0;         -- 禁用内存映射
    
    • 仅映射文件的前 N 字节,超出部分仍用传统 xRead()
    • 若文件小于 N,映射整个文件。
默认值与上限
  1. 默认值

    • 默认 mmap_size = 0(禁用)。可通过以下方式修改:
      • 编译时SQLITE_DEFAULT_MMAP_SIZE 宏。
      • 启动时sqlite3_config(SQLITE_CONFIG_MMAP_SIZE, X, Y)
  2. 硬性上限

    • 编译时上限SQLITE_MAX_MMAP_SIZE 宏(默认 1GB)。若设为 0,禁用内存映射。
    • 运行时上限:可通过 sqlite3_config(SQLITE_CONFIG_MMAP_SIZE, X, Y) 降低或归零,但不能超过编译时上限。
平台限制
  • OpenBSD 等系统:因缺乏统一缓冲区缓存,强制设置硬性上限为 0(禁用内存映射)。

5. 使用建议

  • 适用场景

    • 读多写少、大型数据库、高并发查询。
    • 确保操作系统支持且稳定(如 Linux、macOS)。
  • 注意事项

    • 测试性能提升效果,避免盲目启用。
    • 处理 Windows 的未释放空间问题(需 SQLite ≥3.7.0)。
    • 监控内存占用,防止虚拟内存耗尽。

总结

SQLite 的内存映射 I/O 通过直接操作虚拟内存提升读取性能,但需权衡兼容性、内存占用和平台限制。合理配置 mmap_size 并结合业务场景测试是关键。写入操作仍依赖传统 I/O 机制,因此内存映射主要优化查询性能。

The default mechanism by which SQLite accesses and updates database disk files is the xRead() and xWrite() methods of the sqlite3_io_methods VFS object. These methods are typically implemented as “read()” and “write()” system calls which cause the operating system to copy disk content between the kernel buffer cache and user space.

Beginning with version 3.7.17 (2013-05-20), SQLite has the option of accessing disk content directly using memory-mapped I/O and the new xFetch() and xUnfetch() methods on sqlite3_io_methods.

There are advantages and disadvantages to using memory-mapped I/O. Advantages include:

Many operations, especially I/O intensive operations, can be faster since content need not be copied between kernel space and user space.

The SQLite library may need less RAM since it shares pages with the operating-system page cache and does not always need its own copy of working pages.

But there are also disadvantages:

An I/O error on a memory-mapped file cannot be caught and dealt with by SQLite. Instead, the I/O error causes a signal which, if not caught by the application, results in a program crash.

The operating system must have a unified buffer cache in order for the memory-mapped I/O extension to work correctly, especially in situations where two processes are accessing the same database file and one process is using memory-mapped I/O while the other is not. Not all operating systems have a unified buffer cache. In some operating systems that claim to have a unified buffer cache, the implementation is buggy and can lead to corrupt databases.

Performance does not always increase with memory-mapped I/O. In fact, it is possible to construct test cases where performance is reduced by the use of memory-mapped I/O.

Windows is unable to truncate a memory-mapped file. Hence, on Windows, if an operation such as VACUUM or auto_vacuum tries to reduce the size of a memory-mapped database file, the size reduction attempt will silently fail, leaving unused space at the end of the database file. No data is lost due to this problem, and the unused space will be reused again the next time the database grows. However if a version of SQLite prior to 3.7.0 runs PRAGMA integrity_check on such a database, it will (incorrectly) report database corruption due to the unused space at the end. Or if a version of SQLite prior to 3.7.0 writes to the database while it still has unused space at the end, it may make that unused space inaccessible and unavailable for reuse until after the next VACUUM.

Because of the potential disadvantages, memory-mapped I/O is disabled by default. To activate memory-mapped I/O, use the mmap_size pragma and set the mmap_size to some large number, usually 256MB or larger, depending on how much address space your application can spare. The rest is automatic. The PRAGMA mmap_size statement will be a silent no-op on systems that do not support memory-mapped I/O.

How Memory-Mapped I/O Works

To read a page of database content using the legacy xRead() method, SQLite first allocates a page-size chunk of heap memory then invokes the xRead() method which causes the database page content to be copied into the newly allocated heap memory. This involves (at a minimum) a copy of the entire page.

But if SQLite wants to access a page of the database file and memory mapped I/O is enabled, it first calls the xFetch() method. The xFetch() method asks the operating system to return a pointer to the requested page, if possible. If the requested page has been or can be mapped into the application address space, then xFetch returns a pointer to that page for SQLite to use without having to copy anything. Skipping the copy step is what makes memory mapped I/O faster.

SQLite does not assume that the xFetch() method will work. If a call to xFetch() returns a NULL pointer (indicating that the requested page is not currently mapped into the applications address space) then SQLite silently falls back to using xRead(). An error is only reported if xRead() also fails.

When updating the database file, SQLite always makes a copy of the page content into heap memory before modifying the page. This is necessary for two reasons. First, changes to the database are not supposed to be visible to other processes until after the transaction commits and so the changes must occur in private memory. Second, SQLite uses a read-only memory map to prevent stray pointers in the application from overwriting and corrupting the database file.

After all needed changes are completed, xWrite() is used to move the content back into the database file. Hence the use of memory mapped I/O does not significantly change the performance of database changes. Memory mapped I/O is mostly a benefit for queries.

Configuring Memory-Mapped I/O

The “mmap_size” is the maximum number of bytes of the database file that SQLite will try to map into the process address space at one time. The mmap_size applies separately to each database file, so the total amount of process address space that could potentially be used is the mmap_size times the number of open database files.

To activate memory-mapped I/O, an application can set the mmap_size to some large value. For example:

PRAGMA mmap_size=268435456;

To disable memory-mapped I/O, simply set the mmap_size to zero:

PRAGMA mmap_size=0;

If mmap_size is set to N then all current implementations map the first N bytes of the database file and use legacy xRead() calls for any content beyond N bytes. If the database file is smaller than N bytes, then the entire file is mapped. In the future, new OS interfaces could, in theory, map regions of the file other than the first N bytes, but no such implementation currently exists.

The mmap_size is set separately for each database file using the “PRAGMA mmap_size” statement. The usual default mmap_size is zero, meaning that memory mapped I/O is disabled by default. However, the default mmap_size can be increased either at compile-time using the SQLITE_DEFAULT_MMAP_SIZE macro or at start-time using the sqlite3_config(SQLITE_CONFIG_MMAP_SIZE,…) interface.

SQLite also maintains a hard upper bound on the mmap_size. Attempts to increase the mmap_size above this hard upper bound (using PRAGMA mmap_size) will automatically cap the mmap_size at the hard upper bound. If the hard upper bound is zero, then memory mapped I/O is impossible. The hard upper bound can be set at compile-time using the SQLITE_MAX_MMAP_SIZE macro. If SQLITE_MAX_MMAP_SIZE is set to zero, then the code used to implement memory mapped I/O is omitted from the build. The hard upper bound is automatically set to zero on certain platforms (ex: OpenBSD) where memory mapped I/O does not work due to the lack of a unified buffer cache.

If the hard upper bound on mmap_size is non-zero at compilation time, it may still be reduced or zeroed at start-time using the sqlite3_config(SQLITE_CONFIG_MMAP_SIZE,X,Y) interface. The X and Y parameters must both be 64-bit signed integers. The X parameter is the default mmap_size of the process and the Y is the new hard upper bound. The hard upper bound cannot be increased above its compile-time setting using SQLITE_CONFIG_MMAP_SIZE but it can be reduced or zeroed.

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/2318593.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

Java 大视界 -- 企业数字化转型中的 Java 大数据战略与实践(93)

&#x1f496;亲爱的朋友们&#xff0c;热烈欢迎来到 青云交的博客&#xff01;能与诸位在此相逢&#xff0c;我倍感荣幸。在这飞速更迭的时代&#xff0c;我们都渴望一方心灵净土&#xff0c;而 我的博客 正是这样温暖的所在。这里为你呈上趣味与实用兼具的知识&#xff0c;也…

linux:环境变量,进程地址空间

一.命令行参数 main的参数&#xff1a;int argc,char*argv[]&#xff0c;char*env[] 1.参数意义&#xff1a; argc是命令行调用次程序时传递的参数 例&#xff1a; ls -l -a 传递了三个参数&#xff0c;“ls" "-l" "-a"三个字符串 argv是传递的参…

mybatis集合映射association与collection

官方文档&#xff1a;MyBatis的一对多关联关系 一、用途 一对一&#xff1a;association 一对多&#xff1a;collection 二、association 比较容易理解&#xff0c;可参考官方文档 三、collection <?xml version"1.0" encoding"UTF-8"?> &l…

【AIGC】Win10系统极速部署Docker+Ragflow+Dify

【AIGC】WIN10仅3步部署DockerRagflowDify 一、 Docker快速部署1.F2进入bios界面&#xff0c;按F7设置开启VMX虚拟化技术。保存并退出。2.打开控制面板配置开启服务3.到官网下载docker安装包&#xff0c;一键安装&#xff08;全部默认勾选&#xff09; 二、 RagFlow快速部署1.确…

全局上下文网络GCNet:创新架构提升视觉识别性能

摘要&#xff1a;本文介绍了全局上下文网络&#xff08;GCNet&#xff09;&#xff0c;通过深入分析非局部网络&#xff08;NLNet&#xff09;&#xff0c;发现其在重要视觉识别任务中学习的全局上下文与查询位置无关。基于此&#xff0c;提出简化的非局部模块、全局上下文建模…

鸿蒙NEXT项目实战-百得知识库03

代码仓地址&#xff0c;大家记得点个star IbestKnowTeach: 百得知识库基于鸿蒙NEXT稳定版实现的一款企业级开发项目案例。 本案例涉及到多个鸿蒙相关技术知识点&#xff1a; 1、布局 2、配置文件 3、组件的封装和使用 4、路由的使用 5、请求响应拦截器的封装 6、位置服务 7、三…

Linux上位机开发实战(qt编译之谜)

【 声明&#xff1a;版权所有&#xff0c;欢迎转载&#xff0c;请勿用于商业用途。 联系信箱&#xff1a;feixiaoxing 163.com】 很多同学都喜欢用IDE&#xff0c;也能理解。因为不管是visual studio qt插件&#xff0c;还是qt creator其实都帮我们做了很多额外的工作。这里面最…

【人工智能】【Python】在Scikit-Learn中使用网格搜索对决策树调参

这次实践课最大收获非网格搜索莫属。 # 导入包 import matplotlib.pyplot as plt import numpy as np from sklearn.datasets import load_iris from sklearn.model_selection import train_test_split, GridSearchCV # 网格搜索 from sklearn.tree import DecisionTreeClassi…

用Python代码生成批量下单json

需求 根据以下json体&#xff0c;生成230OrderList对象生成10位有序的数字字母随机数赋值给OrderDetailList.ApiOrderId 和 OrderDetailList.Traceid生成的Json文件 保存在项目JSON目录中 {"UAccount": "xxxx","Password": "","…

TCP、UDP协议的应用、ServerSocket和Socket、DatagramSocket和DatagramPacket

DAY13.1 Java核心基础 TCP协议 TCP 协议是面向连接的运算层协议&#xff0c;比较复杂&#xff0c;应用程序在使用TCP协议之前必须建立连接&#xff0c;才能传输数据&#xff0c;数据传输完毕之后需要释放连接 就好比现实生活中的打电话&#xff0c;首先确保电话打通了才能进…

配置VMware Workstation中Ubuntu虚拟机与Windows主机的剪贴板共享功能

步骤1&#xff1a;安装或更新VMware Tools组件‌ ‌卸载旧版本工具&#xff08;可选&#xff09;‌ 若已安装旧版工具&#xff0c;建议先卸载&#xff1a; sudo apt-get autoremove open-vm-tools‌安装必需组件‌ sudo apt-get updatesudo apt-get install open-vm-tools o…

深入理解Python闭包与递归:原理、应用与实践

目录 闭包 什么是闭包&#xff1a; 闭包的基本结构&#xff1a; 实现闭包的条件&#xff1a; 1.嵌套函数 2.内函数引用外部函数的变量 3.外部函数返回内部函数 4.外部函数已经执行完毕 递归函数 什么是递归函数&#xff1a; 递归函数条件 1.必须有个明确的结束条…

SeaCMS代码审计

漏洞描述 漏洞分析 根据漏洞描述定位漏洞代码 当actionsaveCus或者save时&#xff0c;可以进行一个文件写入&#xff0c;不过文件类型被进行了限制&#xff0c;只有html,htm,js,txt,css 虽然这里并不能写入php文件&#xff0c;但是当actionadd或者custom时&#xff0c;这里进行…

好看的网络安全登录页面 vue http网络安全

一、http协议 http协议是一种网络传输协议&#xff0c;规定了浏览器和服务器之间的通信方式。位于网络模型中的应用层。&#xff08;盗图小灰。ヾ(◍∇◍)&#xff89;&#xff9e;&#xff09; 但是&#xff0c;它的信息传输全部是以明文方式&#xff0c;不够安全&#xff0c;…

Unity--GPT-SoVITS接入、处理GPTAPI的SSE响应流

GPT-SoVITS GPT-SoVITS- v2&#xff08;v3也可以&#xff0c;两者对模型文件具有兼容&#xff09; 点击后 会进入新的游览器网页 ----- 看了一圈&#xff0c;发现主要问题集中在模型的训练很需要CPU&#xff0c;也就是模型的制作上&#xff0c;问题很多&#xff0c;如果有现有…

Redis哈希槽机制的实现

Redis哈希槽机制的实现 Redis集群使用哈希槽&#xff08;Hash Slot&#xff09;来管理数据分布&#xff0c;整个集群被划分为固定的16384个哈希槽。当我们在集群中存储一个键时&#xff0c;Redis会先对键进行哈希运算&#xff0c;得到一个哈希值。然后&#xff0c;Redis将该哈…

docker pull 提示timeout

通过命令行拉取对应的mysql版本提示网络超时。 开始排查&#xff0c;首先确认是否能浏览器访问。ok的&#xff0c;可以正常访问。 终端curl 排查嗯 有问题 改了下 终端 vim ~/.zshrc 加入 export HTTP_PROXY"http://127.0.0.1:7890" export HTTPS_PROXY"…

(超详细) ETL工具之Kettle

Kettle简介 kettle最早是一个开源的ETL工具&#xff0c;后命名为Pentaho Data Integration。由JAVA开发&#xff0c;支持跨平台运行&#xff0c;其特性包括&#xff1a;支持100%无编码、拖拽方式开发ETL数据管道&#xff0c;可对接包括传统数据库、文件、大数据平台、接口、流…

random_masking 函数测试

文章目录 1. description2. excel3. pytorch code 1. description 功能&#xff1a;按一定比例的随机部分样本&#xff0c;简单来说就是按照一定的比例将行向量从小到大的顺序提取出来。思考1&#xff1a; 用了均匀分布&#xff0c;并且按照一定比例&#xff0c;取前prob概率来…

TDengine 中的流式计算

简介 TDengine 中的流计算&#xff0c;功能相当于简化版的 FLINK &#xff0c; 具有实时计算&#xff0c;计算结果可以输出到超级表中存储&#xff0c;同时也可用于窗口预计算&#xff0c;加快查询速度。 创建流式计算 CREATE STREAM [IF NOT EXISTS] stream_name [stream_o…