DDIA (3): Chapter 3. Storage and Retrieval


We start this chapter by talking about storage engines that are used in the kinds of databases you’re probably familiar with: traditional relational databases, and also most so-called NoSQL databases. We will examine two families of storage engines: log-structured storage engines, and page-oriented storage engines such as B-trees.

Data Structures That Power Your Database

The simplest database in the world is two Bash functions:

#!/bin/bash

# Append a "key,value" line to the end of the data file.
db_set () {
  echo "$1,$2" >> database
}

# Find all lines for the key and keep only the value from the last (most recent) one.
db_get () {
  grep "^$1," database | sed -e "s/^$1,//" | tail -n 1
}

Every call to db_set appends to the end of a file. When you update a key, the old version is not overwritten: you need to look at the last occurrence of the key in the file to find the latest value. db_set has pretty good performance, because appending to a file is generally very efficient, and many databases internally use a log just like db_set does. However, this approach has some shortcomings:

  • Lookup performance is terrible if you have a large number of records in your database: every time you want to look up a key, db_get has to scan the entire database file from beginning to end.
  • A real log also has more issues to deal with, such as concurrency control, reclaiming disk space so that the log doesn’t grow forever, and handling errors and partially written records.

In order to efficiently find the value for a particular key in the database, we need a different data structure: an index.

Hash Indexes

The simplest possible indexing strategy is to keep an in-memory hash map where every key is mapped to a byte offset in the data file, i.e. the location at which the value can be found. Whenever you append a new key-value pair to the file, you also update the hash map to reflect the offset of the data you just wrote (this works both for inserting new keys and for updating existing keys). When you want to look up a value, use the hash map to find the offset in the data file, seek to that location, and read the value. As before, writes are handled by only ever appending to the file.
If you want to delete a key and its associated value, you have to append a special deletion record to the data file (sometimes called a tombstone), which tells later lookups and merges to treat the key as absent. A minimal sketch of this scheme follows.
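To make this concrete, here is a minimal sketch of the idea in Python (not the book’s code): an append-only data file plus an in-memory hash map from key to byte offset, with an illustrative tombstone marker for deletes. The class name, file name, and record format are assumptions made up for this sketch.

TOMBSTONE = "__tombstone__"                # illustrative deletion marker, not a real on-disk format

class SimpleHashIndexDB:
    """Append-only data file plus an in-memory hash map of key -> byte offset."""

    def __init__(self, path="database.log"):
        self.path = path
        self.index = {}                    # key -> byte offset of its latest record
        open(self.path, "ab").close()      # make sure the data file exists
        self._rebuild_index()

    def _rebuild_index(self):
        # On startup, scan the whole log once to rebuild the in-memory hash map.
        with open(self.path, "rb") as f:
            while True:
                offset = f.tell()
                line = f.readline()
                if not line:
                    break
                key, _, value = line.decode().rstrip("\n").partition(",")
                if value == TOMBSTONE:
                    self.index.pop(key, None)   # a tombstone hides all earlier values
                else:
                    self.index[key] = offset

    def set(self, key, value):
        with open(self.path, "ab") as f:   # writes only ever append to the file
            offset = f.tell()
            f.write(f"{key},{value}\n".encode())
        self.index[key] = offset           # point the hash map at the new record

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)                 # one seek + one read per lookup
            return f.readline().decode().rstrip("\n").partition(",")[2]

    def delete(self, key):
        with open(self.path, "ab") as f:   # a delete is just another appended record
            f.write(f"{key},{TOMBSTONE}\n".encode())
        self.index.pop(key, None)

Note that a lookup needs only a single seek and read, but rebuilding the index on startup still requires scanning the whole file.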
To avoid eventually running out of disk space, we can break the log into segments: when a segment file reaches a certain size, we close it and write subsequent data to a new segment file. We can then perform compaction on these segments, throwing away duplicate keys and keeping only the most recent update for each key. Moreover, since compaction often makes segments much smaller (assuming that a key is overwritten several times on average within one segment), we can also merge several segments together at the same time as performing the compaction.
For concurrency control, there is only one writer thread. Data file segments are append-only and otherwise immutable, so they can be read concurrently by multiple threads.

  • Good
    • Appending and segment merging are sequential write operations, which are generally much faster than random writes, especially on magnetic spinning-disk hard drives. To some extent, sequential writes are also preferable on flash-based solid state drives (SSDs).
    • Merging old segments avoids the problem of data files getting fragmented over time.
    • Concurrency and crash recovery are much simpler if segment files are append-only or immutable. For example, you don’t have to worry about the case where a crash happened while a value was being overwritten, leaving you with a file containing part of the old and part of the new value spliced together.
  • Limitations
    • Range queries are not efficient: you’d have to look up each key individually in the hash map.
    • The hash table must fit in memory, so this approach is not suitable when the number of keys is very large.

SSTables and LSM-Trees

  • Sorted String Table (SSTable): a segment file in which the sequence of key-value pairs is sorted by key.
  • Advantages:
    • Merging segments is simple and efficient, even if the files are bigger than the available memory. The approach is like the one used in the mergesort algorithm (see the merge sketch after this list).
    • In order to find a particular key, you no longer need to keep an index of all the keys in memory. Suppose you are looking for the key b, but don’t know its exact offset in the segment file. However, you do know the offsets of the keys a and c (where a < b < c), so you can jump to the offset for a and scan from there until you find b (or conclude it is absent). You still need an in-memory index, but it can be sparse: one key for every few kilobytes of segment file is sufficient.
    • Since read requests need to scan over several key-value pairs in the requested range anyway, it is possible to group those records into a block and compress it before writing it to disk.
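As a rough illustration of that mergesort-like merge, here is a sketch in Python. It represents each segment as an already-sorted in-memory list of (key, value) pairs rather than a real file, which is an assumption made for brevity; the point is only that when a key appears in several segments, the value from the most recent segment wins.

import heapq

def merge_sstable_segments(segments):
    # `segments` is ordered from oldest to newest; each segment is a list of
    # (key, value) pairs already sorted by key, with unique keys per segment.
    # Tag each pair with -age so that, for equal keys, the newest entry sorts
    # first and older duplicates can be skipped.
    tagged = (
        ((key, -age, value) for key, value in segment)
        for age, segment in enumerate(segments)
    )
    merged, last_key = [], object()
    for key, _, value in heapq.merge(*tagged):
        if key != last_key:                 # first occurrence == most recent value
            merged.append((key, value))
            last_key = key
    return merged

old = [("apple", 1), ("banana", 2), ("cherry", 3)]
new = [("banana", 20), ("date", 4)]
print(merge_sstable_segments([old, new]))
# [('apple', 1), ('banana', 20), ('cherry', 3), ('date', 4)]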
Constructing and maintaining SSTables
  1. When a write comes in, add it to an in-memory balanced tree data structure (for example, a red-black tree). This in-memory tree is called a memtable.
  2. When the memtable gets bigger than some threshold (typically a few megabytes), write it out to disk as an SSTable file. This can be done efficiently because the tree already maintains the key-value pairs sorted by key.
  3. In order to serve a read request, first try to find the key in the memtable, then in the most recent on-disk segment, then in the next-older segment, and so on.
  4. From time to time, run a merging and compaction process in the background to combine segment files and discard overwritten and deleted values.
  5. To recover from crashes, we can keep a separate log on disk to which every write is immediately appended. That log is not in sorted order, but that doesn’t matter, because its only purpose is to restore the memtable after a crash. (A minimal sketch of this write/read path follows this list.)
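The following sketch puts those steps together in Python. It is illustrative only: the memtable is a plain dict flushed as a sorted list searched via the standard bisect module (instead of a red-black tree and real SSTable files), and compaction and the crash-recovery log are omitted.

import bisect

TOMBSTONE = object()                      # marker recorded for deleted keys

class TinyLSM:
    def __init__(self, memtable_limit=1024):
        self.memtable = {}                # most recent writes, held in memory
        self.memtable_limit = memtable_limit
        self.segments = []                # "on-disk" SSTables, newest last; each is
                                          # a list of (key, value) pairs sorted by key

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def delete(self, key):
        self.put(key, TOMBSTONE)          # a delete is just a write of a tombstone

    def _flush(self):
        # Write the memtable out as a new SSTable, sorted by key, then reset it.
        self.segments.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        # 1. check the memtable, 2. the newest segment, 3. older segments ...
        if key in self.memtable:
            value = self.memtable[key]
            return None if value is TOMBSTONE else value
        for sstable in reversed(self.segments):
            i = bisect.bisect_left(sstable, (key,))
            if i < len(sstable) and sstable[i][0] == key:
                value = sstable[i][1]
                return None if value is TOMBSTONE else value
        return None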
Making an LSM-tree out of SSTables
Storage engines based on this principle of merging and compacting sorted files are often called LSM-tree (Log-Structured Merge-Tree) storage engines.
Performance optimizations
  • If you look up a key that does not exist in the database, you have to check the memtable, then the segments all the way back to the oldest (possibly having to read from disk for each one) before you can be sure that the key does not exist. To optimize this kind of access, storage engines often use Bloom filters (see the sketch after this list).
  • There are also different strategies to determine the order and timing of how SSTables are compacted and merged:
    • size-tiered compaction: newer and smaller SSTables are successively merged into older and larger SSTables.
    • leveled compaction: the key range is split up into smaller SSTables and older data is moved into separate “levels,” which allows the compaction to proceed more incrementally and use less disk space.
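As an aside, here is a toy Bloom filter in Python showing how a storage engine can conclude that a key is definitely not in a segment without reading it from disk. Using Python’s built-in hash with different seeds to stand in for k independent hash functions, and the sizes chosen here, are illustrative assumptions; real implementations use proper hash functions and carefully tuned parameters.

class BloomFilter:
    # May report false positives ("possibly present"), but never false negatives.

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0                                   # a Python int used as a bit array

    def _positions(self, key):
        for seed in range(self.num_hashes):
            yield hash((seed, key)) % self.num_bits     # k pseudo-independent bit positions

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all(self.bits & (1 << pos) for pos in self._positions(key))

bf = BloomFilter()
bf.add("user:42")
print(bf.might_contain("user:42"))   # True
print(bf.might_contain("user:99"))   # almost certainly False, so skip the disk read

A storage engine keeps one such filter per SSTable; if the filter says the key is absent, that segment file doesn’t need to be read at all.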

B-Trees

Like SSTables, B-trees keep key-value pairs sorted by key, which allows efficient key-value lookups and range queries. B-trees break the database down into fixed-size blocks or pages, traditionally 4 KB in size, and read or write one page at a time.
在这里插入图片描述
If you want to add a new key, you need to find the page whose range encompasses the new key and add it to that page. If there isn’t enough space in the page to accommodate the new key, it is split into two half-full pages, and the parent page is updated to account for the new subdivision of key ranges.
This algorithm ensures the tree remains balanced: a B-tree with n keys always has a depth of O(log n). Most databases can fit into a B-tree that is three or four levels deep, so you don’t need to follow many page references to find the page you are looking for. (A four-level tree of 4 KB pages with a branching factor of 500 can store up to 256 TB.)
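To make the lookup path concrete, here is a small sketch in Python of descending from the root page by following child references between key boundaries. Pages are plain in-memory dictionaries rather than 4 KB disk pages, the tree is built by hand, and insertion/splitting is not shown; all of that is an assumption made to keep the sketch short.

import bisect

# An interior page holds boundary keys and child references; child i covers the
# keys k with keys[i-1] <= k < keys[i].  A leaf page holds keys and their values.
leaf1 = {"leaf": True, "keys": [100, 150], "values": ["v100", "v150"]}
leaf2 = {"leaf": True, "keys": [200, 250], "values": ["v200", "v250"]}
root  = {"leaf": False, "keys": [200], "children": [leaf1, leaf2]}

def btree_lookup(page, key):
    while not page["leaf"]:
        i = bisect.bisect_right(page["keys"], key)   # which child range covers the key?
        page = page["children"][i]                   # follow one page reference per level
    i = bisect.bisect_left(page["keys"], key)
    if i < len(page["keys"]) and page["keys"][i] == key:
        return page["values"][i]
    return None                                      # key not present

print(btree_lookup(root, 150))   # v150
print(btree_lookup(root, 250))   # v250
print(btree_lookup(root, 175))   # None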

Making B-trees reliable

Splitting a page in order to insert a new key is a dangerous operation, because it requires several pages to be overwritten; if the database crashes after only some of those pages have been written, you end up with a corrupted index.
To make the database more robust, B-tree implementations typically add the following:

  • A write-ahead log (WAL, also known as a redo log): an append-only file to which every B-tree modification must be written before it can be applied to the pages of the tree itself. When the database comes back up after a crash, this log is used to restore the B-tree to a consistent state.
  • Concurrency control: required if multiple threads are going to access the B-tree at the same time; latches (lightweight locks) are used to protect the tree’s data structures.
B-tree optimizations
  • Instead of overwriting pages and maintaining a WAL for crash recovery, some databases (like LMDB) use a copy-on-write scheme.
  • We can save space in pages by not storing the entire key, but abbreviating it.
  • Additional pointers have been added to the tree. For example, each leaf page may have references to its sibling pages to the left and right, which allows scanning keys in order without jumping back to parent pages.

Comparing B-Trees and LSM-Trees

Advantages of LSM-trees
  • A B-tree index must write every piece of data at least twice: once to the write-ahead log, and once to the tree page itself (and perhaps again as pages are split).
  • Log-structured indexes also rewrite data multiple times due to repeated compaction and merging of SSTables. This effect—one write to the database resulting in multiple writes to the disk over the course of the database’s lifetime—is known as write amplification. Moreover, LSM-trees are typically able to sustain higher write throughput than B-trees, partly because they sometimes have lower write amplification.
  • LSM-trees can be compressed better, and thus often produce smaller files on disk than B-trees. B-tree storage engines leave some disk space unused due to fragmentation.
Downsides of LSM-trees
  • A downside of log-structured storage is that the compaction process can sometimes interfere with the performance of ongoing reads and writes.
  • Another issue with compaction arises at high write throughput: the disk’s finite write bandwidth needs to be shared between the initial write (logging and flushing a memtable to disk) and the compaction threads running in the background. If write throughput is high and compaction is not configured carefully, compaction may not keep up with the rate of incoming writes: the number of unmerged segments on disk keeps growing until you run out of disk space, and reads also slow down because they need to check more segment files.
  • An advantage of B-trees is that each key exists in exactly one place in the index, whereas a log-structured storage engine may have multiple copies of the same key in different segments.

Other Indexing Structures

So far we have only discussed key-value indexes, which are like a primary key index in the relational model.
It is also common to have secondary indexes; in relational databases you can create one with CREATE INDEX.
A secondary index can easily be constructed from a key-value index. The main difference is that keys are not unique: there might be many rows with the same key.

Storing values within the index
Multi-column indexes
Keeping everything in memory

As RAM becomes cheaper, the cost-per-gigabyte argument is eroded. Many datasets are simply not that big, so it’s quite feasible to keep them entirely in memory.
Besides performance, in-memory databases also provide data models that are difficult to implement with disk-based indexes. For example, Redis offers a database-like interface to various data structures such as priority queues and sets. Because it keeps all data in memory, its implementation is comparatively simple.

Transaction Processing or Analytics?

There was a trend for companies to stop using their OLTP systems for analytics purposes, and to run the analytics on a separate database instead. This separate database was called a data warehouse.

Data Warehousing

A data warehouse, by contrast, is a separate database that analysts can query to their hearts’ content, without affecting OLTP operations. The data warehouse contains a read-only copy of the data in all the various OLTP systems in the company. Data is extracted from OLTP databases (using either a periodic data dump or a continuous stream of updates), transformed into an analysis-friendly schema, cleaned up, and then loaded into the data warehouse. This process of getting data into the warehouse is known as Extract–Transform–Load (ETL).

The divergence between OLTP databases and data warehouses

Stars and Snowflakes: Schemas for Analytics

  • Star schema: The name “star schema” comes from the fact that when the table relationships are visualized, the fact table is in the middle, surrounded by its dimension tables.
  • A variation of this template is known as the snowflake schema, where dimensions are further broken down into subdimensions.
    In a typical data warehouse, tables are often very wide: fact tables often have over 100 columns, sometimes several hundred.

Column-Oriented Storage

Suppose we want to analyze whether people are more inclined to buy fresh fruit or candy, depending on the day of the week. The query only needs access to three columns of the fact_sales table: date_key, product_sk, and quantity; it ignores all other columns.
But a row-oriented storage engine still needs to load all of those rows from disk into memory, parse them, and filter out those that don’t meet the required conditions.
The idea behind column-oriented storage is simple: don’t store all the values from one row together, but store all the values from each column together instead. If each column is stored in a separate file, a query only needs to read and parse those columns that are used in that query, which can save a lot of work.
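A minimal way to see the difference, sketched in Python with column values held in in-memory lists rather than one file per column (the column names and numbers are made up to mirror the fact_sales example):

# Column-oriented layout: one list (conceptually, one file) per column;
# the k-th entry of every column belongs to the k-th row.
columns = {
    "date_key":   [140102, 140102, 140103, 140103],
    "product_sk": [69, 30, 68, 69],
    "store_sk":   [4, 5, 4, 8],
    "quantity":   [1, 3, 5, 1],
    "net_price":  [13.99, 2.99, 1.49, 13.99],
}

# Total quantity sold per product: the query reads only two of the five columns.
totals = {}
for product, qty in zip(columns["product_sk"], columns["quantity"]):
    totals[product] = totals.get(product, 0) + qty
print(totals)   # {69: 2, 30: 3, 68: 5}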

Column Compression

If you take a look at the sequences of values in each column, they often look quite repetitive, which is a good sign for compression. The following are some approaches to compressing them.
One technique that is particularly effective in data warehouses is bitmap encoding.
But if the number of distinct values is bigger, there will be a lot of zeros in most of the bitmaps (we say that they are sparse). In that case, the bitmaps can additionally be run-length encoded.
Bitmap indexes such as these are well suited for the kinds of queries that are common in a data warehouse.

WHERE product_sk IN (30, 68, 69)

We just need to load the three bitmaps for product_sk = 30, product_sk = 68, and product_sk = 69, and calculate the bitwise OR of the three bitmaps, which can be done very efficiently.
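Here is a sketch of that in Python, using integers as bitmaps (the column contents are made up; real column stores use compressed bitmap formats rather than Python integers):

product_sk_column = [69, 69, 30, 68, 74, 31, 30, 29, 68, 69]

# Bitmap encoding: one bitmap per distinct value, one bit per row.
# Bit i of the bitmap for value v is set if row i has product_sk == v.
bitmaps = {}
for row, value in enumerate(product_sk_column):
    bitmaps[value] = bitmaps.get(value, 0) | (1 << row)

# WHERE product_sk IN (30, 68, 69): bitwise OR of the three bitmaps,
# then read off which row numbers have their bit set.
matches = bitmaps[30] | bitmaps[68] | bitmaps[69]
matching_rows = [row for row in range(len(product_sk_column)) if matches & (1 << row)]
print(matching_rows)   # [0, 1, 2, 3, 6, 8, 9]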

Column-oriented storage and column families
Memory bandwidth and vectorized processing

Sort Order in Column Storage

Even though the data is stored column by column, it needs to be sorted an entire row at a time: the k-th item in one column and the k-th item in another column must belong to the same row.

  • Advantages:
    • Sorting helps with compression of columns. If the primary sort column does not have many distinct values, then after sorting it will have long sequences where the same value is repeated many times in a row, which compresses very well.
    • A second sort column (for example, product_sk after date_key) helps queries that need to group or filter sales by product within a certain date range.
Several different sort orders

The row-oriented store keeps every row in one place(in a heap file or a clustered index), and secondary indexes just contain pointers to the matching rows. In a column store, there normally aren’t any pointers to data elsewhere, only columns containing values.

Writing to Column-Oriented Storage

  • If you wanted to insert a row in the middle of a sorted table, you would most likely have to rewrite all the column files.
  • We can use an LSM-tree-style approach to solve this problem: all writes first go to an in-memory store, where they are added to a sorted structure and prepared for writing to disk. Queries need to examine both the column data on disk and the recent writes in memory, and combine the two.

Aggregation: Data Cubes and Materialized Views

Besides column-oriented storage, another aspect of data warehouses worth mentioning is materialized aggregates. SQL has many aggregate functions, such as COUNT, SUM, and MIN; if the same aggregates are used by many different queries, we can cache the counts or sums that queries use most often. One way of creating such a cache is a materialized view.
A common special case of a materialized view is the data cube (or OLAP cube): a grid of aggregates grouped by different dimensions, for example two dimensions (date and product), aggregating data by summing.
The advantage of a materialized data cube is that certain queries become very fast because they have effectively been precomputed.
The disadvantage is that a data cube doesn’t have the same flexibility as querying the raw data.
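As a toy illustration in Python (with a handful of made-up sales facts), here is a two-dimensional cube over date and product, aggregating units sold by summing; once the aggregates are materialized, many queries become simple lookups:

from collections import defaultdict

# (date_key, product_sk, quantity) facts -- made-up numbers for illustration.
facts = [
    (140101, 30, 5),
    (140101, 31, 3),
    (140102, 30, 8),
    (140102, 32, 1),
]

# Materialize SUM(quantity) GROUP BY date_key, product_sk, plus the totals
# along each dimension (the "edges" of the cube).
cube = defaultdict(int)
by_date = defaultdict(int)
by_product = defaultdict(int)
for date_key, product_sk, quantity in facts:
    cube[(date_key, product_sk)] += quantity
    by_date[date_key] += quantity
    by_product[product_sk] += quantity

print(cube[(140102, 30)])   # 8
print(by_date[140101])      # 8
print(by_product[30])       # 13

A question such as “which individual sales were larger than a given amount?” cannot be answered from these precomputed sums; it needs the raw data, which is exactly the inflexibility mentioned above.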

Summary

On a high level, we saw that storage engines fall into two broad categories: those optimized for transaction processing (OLTP), and those optimized for analytics (OLAP). There are big differences between the access patterns in those use cases:

  • OLTP systems are typically user-facing, which means that they may see a huge volume of requests. In order to handle the load, applications usually only touch a small number of records in each query. The application requests records using some kind of key, and the storage engine uses an index to find the data for the requested key. Disk seek time is often the bottleneck here.
  • OLAP systems and data warehouses are less well known, because they are primarily used by business analysts, not by end users. They handle a much lower volume of queries than OLTP systems, but each query is typically very demanding, requiring many millions of records to be scanned in a short time. Disk bandwidth (not seek time) is often the bottleneck here, and column-oriented storage is an increasingly popular solution for this kind of workload.

On the OLTP side, we saw storage engines from two main schools of thought:

  • The log-structured school, which only permits appending to files and deleting obsolete files, but never updates a file that has been written. Bitcask, SSTables, LSM-trees, LevelDB, Cassandra, HBase, Lucene, and others belong to this group.
  • The update-in-place school, which treats the disk as a set of fixed-size pages that can be overwritten. B-trees are the biggest example of this philosophy, being used in all major relational databases and also many nonrelational ones.
