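Table of Contents
Note: to be clear up front, this article draws on the following authors, books, and websites:
1. Reference book: PostgreSQL Database Kernel Analysis (《PostgreSQL数据库内核分析》)
2. Reference book: The Art of Database Transaction Processing: Transaction Management and Concurrency Control (《数据库事务处理的艺术:事务管理与并发控制》)
3. The PostgreSQL source repository
4. The home page of Hironobu Suzuki, the well-known Japanese PostgreSQL expert
5. Reference book: The Internals of PostgreSQL (《PostgreSQL指南:内幕探索》)
6. Reference book: Transaction Processing: Concepts and Techniques
7. The official pgtune git repository
8. The official pgtune online tool
1. Everything in this article comes from the open source community, GitHub, and the contributions of the authors above. The article itself is also free and open (it may contain mistakes; corrections in the comments are welcome).
2. Purpose of this article: open sharing, starting a discussion, and learning together.
3. This article provides no resources, involves no transactions, and is not affiliated with any organization or institution.
4. Feel free to copy and paste as needed for personal use, but reposting and commercial use are not permitted (writing is hard work, thank you for understanding 💖).
5. This article is based on the PostgreSQL master source code.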
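Implementation Principles and Source Code Analysis of the Online Tuning Tool pgtune
- Quick reference for this article
- Background and usage
- Source code analysis
Quick reference for this article
Learning goals:
After doing database kernel development for a long time, you start to feel like a young prodigy. On closer reflection, though, I find I am not especially outstanding and am still far from where I want to be. Perhaps a polished surface hides a hollow interior; whenever I think of this, I feel as if I were standing at the edge of an abyss at midnight, treading on thin ice. To get a few nights of sound sleep, starting today I am pausing other PostgreSQL compatibility feature work and will focus for a while on learning and sharing PostgreSQL fundamentals and practical internals.
Learning content (see the table of contents):
1. Implementation principles and source code analysis of the online tuning tool pgtune
Learning time:
November 24, 2024, 13:46:59
Learning output:
1. One review of PostgreSQL fundamentals
2. One CSDN technical blog post
3. In-depth study of the PostgreSQL kernel
Note: all learning environments below are CentOS 8 + PostgreSQL master + Oracle 19c + MySQL 8.0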
postgres=# select version();
version
------------------------------------------------------------------------------------------------------------
PostgreSQL 18devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-21), 64-bit
(1 row)
postgres=#
#-----------------------------------------------------------------------------#
SQL> select * from v$version;
BANNER Oracle Database 19c EE Extreme Perf Release 19.0.0.0.0 - Production
BANNER_FULL Oracle Database 19c EE Extreme Perf Release 19.0.0.0.0 - Production Version 19.17.0.0.0
BANNER_LEGACY Oracle Database 19c EE Extreme Perf Release 19.0.0.0.0 - Production
CON_ID 0
#-----------------------------------------------------------------------------#
mysql> select version();
+-----------+
| version() |
+-----------+
| 8.0.27 |
+-----------+
1 row in set (0.06 sec)
mysql>
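Background and usage
Purpose: tune the PostgreSQL configuration to your hardware.
Advantages:
- Nothing to download or install
- Also works offline
- Can work as a mobile application
- Open source
Trying it out yields a recommended configuration, for example for a typical TPC-C style workload.
Next, let's look at the implementation details. Using the tool is simple, so I won't go over it here. If you run into problems while using it, you can also open an issue directly in the official repository!
Note 1: because this is a general-purpose configuration, only a small set of parameters is provided (listed below). For other related parameters, analyze and set them yourself for your specific use case: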
const configData = [
['max_connections', maxConnectionsVal],
['shared_buffers', formatValue(sharedBuffersVal)],
['effective_cache_size', formatValue(effectiveCacheSizeVal)],
['maintenance_work_mem', formatValue(maintenanceWorkMemVal)],
['checkpoint_completion_target', checkpointCompletionTargetVal],
['wal_buffers', formatValue(walBuffersVal)],
['default_statistics_target', defaultStatisticsTargetVal],
['random_page_cost', randomPageCostVal],
['effective_io_concurrency', effectiveIoConcurrencyVal],
['work_mem', formatValue(workMemVal)],
['huge_pages', hugePagesVal]
]
Note 2: this is not a definitive answer. Treat it as a suggested starting point and adjust it to your actual situation!
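The source of formatValue is not shown in this article, so the following is only a minimal sketch of how [name, value] pairs like the ones above could be rendered as postgresql.conf lines. The helpers formatSizeKb and toConfLines are hypothetical names, and the rule "use the largest unit that divides the value exactly" is an assumption, not necessarily what pgtune does.

```javascript
// Hypothetical helpers, NOT pgtune's actual formatValue; a sketch of the idea only.
const KB_PER_MB = 1024
const KB_PER_GB = 1024 * 1024

// Render a size given in kB using the largest unit that divides it exactly,
// so no precision is lost when the value is displayed.
function formatSizeKb(valueKb) {
  if (valueKb % KB_PER_GB === 0) return `${valueKb / KB_PER_GB}GB`
  if (valueKb % KB_PER_MB === 0) return `${valueKb / KB_PER_MB}MB`
  return `${valueKb}kB`
}

// Turn [name, value] pairs into "name = value" lines.
function toConfLines(pairs) {
  return pairs.map(([name, value]) => `${name} = ${value}`)
}

const lines = toConfLines([
  ['shared_buffers', formatSizeKb(4194304)],
  ['wal_buffers', formatSizeKb(16384)],
  ['work_mem', formatSizeKb(5242)]
])
console.log(lines.join('\n'))
```

The unit suffixes kB, MB, and GB are the memory units PostgreSQL accepts in its configuration file.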
The available application scenarios are:
const dbTypeOptions = () => [
{
label: 'Web application',
value: DB_TYPE_WEB
},
{
label: 'Online transaction processing system',
value: DB_TYPE_OLTP
},
{
label: 'Data warehouse',
value: DB_TYPE_DW
},
{
label: 'Desktop application',
value: DB_TYPE_DESKTOP
},
{
label: 'Mixed type of application',
value: DB_TYPE_MIXED
}
]
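So make sure you fully understand your own workload!
Source code analysis
Below we go through the parameters in order and explain how pgtune arrives at each value.
1. max_connections
If a value is supplied, it is used as is; otherwise the following per-scenario defaults apply:
Reference: http://postgres.cn/docs/current/runtime-config-connection.html#GUC-MAX-CONNECTIONS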
export const selectMaxConnections = createSelector(
[selectConnectionNum, selectDBType],
(connectionNum, dbType) =>
connectionNum
? connectionNum // a user-supplied value takes precedence over the defaults below
: {
[DB_TYPE_WEB]: 200,
[DB_TYPE_OLTP]: 300,
[DB_TYPE_DW]: 40,
[DB_TYPE_DESKTOP]: 20,
[DB_TYPE_MIXED]: 100
}[dbType]
)
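As a quick illustration of the selector above, here is a minimal sketch written as a plain function without createSelector. The string keys ('web', 'oltp', and so on) are hypothetical stand-ins for pgtune's DB_TYPE_* constants, whose actual values are not shown in this article.

```javascript
// Hypothetical stand-ins for the DB_TYPE_* constants (actual values not shown above).
const DEFAULT_MAX_CONNECTIONS = {
  web: 200,
  oltp: 300,
  dw: 40,
  desktop: 20,
  mixed: 100
}

// A truthy user-supplied value wins; otherwise fall back to the scenario default.
function maxConnections(connectionNum, dbType) {
  return connectionNum ? connectionNum : DEFAULT_MAX_CONNECTIONS[dbType]
}

console.log(maxConnections(undefined, 'oltp')) // 300
console.log(maxConnections(50, 'oltp')) // 50
```

Note that, as in the original ternary, a supplied value of 0 is falsy and therefore also falls back to the default.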
2. shared_buffers
Reference: http://postgres.cn/docs/current/runtime-config-resource.html#GUC-SHARED-BUFFERS
export const selectSharedBuffers = createSelector(
[selectTotalMemoryInKb, selectDBType, selectOSType, selectDBVersion],
(totalMemoryKb, dbType, osType, dbVersion) => {
let sharedBuffersValue = {
[DB_TYPE_WEB]: Math.floor(totalMemoryKb / 4),
[DB_TYPE_OLTP]: Math.floor(totalMemoryKb / 4),
[DB_TYPE_DW]: Math.floor(totalMemoryKb / 4),
[DB_TYPE_DESKTOP]: Math.floor(totalMemoryKb / 16),
[DB_TYPE_MIXED]: Math.floor(totalMemoryKb / 4)
}[dbType]
if (dbVersion < 10 && OS_WINDOWS === osType) {
// Limit shared_buffers to 512MB on Windows
const winMemoryLimit = (512 * SIZE_UNIT_MAP['MB']) / SIZE_UNIT_MAP['KB']
if (sharedBuffersValue > winMemoryLimit) {
sharedBuffersValue = winMemoryLimit
}
}
return sharedBuffersValue
}
)
- The recommended value is usually 1/4 of totalMemoryKb; for the desktop scenario it is 1/16.
- On Windows with PostgreSQL versions below 10, the value is capped at 512MB.
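To make the two rules above concrete, the following sketch works through a 16GB machine. It is a simplified plain function, not pgtune's selector; the 'desktop' and 'windows' strings are hypothetical stand-ins for the DB_TYPE_* and OS_* constants, and all sizes are in kB.

```javascript
const KB_PER_MB = 1024
const KB_PER_GB = 1024 * 1024

// Simplified sketch: 1/16 of RAM for desktop, 1/4 otherwise,
// capped at 512MB on Windows with PostgreSQL versions below 10.
function sharedBuffersKb(totalMemoryKb, dbType, osType, dbVersion) {
  const divisor = dbType === 'desktop' ? 16 : 4
  let value = Math.floor(totalMemoryKb / divisor)
  const winLimit = 512 * KB_PER_MB
  if (dbVersion < 10 && osType === 'windows' && value > winLimit) {
    value = winLimit
  }
  return value
}

const total = 16 * KB_PER_GB
console.log(sharedBuffersKb(total, 'web', 'linux', 17)) // 4194304 kB = 4GB
console.log(sharedBuffersKb(total, 'desktop', 'linux', 17)) // 1048576 kB = 1GB
console.log(sharedBuffersKb(total, 'web', 'windows', 9.6)) // 524288 kB = 512MB
```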
3. effective_cache_size
The recommended value is usually 3/4 of totalMemoryKb; for the desktop scenario it is 1/4.
Reference: http://postgres.cn/docs/current/runtime-config-query.html#GUC-EFFECTIVE-CACHE-SIZE
export const selectEffectiveCacheSize = createSelector(
[selectTotalMemoryInKb, selectDBType],
(totalMemoryKb, dbType) =>
({
[DB_TYPE_WEB]: Math.floor((totalMemoryKb * 3) / 4),
[DB_TYPE_OLTP]: Math.floor((totalMemoryKb * 3) / 4),
[DB_TYPE_DW]: Math.floor((totalMemoryKb * 3) / 4),
[DB_TYPE_DESKTOP]: Math.floor(totalMemoryKb / 4),
[DB_TYPE_MIXED]: Math.floor((totalMemoryKb * 3) / 4)
})[dbType]
)
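For a quick sense of scale, here is a small sketch of the same rule as a plain function (sizes in kB). The 'desktop' string is a hypothetical stand-in for DB_TYPE_DESKTOP.

```javascript
const KB_PER_GB = 1024 * 1024

// Simplified sketch: 1/4 of RAM for desktop, 3/4 for every other scenario.
function effectiveCacheSizeKb(totalMemoryKb, dbType) {
  return dbType === 'desktop'
    ? Math.floor(totalMemoryKb / 4)
    : Math.floor((totalMemoryKb * 3) / 4)
}

console.log(effectiveCacheSizeKb(16 * KB_PER_GB, 'oltp')) // 12582912 kB = 12GB
console.log(effectiveCacheSizeKb(16 * KB_PER_GB, 'desktop')) // 4194304 kB = 4GB
```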
4. maintenance_work_mem
Reference: http://postgres.cn/docs/current/runtime-config-resource.html#GUC-MAINTENANCE-WORK-MEM
export const selectMaintenanceWorkMem = createSelector(
[selectTotalMemoryInKb, selectDBType, selectOSType],
(totalMemoryKb, dbType, osType) => {
let maintenanceWorkMemValue = {
[DB_TYPE_WEB]: Math.floor(totalMemoryKb / 16),
[DB_TYPE_OLTP]: Math.floor(totalMemoryKb / 16),
[DB_TYPE_DW]: Math.floor(totalMemoryKb / 8),
[DB_TYPE_DESKTOP]: Math.floor(totalMemoryKb / 16),
[DB_TYPE_MIXED]: Math.floor(totalMemoryKb / 16)
}[dbType]
// Cap maintenance RAM at 2GB on servers with lots of memory
const memoryLimit = (2 * SIZE_UNIT_MAP['GB']) / SIZE_UNIT_MAP['KB']
if (maintenanceWorkMemValue >= memoryLimit) {
if (OS_WINDOWS === osType) {
// 2048MB (2 GB) will raise error at Windows, so we need remove 1 MB from it
maintenanceWorkMemValue = memoryLimit - (1 * SIZE_UNIT_MAP['MB']) / SIZE_UNIT_MAP['KB']
} else {
maintenanceWorkMemValue = memoryLimit
}
}
return maintenanceWorkMemValue
}
)
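- The value is usually 1/16 of totalMemoryKb; for the data warehouse scenario it is 1/8.
- If the value reaches 2GB, it is capped at 2GB (2GB minus 1MB on Windows, where exactly 2048MB raises an error).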
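The cap is easiest to see with numbers. The sketch below is a simplified plain function (sizes in kB); the 'dw' and 'windows' strings are hypothetical stand-ins for pgtune's constants.

```javascript
const KB_PER_MB = 1024
const KB_PER_GB = 1024 * 1024

// Simplified sketch: 1/8 of RAM for data warehouse, 1/16 otherwise,
// capped at 2GB (2GB - 1MB on Windows).
function maintenanceWorkMemKb(totalMemoryKb, dbType, osType) {
  const divisor = dbType === 'dw' ? 8 : 16
  let value = Math.floor(totalMemoryKb / divisor)
  const limit = 2 * KB_PER_GB
  if (value >= limit) {
    value = osType === 'windows' ? limit - KB_PER_MB : limit
  }
  return value
}

console.log(maintenanceWorkMemKb(16 * KB_PER_GB, 'web', 'linux')) // 1048576 kB = 1GB
console.log(maintenanceWorkMemKb(64 * KB_PER_GB, 'web', 'linux')) // 2097152 kB = 2GB (capped)
console.log(maintenanceWorkMemKb(64 * KB_PER_GB, 'web', 'windows')) // 2096128 kB = 2047MB
```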
5. checkpoint_completion_target
Always set to 0.9.
Reference: http://postgres.cn/docs/current/runtime-config-wal.html#GUC-CHECKPOINT-COMPLETION-TARGET
export const selectCheckpointCompletionTarget = createSelector(
[],
() => 0.9 // based on https://github.com/postgres/postgres/commit/bbcc4eb2
)
6. wal_buffers
Reference: http://postgres.cn/docs/current/runtime-config-wal.html#GUC-WAL-BUFFERS
export const selectWalBuffers = createSelector([selectSharedBuffers], (sharedBuffersValue) => {
// Follow auto-tuning guideline for wal_buffers added in 9.1, where it's
// set to 3% of shared_buffers up to a maximum of 16MB.
let walBuffersValue = Math.floor((3 * sharedBuffersValue) / 100)
const maxWalBuffer = (16 * SIZE_UNIT_MAP['MB']) / SIZE_UNIT_MAP['KB']
if (walBuffersValue > maxWalBuffer) {
walBuffersValue = maxWalBuffer
}
// It's nice if wal_buffers is an even 16MB if it's near that number. Since
// that is a common case on Windows, where shared_buffers is clipped to 512MB,
// round upwards in that situation
const walBufferNearValue = (14 * SIZE_UNIT_MAP['MB']) / SIZE_UNIT_MAP['KB']
if (walBuffersValue > walBufferNearValue && walBuffersValue < maxWalBuffer) {
walBuffersValue = maxWalBuffer
}
// if less than 32 kB, set it to the minimum
if (walBuffersValue < 32) {
walBuffersValue = 32
}
return walBuffersValue
})
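- Following the wal_buffers auto-tuning guideline added in 9.1, the value is set to 3% of shared_buffers, up to a maximum of 16MB.
- If the computed value is close to 16MB (above 14MB), it is rounded up to exactly 16MB. This is a common case on Windows, where shared_buffers is clipped to 512MB.
- If the value is below 32kB, it is raised to that minimum.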
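The three rules above can be condensed into a short sketch (sizes in kB). This is a simplified rewrite, not pgtune's code: since every value above 14MB ends up at 16MB, the cap and the round-up are merged into a single comparison.

```javascript
// Simplified sketch of the wal_buffers rule: 3% of shared_buffers,
// snapped to 16MB once it exceeds 14MB, never below 32kB.
function walBuffersKb(sharedBuffersKb) {
  const max = 16 * 1024 // 16MB in kB
  const near = 14 * 1024 // 14MB in kB
  let value = Math.floor((3 * sharedBuffersKb) / 100)
  if (value > near) {
    value = max // covers both "above the cap" and "close to the cap"
  }
  return Math.max(value, 32)
}

console.log(walBuffersKb(4 * 1024 * 1024)) // 16384: 3% of 4GB exceeds the 16MB cap
console.log(walBuffersKb(512 * 1024)) // 16384: 15728 kB is rounded up to 16MB
console.log(walBuffersKb(128 * 1024)) // 3932
console.log(walBuffersKb(512)) // 32: raised to the minimum
```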
7. default_statistics_target
This parameter uses the following per-scenario defaults:
Reference: http://postgres.cn/docs/current/runtime-config-query.html#GUC-DEFAULT-STATISTICS-TARGET
export const selectDefaultStatisticsTarget = createSelector(
[selectDBType],
(dbType) =>
({
[DB_TYPE_WEB]: 100,
[DB_TYPE_OLTP]: 100,
[DB_TYPE_DW]: 500,
[DB_TYPE_DESKTOP]: 100,
[DB_TYPE_MIXED]: 100
})[dbType]
)
8. random_page_cost
The value is chosen by storage type:
Reference: http://postgres.cn/docs/current/runtime-config-query.html#GUC-RANDOM-PAGE-COST
export const selectRandomPageCost = createSelector([selectHDType], (hdType) => {
return {
[HARD_DRIVE_HDD]: 4,
[HARD_DRIVE_SSD]: 1.1,
[HARD_DRIVE_SAN]: 1.1
}[hdType]
})
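9. effective_io_concurrency
On Linux, this parameter is set by storage type to the defaults below; on any other operating system the selector returns null.
Reference: http://postgres.cn/docs/current/runtime-config-resource.html#GUC-EFFECTIVE-IO-CONCURRENCY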
export const selectEffectiveIoConcurrency = createSelector(
[selectOSType, selectHDType],
(osType, hdType) => {
if (osType !== OS_LINUX) {
return null
}
return {
[HARD_DRIVE_HDD]: 2,
[HARD_DRIVE_SSD]: 200,
[HARD_DRIVE_SAN]: 300
}[hdType]
}
)
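The two storage-related lookups can be sketched together as follows. The function storageSettings and the string keys are hypothetical. How pgtune treats the null returned on non-Linux systems is not shown in this article, so the sketch simply assumes the parameter is left out of the output in that case.

```javascript
// Hypothetical stand-ins for the HARD_DRIVE_* constants.
const RANDOM_PAGE_COST = { hdd: 4, ssd: 1.1, san: 1.1 }
const EFFECTIVE_IO_CONCURRENCY = { hdd: 2, ssd: 200, san: 300 }

// Assumption: effective_io_concurrency is omitted entirely when not on Linux.
function storageSettings(osType, hdType) {
  const settings = [['random_page_cost', RANDOM_PAGE_COST[hdType]]]
  if (osType === 'linux') {
    settings.push(['effective_io_concurrency', EFFECTIVE_IO_CONCURRENCY[hdType]])
  }
  return settings
}

console.log(storageSettings('linux', 'ssd'))
console.log(storageSettings('windows', 'hdd'))
```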
10. Parallel settings
Before analyzing work_mem, let's first look at the parallel settings:
// The default settings are as follows
const DEFAULT_DB_SETTINGS = {
default: {
['max_worker_processes']: 8,
['max_parallel_workers_per_gather']: 2,
['max_parallel_workers']: 8
}
}
As shown below, if cpuNum is not specified or is too small (below 4), no parallel settings are emitted and the defaults above apply:
export const selectParallelSettings = createSelector(
[selectDBVersion, selectDBType, selectCPUNum],
(dbVersion, dbType, cpuNum) => {
if (!cpuNum || cpuNum < 4) {
return []
}
let workersPerGather = Math.ceil(cpuNum / 2)
if (dbType !== DB_TYPE_DW && workersPerGather > 4) {
// no clear evidence that each new worker will provide a big benefit for each new core
workersPerGather = 4
}
let config = [
{
key: 'max_worker_processes',
value: cpuNum
},
{
key: 'max_parallel_workers_per_gather',
value: workersPerGather
}
]
if (dbVersion >= 10) {
config.push({
key: 'max_parallel_workers',
value: cpuNum
})
}
if (dbVersion >= 11) {
let parallelMaintenanceWorkers = Math.ceil(cpuNum / 2)
if (parallelMaintenanceWorkers > 4) {
parallelMaintenanceWorkers = 4 // no clear evidence that each new worker will provide a big benefit for each new core
}
config.push({
key: 'max_parallel_maintenance_workers',
value: parallelMaintenanceWorkers
})
}
return config
}
)
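- max_worker_processes is set to cpuNum.
- max_parallel_workers_per_gather: in the data warehouse scenario, workersPerGather = ceil(cpuNum / 2); in all other scenarios the same value is capped at 4.
- On PostgreSQL 10 and above, max_parallel_workers is set to cpuNum.
- On PostgreSQL 11 and above, max_parallel_maintenance_workers is set to ceil(cpuNum / 2), capped at 4.
References: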
- http://postgres.cn/docs/current/runtime-config-resource.html#GUC-MAX-WORKER-PROCESSES
- http://postgres.cn/docs/current/runtime-config-resource.html#GUC-MAX-PARALLEL-WORKERS-PER-GATHER
- http://postgres.cn/docs/current/runtime-config-resource.html#GUC-MAX-PARALLEL-WORKERS
- http://postgres.cn/docs/current/runtime-config-resource.html#GUC-MAX-PARALLEL-MAINTENANCE-WORKERS
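To see how the rules play out for different core counts, here is a simplified sketch. Unlike the selector above, it returns a plain object rather than an array of key/value entries, and the 'dw' and 'web' strings are hypothetical stand-ins for the DB_TYPE_* constants.

```javascript
// Simplified sketch of the parallel settings rules; returns {} when the
// defaults should be used (cpuNum missing or below 4).
function parallelSettings(dbVersion, dbType, cpuNum) {
  if (!cpuNum || cpuNum < 4) {
    return {}
  }
  let workersPerGather = Math.ceil(cpuNum / 2)
  if (dbType !== 'dw' && workersPerGather > 4) {
    workersPerGather = 4
  }
  const config = {
    max_worker_processes: cpuNum,
    max_parallel_workers_per_gather: workersPerGather
  }
  if (dbVersion >= 10) {
    config.max_parallel_workers = cpuNum
  }
  if (dbVersion >= 11) {
    config.max_parallel_maintenance_workers = Math.min(Math.ceil(cpuNum / 2), 4)
  }
  return config
}

console.log(parallelSettings(17, 'web', 16)) // per-gather workers capped at 4
console.log(parallelSettings(17, 'dw', 16)) // per-gather workers = 8
console.log(parallelSettings(17, 'web', 2)) // {} -> database defaults apply
```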
11. work_mem
Reference: http://postgres.cn/docs/current/runtime-config-resource.html#GUC-WORK-MEM
export const selectWorkMem = createSelector(
[
selectTotalMemoryInKb,
selectSharedBuffers,
selectMaxConnections,
selectParallelSettings,
selectDbDefaultValues,
selectDBType
],
(
totalMemoryKb,
sharedBuffersValue,
maxConnectionsValue,
parallelSettingsValue,
dbDefaultValues,
dbType
) => {
const parallelForWorkMem = (() => {
if (parallelSettingsValue.length) {
const maxParallelWorkersPerGather = parallelSettingsValue.find(
(param) => param['key'] === 'max_parallel_workers_per_gather'
)
if (
maxParallelWorkersPerGather &&
maxParallelWorkersPerGather['value'] &&
maxParallelWorkersPerGather['value'] > 0
) {
return maxParallelWorkersPerGather['value']
}
}
if (
dbDefaultValues['max_parallel_workers_per_gather'] &&
dbDefaultValues['max_parallel_workers_per_gather'] > 0
) {
return dbDefaultValues['max_parallel_workers_per_gather']
}
return 1
})()
// work_mem is assigned any time a query calls for a sort, or a hash, or any other structure that needs a space allocation, which can happen multiple times per query. So you're better off assuming max_connections * 2 or max_connections * 3 is the amount of RAM that will actually use in reality. At the very least, you need to subtract shared_buffers from the amount you're distributing to connections in work_mem.
// The other thing to consider is that there's no reason to run on the edge of available memory. If you do that, there's a very high risk the out-of-memory killer will come along and start killing PostgreSQL backends. Always leave a buffer of some kind in case of spikes in memory usage. So your maximum amount of memory available in work_mem should be ((RAM - shared_buffers) / (max_connections * 3) / max_parallel_workers_per_gather).
const workMemValue =
(totalMemoryKb - sharedBuffersValue) / (maxConnectionsValue * 3) / parallelForWorkMem
let workMemResult = {
[DB_TYPE_WEB]: Math.floor(workMemValue),
[DB_TYPE_OLTP]: Math.floor(workMemValue),
[DB_TYPE_DW]: Math.floor(workMemValue / 2),
[DB_TYPE_DESKTOP]: Math.floor(workMemValue / 6),
[DB_TYPE_MIXED]: Math.floor(workMemValue / 2)
}[dbType]
// if less than 64 kB, set it to the minimum
if (workMemResult < 64) {
workMemResult = 64
}
return workMemResult
}
)
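A brief analysis:
- parallelForWorkMem comes from max_parallel_workers_per_gather in the parallel settings above. If that is not set, the database default is used; failing that, 1.
- work_mem is computed as follows:
  - work_mem is allocated every time a query needs a sort, a hash, or any other structure that requires memory, and this can happen several times per query. It is therefore safer to assume that max_connections * 2 or max_connections * 3 allocations will actually be in use. At the very least, shared_buffers must be subtracted from the memory being distributed across connections.
  - There is also no reason to run at the edge of available memory. Doing so carries a very high risk that the out-of-memory killer will start killing PostgreSQL backends, so always leave headroom for spikes in memory usage. The maximum memory available to work_mem should therefore be ((RAM - shared_buffers) / (max_connections * 3) / max_parallel_workers_per_gather).
  - Finally, the suggested value is derived per scenario: web and OLTP use the full value, data warehouse and mixed use half of it, desktop uses one sixth, and the result is never lower than 64kB.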
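A worked example makes the formula easier to follow. The sketch below is a simplified plain function (sizes in kB); the scenario strings are hypothetical stand-ins for the DB_TYPE_* constants, and the inputs are the values derived earlier for a 16GB web server with 200 connections and 4 workers per gather.

```javascript
// Simplified sketch of the work_mem rule:
// (RAM - shared_buffers) / (max_connections * 3) / workers_per_gather,
// scaled per scenario, with a 64kB floor.
function workMemKb(totalMemoryKb, sharedBuffersKb, maxConnections, workersPerGather, dbType) {
  const base = (totalMemoryKb - sharedBuffersKb) / (maxConnections * 3) / workersPerGather
  const divisor = { web: 1, oltp: 1, dw: 2, desktop: 6, mixed: 2 }[dbType]
  return Math.max(Math.floor(base / divisor), 64)
}

const totalKb = 16 * 1024 * 1024 // 16GB
const sharedKb = totalKb / 4 // 4GB of shared_buffers

// (16777216 - 4194304) / 600 / 4 = 5242.88
console.log(workMemKb(totalKb, sharedKb, 200, 4, 'web')) // 5242
console.log(workMemKb(totalKb, sharedKb, 200, 4, 'mixed')) // 2621
```

With only 256MB of RAM, 300 connections, and 4 workers per gather, the formula yields about 54kB, so the 64kB floor takes over.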
12. huge_pages
Reference: http://postgres.cn/docs/current/runtime-config-resource.html#GUC-HUGE-PAGES
export const selectHugePages = createSelector(
[selectTotalMemoryInKb],
// with 32GB of RAM or more, it is better to also enable huge pages
(totalMemoryKBytes) => (totalMemoryKBytes >= 33554432 ? 'try' : 'off')
)
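The magic number 33554432 in the selector is simply 32GB expressed in kB (32 * 1024 * 1024). A minimal sketch that spells this out:

```javascript
const KB_PER_GB = 1024 * 1024

// Same threshold as the selector above, written as 32GB in kB.
function hugePages(totalMemoryKb) {
  return totalMemoryKb >= 32 * KB_PER_GB ? 'try' : 'off'
}

console.log(32 * KB_PER_GB) // 33554432
console.log(hugePages(32 * KB_PER_GB)) // try
console.log(hugePages(16 * KB_PER_GB)) // off
```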