Not all of this comes from the book; I re-collected and reorganized material I looked up together with my earlier articles.
I. Core Parameters
Several processes and parameters are easy to confuse; their relationships are shown in the figure below.
1. max_worker_processes
- The maximum number of background worker processes the whole instance can run concurrently
- Default is 8; setting it to 0 disables parallelism; the database must be restarted for changes to take effect
- On a standby this value must be >= the primary's
- Since a restart is required anyway, it is recommended to adjust max_parallel_workers and max_parallel_workers_per_gather at the same time
Background worker processes:
Although the name also says "background", these do not include system background processes such as SysLogger, BgWriter and WalWriter. They are mainly dynamically started processes, e.g. parallel query workers and extensions' background workers, i.e. the blue-green part of the figure.
2. max_parallel_workers
- Added in PG 10
- The maximum number of parallel query worker processes (see the figure above) that can run concurrently in the whole instance
- In other words it is a subset of max_worker_processes, so its value cannot exceed max_worker_processes (a larger setting has no additional effect)
- Default is 8; setting it to 0 disables parallelism; changes take effect without a restart
max_parallel_workers is the maximum number of workers, not the degree of parallelism. A setting of 1 means one worker, which together with the leader process gives an effective parallel degree of 2; only a setting of 0 leaves just the leader, i.e. serial execution. Think of it as the maximum number of worker processes the leader may launch: if it may launch one, two processes actually run.
"Workers Launched: 1" therefore does not mean serial execution: the leader launched one worker, and together with the leader the parallel degree is actually 2.
3. max_parallel_workers_per_gather
- The maximum number of parallel workers each exec node within a single query may start
- Default is 2; setting it to 0 disables parallelism; changes take effect without a restart
- Do not set it too high (1-4 is recommended): every worker consumes its own work_mem, so memory contention can become severe
- Its value should not exceed max_worker_processes or max_parallel_workers
The three parameters are ordered as follows:
max_worker_processes >= max_parallel_workers >= max_parallel_workers_per_gather
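The ordering above can be checked in a psql session. A minimal sketch (the values are illustrative, not recommendations):

```sql
-- max_worker_processes can only be changed in postgresql.conf plus a restart,
-- so here we only look at it; the other two are session-settable.
SHOW max_worker_processes;                 -- e.g. 8: the instance-wide pool
SET max_parallel_workers = 6;              -- must stay <= max_worker_processes
SET max_parallel_workers_per_gather = 4;   -- must stay <= max_parallel_workers
-- A value that violates the ordering is accepted syntactically, but the extra
-- workers can never be launched because the instance-wide limits still apply.
```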
4. max_parallel_maintenance_workers
- Added in PG 11
- Used for parallel index creation (only btree indexes are supported)
- Default is 2; when the conditions for parallelism are met, two workers are used for the build
- Combined with maintenance_work_mem, it can noticeably speed up index creation.
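A minimal sketch of a parallel btree build on PG 11+ (the table and index names are made up for illustration):

```sql
SET max_parallel_maintenance_workers = 4;  -- workers allowed for CREATE INDEX
SET maintenance_work_mem = '1GB';          -- memory budget for the build
CREATE INDEX idx_test_id ON test (id);     -- btree only; other AMs build serially
```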
II. Other Parallel Parameters
1. parallel_setup_cost
The cost of launching parallel workers. Starting workers requires setting up shared memory and other bookkeeping, which is extra overhead; the default is 1000.
2. parallel_tuple_cost
- The optimizer's cost for a parallel process to hand one row to the leader; default is 0.1.
- Concretely, it is the cost of a worker putting one tuple into the shared memory queue and the leader reading it out.
- This inter-process row-exchange cost is multiplied by the node's estimated output rows.
3. min_parallel_table_scan_size
One of the conditions for enabling parallelism: a table occupying less space than this will normally not be scanned in parallel. In a parallel sequential scan the amount of data scanned is usually the table size; the default is 8MB. Note, however, that other conditions also decide whether parallelism is used, so a table smaller than this is not strictly guaranteed to stay serial.
4. min_parallel_index_scan_size
One of the conditions for enabling parallelism. A parallel index scan does not read all of the index's blocks, only those relevant to the scan; the default is 512kB.
5. force_parallel_mode
Forces parallelism on. Useful for testing, and can also be used as a hint. Not recommended in OLTP production environments.
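A sketch of using it in a test session (the table name is made up):

```sql
SET force_parallel_mode = on;              -- for testing only, not production
EXPLAIN (COSTS OFF) SELECT count(*) FROM test;
-- The plan now contains a Gather node even if the planner thinks
-- parallelism is not worthwhile for this query.
```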
6. enable_partitionwise_aggregate
Partition-wise aggregation on partitioned tables
7. enable_parallel_hash
Parallel hash computation
8. enable_partitionwise_join
Partition-wise joins on partitioned tables
9. parallel_leader_participation
Whether the leader itself also executes the parallel part of the plan in addition to gathering the workers' results
10. enable_parallel_append
Parallel Append (partitioned tables, UNION ALL queries)
11. parallel_workers
All of the parameters above are database-level; parallel_workers is a table-level storage parameter that can be set at CREATE TABLE time or changed later.
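A minimal sketch of the table-level parameter (table names are made up):

```sql
CREATE TABLE t_big (id int) WITH (parallel_workers = 4);  -- set at creation
ALTER TABLE test SET (parallel_workers = 8);   -- or change it afterwards
ALTER TABLE test RESET (parallel_workers);     -- back to the automatic estimate
```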
III. How PG Computes the Degree of Parallelism
Most of this already came up while discussing the parameters; to summarize:
- max_worker_processes determines how many worker processes the whole system can start
- parallel_setup_cost and parallel_tuple_cost feed into the cost of a parallel plan, and the optimizer decides whether to go parallel according to the usual CBO principle
So for a simple query whose cost is already low (for example lower than the parallel startup cost), the database obviously will not use parallelism, unless force_parallel_mode forces the optimizer to produce a parallel plan.
- The table-level parallel_workers parameter determines each Gather node's degree of parallelism: min(parallel_workers, max_parallel_workers_per_gather)
- When the table has no parallel_workers setting and the table is larger than min_parallel_table_scan_size (called min_parallel_relation_size in PG 9.6), an algorithm based on the table size decides each Gather node's degree of parallelism
In practice, how many workers each Gather can start also depends on how many worker processes the instance still has available overall, so the actual number may be lower than what the optimizer computed; see the example below.
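The size-based algorithm is, roughly, one extra worker each time the table size grows by another factor of 3 past min_parallel_table_scan_size (an assumption based on the PG 10 planner's behavior, before the per-gather and instance-wide caps are applied). It can be illustrated directly in SQL:

```sql
-- With the default 8MB threshold: 8MB -> 1 worker, 24MB -> 2, 72MB -> 3, ...
SELECT size_mb,
       1 + floor(log(3, size_mb / 8.0))::int AS planned_workers
FROM (VALUES (8), (24), (72), (216)) AS v(size_mb);
```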
IV. Parallel Execution Plans
- Workers Planned: the number of parallel workers the planner expected
- Workers Launched: the number of parallel workers actually started for the query
- The Parallel keyword marks an operation that was executed in parallel
Example: in the WITH query below, two subqueries are computed in parallel. Although max_parallel_workers_per_gather is set to 6, max_worker_processes is 8, so the first Gather node got 6 worker processes while the other Gather could actually launch only 2.
postgres=# show max_worker_processes ;
max_worker_processes
----------------------
8
(1 row)
postgres=# set max_parallel_workers_per_gather=6;
SET
postgres=# explain (analyze,verbose,costs,timing,buffers) with t as (select count(*) from test), t1 as (select count(id) from test) select * from t,t1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=159471.81..159471.86 rows=1 width=16) (actual time=7763.033..7763.036 rows=1 loops=1)
Output: t.count, t1.count
Buffers: shared hit=32940 read=74784
CTE t
-> Finalize Aggregate (cost=79735.90..79735.91 rows=1 width=8) (actual time=4714.114..4714.115 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=16564 read=37456
-> Gather (cost=79735.27..79735.88 rows=6 width=8) (actual time=4714.016..4714.102 rows=7 loops=1)
Output: (PARTIAL count(*))
Workers Planned: 6
Workers Launched: 6
Buffers: shared hit=16564 read=37456
-> Partial Aggregate (cost=78735.27..78735.28 rows=1 width=8) (actual time=4709.465..4709.466 rows=1 loops=7)
Output: PARTIAL count(*)
Buffers: shared hit=16084 read=37456
Worker 0: actual time=4709.146..4709.146 rows=1 loops=1
Buffers: shared hit=2167 read=5350
Worker 1: actual time=4708.156..4708.156 rows=1 loops=1
Buffers: shared hit=2140 read=5288
Worker 2: actual time=4708.370..4708.370 rows=1 loops=1
Buffers: shared hit=2165 read=4990
Worker 3: actual time=4708.968..4708.969 rows=1 loops=1
Buffers: shared hit=2501 read=5529
Worker 4: actual time=4709.194..4709.195 rows=1 loops=1
Buffers: shared hit=2469 read=5473
Worker 5: actual time=4708.812..4708.813 rows=1 loops=1
Buffers: shared hit=2155 read=5349
-> Parallel Seq Scan on public.test (cost=0.00..73696.22 rows=2015622 width=0) (actual time=0.051..2384.380 rows=1728571 loops=7)
Buffers: shared hit=16084 read=37456
Worker 0: actual time=0.046..2385.108 rows=1698802 loops=1
Buffers: shared hit=2167 read=5350
Worker 1: actual time=0.057..2384.698 rows=1678728 loops=1
Buffers: shared hit=2140 read=5288
Worker 2: actual time=0.061..2384.109 rows=1617030 loops=1
Buffers: shared hit=2165 read=4990
Worker 3: actual time=0.046..2387.143 rows=1814780 loops=1
Buffers: shared hit=2501 read=5529
Worker 4: actual time=0.046..2382.491 rows=1794892 loops=1
Buffers: shared hit=2469 read=5473
Worker 5: actual time=0.070..2383.598 rows=1695904 loops=1
Buffers: shared hit=2155 read=5349
CTE t1
-> Finalize Aggregate (cost=79735.90..79735.91 rows=1 width=8) (actual time=3048.902..3048.902 rows=1 loops=1)
Output: count(test_1.id)
Buffers: shared hit=16376 read=37328
-> Gather (cost=79735.27..79735.88 rows=6 width=8) (actual time=3048.732..3048.880 rows=3 loops=1)
Output: (PARTIAL count(test_1.id))
Workers Planned: 6
Workers Launched: 2
Buffers: shared hit=16376 read=37328
-> Partial Aggregate (cost=78735.27..78735.28 rows=1 width=8) (actual time=3046.399..3046.400 rows=1 loops=3)
Output: PARTIAL count(test_1.id)
Buffers: shared hit=16212 read=37328
Worker 0: actual time=3045.394..3045.395 rows=1 loops=1
Buffers: shared hit=5352 read=12343
Worker 1: actual time=3045.339..3045.340 rows=1 loops=1
Buffers: shared hit=5354 read=12402
-> Parallel Seq Scan on public.test test_1 (cost=0.00..73696.22 rows=2015622 width=4) (actual time=0.189..1614.261 rows=4033333 loops=3)
Output: test_1.id
Buffers: shared hit=16212 read=37328
Worker 0: actual time=0.039..1617.258 rows=3999030 loops=1
Buffers: shared hit=5352 read=12343
Worker 1: actual time=0.033..1610.934 rows=4012856 loops=1
Buffers: shared hit=5354 read=12402
-> CTE Scan on t (cost=0.00..0.02 rows=1 width=8) (actual time=4714.120..4714.121 rows=1 loops=1)
Output: t.count
Buffers: shared hit=16564 read=37456
-> CTE Scan on t1 (cost=0.00..0.02 rows=1 width=8) (actual time=3048.907..3048.908 rows=1 loops=1)
Output: t1.count
Buffers: shared hit=16376 read=37328
Planning time: 0.144 ms
Execution time: 7766.458 ms
(72 rows)
References
- https://www.jianshu.com/p/4f819a43882d?from=timeline&isappinstalled=0
- https://yq.aliyun.com/articles/700370
- https://blog.csdn.net/pg_hgdb/article/details/94594419
- https://www.postgresql.org/docs/10/runtime-config-resource.html#GUC-MAX-PARALLEL-WORKERS-PER-GATHER
- https://yq.aliyun.com/articles/59180
- "A First Look at Parallelism in PostgreSQL"
- "PostgreSQL 9.6 Parallel Computation: A Brief Analysis of the Optimizer Algorithm", Alibaba Cloud Developer Community