Not all of this comes from the book; I re-collected and reorganized material I looked up together with my earlier articles.
I. Core Parameters
Several processes and parameters are easy to confuse; their relationships are shown in the figure below.
1. max_worker_processes
- The maximum number of background worker processes the whole instance can run concurrently
- Default is 8; setting it to 0 disables parallelism; the database must be restarted for changes to take effect
- On a standby this value must be >= the primary's
- Since a restart is required anyway, it is recommended to adjust max_parallel_workers and max_parallel_workers_per_gather at the same time
Background worker processes:
Although the name also says "background", these do not include system background processes such as SysLogger, BgWriter and WalWriter. They are mainly dynamically started processes, e.g. parallel query workers and extensions' background workers, i.e. the blue-green part of the figure.
2. max_parallel_workers
- Added in PG 10
- The maximum number of parallel query worker processes (see the figure above) that can run concurrently in the whole instance
- In other words it is a subset of max_worker_processes, so its value cannot exceed max_worker_processes (a larger setting has no additional effect)
- Default is 8; setting it to 0 disables parallelism; changes take effect without a restart
max_parallel_workers is the maximum number of workers, not the degree of parallelism. A setting of 1 means one worker, which together with the leader process gives an effective parallel degree of 2; only a setting of 0 leaves just the leader, i.e. serial execution. Think of it as the maximum number of worker processes the leader may launch: if it may launch one, two processes actually run.
"Workers Launched: 1" therefore does not mean serial execution: the leader launched one worker, and together with the leader the parallel degree is actually 2.
3. max_parallel_workers_per_gather
- The maximum number of parallel workers each exec node within a single query may start
- Default is 2; setting it to 0 disables parallelism; changes take effect without a restart
- Do not set it too high (1-4 is recommended): every worker consumes its own work_mem, so memory contention can become severe
- Its value should not exceed max_worker_processes or max_parallel_workers
The three parameters are ordered as follows:
max_worker_processes >= max_parallel_workers >= max_parallel_workers_per_gather
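The ordering above can be checked in a psql session. A minimal sketch (the values are illustrative, not recommendations):

```sql
-- max_worker_processes can only be changed in postgresql.conf plus a restart,
-- so here we only look at it; the other two are session-settable.
SHOW max_worker_processes;                 -- e.g. 8: the instance-wide pool
SET max_parallel_workers = 6;              -- must stay <= max_worker_processes
SET max_parallel_workers_per_gather = 4;   -- must stay <= max_parallel_workers
-- A value that violates the ordering is accepted syntactically, but the extra
-- workers can never be launched because the instance-wide limits still apply.
```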
4. max_parallel_maintenance_workers
- Added in PG 11
- Used for parallel index creation (only btree indexes are supported)
- Default is 2; when the conditions for parallelism are met, two workers are used for the build
- Combined with maintenance_work_mem, it can noticeably speed up index creation.
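A minimal sketch of a parallel btree build on PG 11+ (the table and index names are made up for illustration):

```sql
SET max_parallel_maintenance_workers = 4;  -- workers allowed for CREATE INDEX
SET maintenance_work_mem = '1GB';          -- memory budget for the build
CREATE INDEX idx_test_id ON test (id);     -- btree only; other AMs build serially
```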
II. Other Parallel Parameters
1. parallel_setup_cost
The cost of launching parallel workers. Starting workers requires setting up shared memory and other bookkeeping, which is extra overhead; the default is 1000.
2. parallel_tuple_cost
- The optimizer's cost for a parallel process to hand one row to the leader; default is 0.1.
- Concretely, it is the cost of a worker putting one tuple into the shared memory queue and the leader reading it out.
- This inter-process row-exchange cost is multiplied by the node's estimated output rows.
3. min_parallel_table_scan_size
One of the conditions for enabling parallelism: a table occupying less space than this will normally not be scanned in parallel. In a parallel sequential scan the amount of data scanned is usually the table size; the default is 8MB. Note, however, that other conditions also decide whether parallelism is used, so a table smaller than this is not strictly guaranteed to stay serial.
4. min_parallel_index_scan_size
One of the conditions for enabling parallelism. A parallel index scan does not read all of the index's blocks, only those relevant to the scan; the default is 512kB.
5. force_parallel_mode
Forces parallelism on. Useful for testing, and can also be used as a hint. Not recommended in OLTP production environments.
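A sketch of using it in a test session (the table name is made up):

```sql
SET force_parallel_mode = on;              -- for testing only, not production
EXPLAIN (COSTS OFF) SELECT count(*) FROM test;
-- The plan now contains a Gather node even if the planner thinks
-- parallelism is not worthwhile for this query.
```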
6. enable_partitionwise_aggregate
Partition-wise aggregation on partitioned tables
7. enable_parallel_hash
Parallel hash computation
8. enable_partitionwise_join
Partition-wise joins on partitioned tables
9. parallel_leader_participation
Whether the leader itself also executes the parallel part of the plan in addition to gathering the workers' results
10. enable_parallel_append
Parallel Append (partitioned tables, UNION ALL queries)
11. parallel_workers
All of the parameters above are database-level; parallel_workers is a table-level storage parameter that can be set at CREATE TABLE time or changed later.
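A minimal sketch of the table-level parameter (table names are made up):

```sql
CREATE TABLE t_big (id int) WITH (parallel_workers = 4);  -- set at creation
ALTER TABLE test SET (parallel_workers = 8);   -- or change it afterwards
ALTER TABLE test RESET (parallel_workers);     -- back to the automatic estimate
```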
III. How PG Computes the Degree of Parallelism
Most of this already came up while discussing the parameters; to summarize:
- max_worker_processes determines how many worker processes the whole system can start
- parallel_setup_cost and parallel_tuple_cost feed into the cost of a parallel plan, and the optimizer decides whether to go parallel according to the usual CBO principle
So for a simple query whose cost is already low (for example lower than the parallel startup cost), the database obviously will not use parallelism, unless force_parallel_mode forces the optimizer to produce a parallel plan.
- The table-level parallel_workers parameter determines each Gather node's degree of parallelism: min(parallel_workers, max_parallel_workers_per_gather)
- When the table has no parallel_workers setting and the table is larger than min_parallel_table_scan_size (called min_parallel_relation_size in PG 9.6), an algorithm based on the table size decides each Gather node's degree of parallelism
In practice, how many workers each Gather can start also depends on how many worker processes the instance still has available overall, so the actual number may be lower than what the optimizer computed; see the example below.
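The size-based algorithm is, roughly, one extra worker each time the table size grows by another factor of 3 past min_parallel_table_scan_size (an assumption based on the PG 10 planner's behavior, before the per-gather and instance-wide caps are applied). It can be illustrated directly in SQL:

```sql
-- With the default 8MB threshold: 8MB -> 1 worker, 24MB -> 2, 72MB -> 3, ...
SELECT size_mb,
       1 + floor(log(3, size_mb / 8.0))::int AS planned_workers
FROM (VALUES (8), (24), (72), (216)) AS v(size_mb);
```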
IV. Parallel Execution Plans
- Workers Planned: the number of parallel workers the planner expected
- Workers Launched: the number of parallel workers actually started for the query
- The Parallel keyword marks an operation that was executed in parallel
Example: in the WITH query below, two subqueries are computed in parallel. Although max_parallel_workers_per_gather is set to 6, max_worker_processes is 8, so the first Gather node got 6 worker processes while the other Gather could actually launch only 2.
postgres=# show max_worker_processes ;
max_worker_processes
----------------------
8
(1 row)
postgres=# set max_parallel_workers_per_gather=6;
SET
postgres=# explain (analyze,verbose,costs,timing,buffers) with t as (select count(*) from test), t1 as (select count(id) from test) select * from t,t1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=159471.81..159471.86 rows=1 width=16) (actual time=7763.033..7763.036 rows=1 loops=1)
Output: t.count, t1.count
Buffers: shared hit=32940 read=74784
CTE t
-> Finalize Aggregate (cost=79735.90..79735.91 rows=1 width=8) (actual time=4714.114..4714.115 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=16564 read=37456
-> Gather (cost=79735.27..79735.88 rows=6 width=8) (actual time=4714.016..4714.102 rows=7 loops=1)
Output: (PARTIAL count(*))
Workers Planned: 6
Workers Launched: 6
Buffers: shared hit=16564 read=37456
-> Partial Aggregate (cost=78735.27..78735.28 rows=1 width=8) (actual time=4709.465..4709.466 rows=1 loops=7)
Output: PARTIAL count(*)
Buffers: shared hit=16084 read=37456
Worker 0: actual time=4709.146..4709.146 rows=1 loops=1
Buffers: shared hit=2167 read=5350
Worker 1: actual time=4708.156..4708.156 rows=1 loops=1
Buffers: shared hit=2140 read=5288
Worker 2: actual time=4708.370..4708.370 rows=1 loops=1
Buffers: shared hit=2165 read=4990
Worker 3: actual time=4708.968..4708.969 rows=1 loops=1
Buffers: shared hit=2501 read=5529
Worker 4: actual time=4709.194..4709.195 rows=1 loops=1
Buffers: shared hit=2469 read=5473
Worker 5: actual time=4708.812..4708.813 rows=1 loops=1
Buffers: shared hit=2155 read=5349
-> Parallel Seq Scan on public.test (cost=0.00..73696.22 rows=2015622 width=0) (actual time=0.051..2384.380 rows=1728571 loops=7)
Buffers: shared hit=16084 read=37456
Worker 0: actual time=0.046..2385.108 rows=1698802 loops=1
Buffers: shared hit=2167 read=5350
Worker 1: actual time=0.057..2384.698 rows=1678728 loops=1
Buffers: shared hit=2140 read=5288
Worker 2: actual time=0.061..2384.109 rows=1617030 loops=1
Buffers: shared hit=2165 read=4990
Worker 3: actual time=0.046..2387.143 rows=1814780 loops=1
Buffers: shared hit=2501 read=5529
Worker 4: actual time=0.046..2382.491 rows=1794892 loops=1
Buffers: shared hit=2469 read=5473
Worker 5: actual time=0.070..2383.598 rows=1695904 loops=1
Buffers: shared hit=2155 read=5349
CTE t1
-> Finalize Aggregate (cost=79735.90..79735.91 rows=1 width=8) (actual time=3048.902..3048.902 rows=1 loops=1)
Output: count(test_1.id)
Buffers: shared hit=16376 read=37328
-> Gather (cost=79735.27..79735.88 rows=6 width=8) (actual time=3048.732..3048.880 rows=3 loops=1)
Output: (PARTIAL count(test_1.id))
Workers Planned: 6
Workers Launched: 2
Buffers: shared hit=16376 read=37328
-> Partial Aggregate (cost=78735.27..78735.28 rows=1 width=8) (actual time=3046.399..3046.400 rows=1 loops=3)
Output: PARTIAL count(test_1.id)
Buffers: shared hit=16212 read=37328
Worker 0: actual time=3045.394..3045.395 rows=1 loops=1
Buffers: shared hit=5352 read=12343
Worker 1: actual time=3045.339..3045.340 rows=1 loops=1
Buffers: shared hit=5354 read=12402
-> Parallel Seq Scan on public.test test_1 (cost=0.00..73696.22 rows=2015622 width=4) (actual time=0.189..1614.261 rows=4033333 loops=3)
Output: test_1.id
Buffers: shared hit=16212 read=37328
Worker 0: actual time=0.039..1617.258 rows=3999030 loops=1
Buffers: shared hit=5352 read=12343
Worker 1: actual time=0.033..1610.934 rows=4012856 loops=1
Buffers: shared hit=5354 read=12402
-> CTE Scan on t (cost=0.00..0.02 rows=1 width=8) (actual time=4714.120..4714.121 rows=1 loops=1)
Output: t.count
Buffers: shared hit=16564 read=37456
-> CTE Scan on t1 (cost=0.00..0.02 rows=1 width=8) (actual time=3048.907..3048.908 rows=1 loops=1)
Output: t1.count
Buffers: shared hit=16376 read=37328
Planning time: 0.144 ms
Execution time: 7766.458 ms
(72 rows)
References
- https://www.jianshu.com/p/4f819a43882d?from=timeline&isappinstalled=0
- https://yq.aliyun.com/articles/700370
- https://blog.csdn.net/pg_hgdb/article/details/94594419
- https://www.postgresql.org/docs/10/runtime-config-resource.html#GUC-MAX-PARALLEL-WORKERS-PER-GATHER
- https://yq.aliyun.com/articles/59180
- "A First Look at Parallelism in PostgreSQL"
- "PostgreSQL 9.6 Parallel Computation: A Brief Analysis of the Optimizer Algorithm", Alibaba Cloud Developer Community