在执行一个带有IS NOT NULL或者NOT NULL的SQL的时候,通常会对表的每一行,都会进行检查以确保列为空/不为空,这是符合常理的。
但是如果本身这个列上有非空(NOT NULL)约束,再次检查就会浪费资源。甚至有时候走索引,但是还需要回表扫描整个表去确认是否满足NULL的条件,这个时候明显是不太合理的。
在PostgreSQL16版本及以前,就算原本的列上有非空索引,查询条件带有NULL和NOT NULL,也感知不到,依然会去扫描表去评估,增加额外的计划和执行的开销。
postgres<16.1>(ConnAs[postgres]:PID[4639] 2024-05-29/12:02:51)=# create table t1 as select * from pg_class;
SELECT 494
postgres<16.1>(ConnAs[postgres]:PID[4639] 2024-05-29/12:02:55)=# alter table t1 alter COLUMN oid set not null;
ALTER TABLE
postgres<16.1>(ConnAs[postgres]:PID[4639] 2024-05-29/12:09:59)=# \d t1
Table "public.t1"
+---------------------+--------------+-----------+----------+---------+
| Column | Type | Collation | Nullable | Default |
+---------------------+--------------+-----------+----------+---------+
| oid | oid | | not null | |
| relname | name | | | |
| relnamespace | oid | | | |
| reltype | oid | | | |
| reloftype | oid | | | |
| relowner | oid | | | |
| relam | oid | | | |
| relfilenode | oid | | | |
| reltablespace | oid | | | |
... ...
postgres<16.1>(ConnAs[postgres]:PID[4639] 2024-05-29/12:10:05)=# explain (costs off) select * from t1 where oid IS NULL;
+-------------------------+
| QUERY PLAN |
+-------------------------+
| Seq Scan on t1 |
| Filter: (oid IS NULL) |
+-------------------------+
(2 rows)
postgres<16.1>(ConnAs[postgres]:PID[4639] 2024-05-29/12:11:03)=# explain (costs off) select * from t1 where oid IS NOT NULL;
+-----------------------------+
| QUERY PLAN |
+-----------------------------+
| Seq Scan on t1 |
| Filter: (oid IS NOT NULL) |
+-----------------------------+
(2 rows)
David Rowley的相关邮件里也强调了:当我们优化Min/Max聚合时,规划器添加的IS NOT NULL qual会使重写的计划忽略NULL,这可能会导致索引选择不佳的问题。特别是那些没有统计数据,让规划器估算的近似选择性的条件,可能会导致较差的索引选择,例如LIMIT 1更倾向于廉价的启动路径。
PostgreSQL: Removing const-false IS NULL quals and redundant IS NOT NULL quals
在 PostgreSQL17-beta1 中,为了改进IS [NOT] NULL涉及条件时的规划,NOT NULL首先对列进行条件检查。让planner(规划器)检查NOT NULL列,并让planner(规划器)为所有查询删除这些qual(当不需要它们时),而不仅仅是在优化最小/最大聚合时。 并且还检测NOT NULL列上的IS NULL qual,这也有助于自联接删除工作,因为它必须将严格的联接限定符替换为IS NOT NULL限定符,以确保与原始查询等效。
改进后的 PostgreSQL17-beta1 的现象如下所示:
postgres<17beta1>(ConnAs[postgres]:PID[21590] 2024-05-28/15:29:09)=# create table t1 as select * from pg_class;
SELECT 415
postgres<17beta1>(ConnAs[postgres]:PID[21590] 2024-05-28/15:29:26)=# alter table t1 alter COLUMN oid set not null;
ALTER TABLE
postgres<17beta1>(ConnAs[postgres]:PID[21590] 2024-05-28/15:29:20)=# \d t1
Table "public.t1"
+---------------------+--------------+-----------+----------+---------+
| Column | Type | Collation | Nullable | Default |
+---------------------+--------------+-----------+----------+---------+
| oid | oid | | not null | |
| relname | name | | | |
| relnamespace | oid | | | |
| reltype | oid | | | |
| reloftype | oid | | | |
| relowner | oid | | | |
| relam | oid | | | |
| relfilenode | oid | | | |
| reltablespace | oid | | | |
| relpages | integer | | | |
| reltuples | real | | | |
| relallvisible | integer | | | |
... ...
postgres<17beta1>(ConnAs[postgres]:PID[21590] 2024-05-28/15:29:39)=# explain (costs off) select * from t1 where oid IS NULL;
+--------------------------+
| QUERY PLAN |
+--------------------------+
| Result |
| One-Time Filter: false |
+--------------------------+
(2 rows)
postgres<17beta1>(ConnAs[postgres]:PID[21590] 2024-05-28/15:29:46)=# explain (costs off) select * from t1 where oid IS NOT NULL;
+----------------+
| QUERY PLAN |
+----------------+
| Seq Scan on t1 |
+----------------+
(1 row)
参考文档:
1.https://www.postgresql.org/message-id/flat/17540-7aa1855ad5ec18b4@postgresql.org
2.https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b262ad440edecda0b1aba81d967ab560a83acb8a