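While running routine health checks on our PostgreSQL databases recently, I found that several of them were logging a large number of the following errors: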
ERROR: ON CONFLICT DO UPDATE command cannot affect row a second time
HINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.
I. Problem Analysis
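The error text shows that the statement uses INSERT ... ON CONFLICT DO, a feature introduced in PostgreSQL 9.5 that provides UPSERT behavior: when an insert would violate a unique constraint, the statement either skips the row (DO NOTHING) or runs an UPDATE instead (DO UPDATE). The logged SQL statements each insert multiple rows, so my guess was that several rows within one statement carried the same key value, and that this is what raised the error.
PostgreSQL's upsert in short: if the row does not exist, insert it; otherwise, update it.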
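To make the two conflict actions concrete, here is a minimal sketch. The table `upsert_forms` and its columns are hypothetical names chosen only for this illustration; they do not come from the affected databases.

```sql
-- Hypothetical demo table; the name and columns are assumptions for illustration.
CREATE TEMP TABLE upsert_forms (id int PRIMARY KEY, info text);
INSERT INTO upsert_forms VALUES (1, 'first');

-- Form 1: skip the incoming row because id = 1 already exists.
INSERT INTO upsert_forms VALUES (1, 'ignored')
ON CONFLICT (id) DO NOTHING;

-- Form 2: overwrite the existing row with the incoming values.
-- "excluded" refers to the row that was proposed for insertion.
INSERT INTO upsert_forms VALUES (1, 'second')
ON CONFLICT (id) DO UPDATE SET info = excluded.info;

SELECT * FROM upsert_forms;
```

After these statements the table should hold a single row for id = 1 whose info is 'second': the DO NOTHING statement leaves the row untouched, and the DO UPDATE statement replaces its value.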
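In the PostgreSQL source code, the comment inside ExecOnConflictUpdate() in src/backend/executor/nodeModifyTable.c explains the cause clearly. The situation can arise when a command tries to update a tuple that the same command has just inserted, for example because multiple rows with the same conflicting key value were proposed for insertion.
MERGE has the same problem: the SQL-2003 standard similarly specifies that MERGE must raise an exception on an attempt to update the same row twice. The order in which rows are processed within a single command is unspecified, so the database cannot know which of the conflicting rows should be the one that is finally kept.
The comment also states explicitly that preventing this situation is the user's responsibility. PostgreSQL does not resolve the conflict on the user's behalf; it is up to the user to guarantee that a single SQL statement never contains multiple rows with the same key.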
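One way to meet that responsibility on the SQL side is to deduplicate the batch before it reaches ON CONFLICT. The following is only a sketch under assumptions: the table `upsert_demo` is hypothetical, and "keep the row with the latest crt_time" is an assumed business rule that the application owner would have to confirm.

```sql
-- Hypothetical demo table; the name and columns are assumptions for illustration.
CREATE TEMP TABLE upsert_demo (id int PRIMARY KEY, info text, crt_time timestamp);

-- The incoming batch contains id = 3 twice.
-- DISTINCT ON (id) keeps exactly one row per id, and
-- ORDER BY id, crt_time DESC makes that row the one with the latest crt_time.
INSERT INTO upsert_demo (id, info, crt_time)
SELECT DISTINCT ON (id) id, info, crt_time
FROM (VALUES
        (3, 'old', timestamp '2024-04-16 14:01:33'),
        (3, 'new', timestamp '2024-04-16 14:05:00'),
        (4, 'hah', timestamp '2024-04-16 14:01:33')
     ) AS batch(id, info, crt_time)
ORDER BY id, crt_time DESC
ON CONFLICT (id) DO UPDATE
SET info = excluded.info,
    crt_time = excluded.crt_time;
```

Because each id now appears at most once in the rows proposed for insertion, the statement no longer tries to touch the same row twice. Which duplicate should win is a business decision, so the ORDER BY clause is the part to adapt.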
II. Reproducing the Problem
1. Create the test table
postgres<16.1>(ConnAs[postgres]:PID[1103345] 2024-04-16/14:09:21)=# create table test(id int primary key, info text, crt_time timestamp);
CREATE TABLE
postgres<16.1>(ConnAs[postgres]:PID[1103345] 2024-04-16/14:09:22)=# insert into test values (1,'test',now()) on conflict (id) do update set info=excluded.info,crt_time=excluded.crt_time;
INSERT 0 1
postgres<16.1>(ConnAs[postgres]:PID[1103345] 2024-04-16/14:09:44)=# select * from test;
+----+------+----------------------------+
| id | info |          crt_time          |
+----+------+----------------------------+
|  1 | test | 2024-04-16 14:09:44.405528 |
+----+------+----------------------------+
(1 row)
postgres<16.1>(ConnAs[postgres]:PID[1103345] 2024-04-16/14:09:51)=# insert into test values (2,'hah','2024-04-16 14:01:33.640731') on conflict (id) do update set info=excluded.info,crt_time=excluded.crt_time;
INSERT 0 1
postgres<16.1>(ConnAs[postgres]:PID[1103345] 2024-04-16/14:10:01)=# select * from test;
+----+------+----------------------------+
| id | info |          crt_time          |
+----+------+----------------------------+
|  1 | test | 2024-04-16 14:09:44.405528 |
|  2 | hah  | 2024-04-16 14:01:33.640731 |
+----+------+----------------------------+
(2 rows)
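2. Insert the same primary key value in two consecutive statements
Here id is the primary key and I insert id=7 twice in a row, each time in its own statement. No primary key violation is raised: the second INSERT is turned into an UPDATE that overwrites the other columns of the id=7 row.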
postgres<16.1>(ConnAs[postgres]:PID[1103345] 2024-04-16/14:11:22)=# insert into test values (7,'test',now()) on conflict (id) do update set info=excluded.info,crt_time=excluded.crt_time;
INSERT 0 1
postgres<16.1>(ConnAs[postgres]:PID[1103345] 2024-04-16/14:11:38)=# select * from test where id=7;
+----+------+----------------------------+
| id | info |          crt_time          |
+----+------+----------------------------+
|  7 | test | 2024-04-16 14:11:38.229476 |
+----+------+----------------------------+
(1 row)
postgres<16.1>(ConnAs[postgres]:PID[1103345] 2024-04-16/14:11:51)=# insert into test values (7,'ha','2024-04-16 14:15:38.229476') on conflict (id) do update set info=excluded.info,crt_time=excluded.crt_time;
INSERT 0 1
postgres<16.1>(ConnAs[postgres]:PID[1103345] 2024-04-16/14:12:11)=# select * from test where id=7;
+----+------+----------------------------+
| id | info |          crt_time          |
+----+------+----------------------------+
|  7 | ha   | 2024-04-16 14:15:38.229476 |
+----+------+----------------------------+
(1 row)
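3. A single SQL statement containing the same key value twice
As shown below, when one SQL statement inserts multiple rows that share the same key value (which would violate the primary key), it fails with ERROR: ON CONFLICT DO UPDATE command cannot affect row a second time. The failed statement inserts nothing, which is why id=3 is absent from the final query result. As the source comment discussed above makes clear, preventing this situation is the user's responsibility, so the fix most likely has to be worked out with the application team by reviewing the business logic that builds these statements.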
postgres<16.1>(ConnAs[postgres]:PID[1103345] 2024-04-16/14:10:03)=# insert into test values (3,'hah','2024-04-16 14:01:33.640731'),(3,'hah','2024-04-16 14:01:33.640731') on conflict (id) do update set info=excluded.info,crt_time=excluded.crt_time;
ERROR: ON CONFLICT DO UPDATE command cannot affect row a second time
HINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.
postgres<16.1>(ConnAs[postgres]:PID[1103345] 2024-04-16/14:10:27)=# insert into test values (4,'hah','2024-04-16 14:01:33.640731'),(5,'hah','2024-04-16 14:01:33.640731') on conflict (id) do update set info=excluded.info,crt_time=excluded.crt_time;
INSERT 0 2
postgres<16.1>(ConnAs[postgres]:PID[1103345] 2024-04-16/14:10:38)=# select * from test;
+----+------+----------------------------+
| id | info |          crt_time          |
+----+------+----------------------------+
|  1 | test | 2024-04-16 14:09:44.405528 |
|  2 | hah  | 2024-04-16 14:01:33.640731 |
|  4 | hah  | 2024-04-16 14:01:33.640731 |
|  5 | hah  | 2024-04-16 14:01:33.640731 |
+----+------+----------------------------+
(4 rows)
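When reviewing the business logic, it may help to first confirm which keys actually repeat within a batch. A rough sketch, assuming the batch can be loaded into a staging table first; `staging_batch` and its columns are hypothetical names, not something taken from the affected systems.

```sql
-- Hypothetical staging table holding one incoming batch; name and columns are assumptions.
CREATE TEMP TABLE staging_batch (id int, info text, crt_time timestamp);
INSERT INTO staging_batch VALUES
    (3, 'hah', '2024-04-16 14:01:33.640731'),
    (3, 'hah', '2024-04-16 14:01:33.640731'),
    (4, 'hah', '2024-04-16 14:01:33.640731');

-- List every key that appears more than once in the batch.
SELECT id, count(*) AS copies
FROM staging_batch
GROUP BY id
HAVING count(*) > 1;
```

With the sample rows above, the query reports id = 3 with two copies. Any key it returns would trigger the error if the batch were sent through a single INSERT ... ON CONFLICT DO UPDATE statement.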