1 设置基于时间点恢复所需的配置
要启用WAL归档,
- 需设置wal_level配置参数为
replica
或更高, - 设置archive_mode为
on
, - 并且使用archive_command配置参数指定一个shell命令
1.1 修改配置文件 postgresql.conf
vim postgresql.conf
archive_mode = on
archive_command = 'cp %p /data1/backups/pg_wal_archive/%f'
wal_level = replica
1.2 生效配置
2 进行一次全备
2.1 创建备份目录
mkdir -p /data/backup/pg_backup
chown postgres:postgres /data/backup/pg_backup
2.2 进行一次全备
pg_basebackup -D /data/backup/pg_backup -v -P -R
备份输出日志 :
pg_basebackup -D /data/backup/pg_backup -v -P -R
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/1F000028 on timeline 10
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_20961"
57113/57113 kB (100%), 1/1 tablespace
pg_basebackup: write-ahead log end point: 0/1F000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: renaming backup_manifest.tmp to backup_manifest
pg_basebackup: base backup completed
3 模拟增量数据插入
yanhaihang=# insert into foo values (100011),(100012),(100013);
INSERT 0 3
4 查看当前时间点,LSN号,XID 事务ID,当前的WAL 文件等
yanhaihang=# select now();
now
-------------------------------
2024-09-25 17:16:45.699173+08
(1 row)
yanhaihang=# select pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
0/20000BD8
(1 row)
yanhaihang=# select txid_current();
txid_current
--------------
778
(1 row)
yanhaihang=# select pg_walfile_name(pg_current_wal_lsn());
pg_walfile_name
--------------------------
0000000A0000000000000020
(1 row)
查看插入增量数据的时间点 (作为恢复的时间点)
yanhaihang=# select now();
now
-------------------------------
2024-09-25 17:16:45.699173+08
LSN号
yanhaihang=# select pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
0/20000BD8
XID 事务ID
yanhaihang=# select txid_current();
txid_current
--------------
778
当前的WAL 文件
yanhaihang=# select pg_walfile_name(pg_current_wal_lsn());
pg_walfile_name
--------------------------
0000000A0000000000000020
5 进行一次WAL 日志的切换
yanhaihang=# SELECT pg_switch_wal();
pg_switch_wal
---------------
0/20000D38
再次查看当前的WAL日志,可以看到 由原来的20 变为了 21
yanhaihang=# select pg_walfile_name(pg_current_wal_lsn());
pg_walfile_name
--------------------------
0000000A0000000000000021
wal 日志目录 ,
查看归档目录 ,可以看到 0000000A0000000000000020 日志已经归档
6 模拟故障发生
假如以下场景发生: 有研发删除了foo全表 ,或者错误更新了foo全表 ,或者机房故障导致数据库启动不了。
如果只用全备恢复,从全备时刻到故障时刻的增量的DML 就会丢失 。
7 进行基于时间点的恢复
7.1 停止原实例
pg_ctl -D /data/storage/pgsql/data stop
7.2 将原来的数据目录 mv 走,
mv data data_bak
7.3 将全备拷贝一份为pg 的数据目录:
cp -R /data/backup/pg_backup/ /data/storage/pgsql/data
7.4 修改目录权限和属组用户
chown -R postgres:postgres /data/storage/pgsql/data
chmod 700 /data/storage/pgsql/data
7.5 配置增量恢复 修改文件 postgresql.auto.conf
vim postgresql.auto.conf
restore_command = 'cp /data1/backups/pg_wal_archive/%f %p'
recovery_target_time = '2024-09-25 17:16:45'
7.6 创建recovery.signal文件
因为备份的时候加了-R 所以会创建 standby.signal 文件 ,如果你是想让改实例作为主库启动 将该文件删除
创建recovery.signal,
我这里直接mv
mv standby.signal recovery.signal
7,7 启动恢复的实例
pg_ctl -D /data/storage/pgsql/data start
日志输出:
$pg_ctl -D /data/storage/pgsql/data start
waiting for server to start....2024-09-25 17:35:02.439 CST [14388] LOG: starting PostgreSQL 13.12 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
2024-09-25 17:35:02.440 CST [14388] LOG: listening on IPv4 address "0.0.0.0", port 5432
2024-09-25 17:35:02.441 CST [14388] LOG: could not create IPv6 socket for address "::": Address family not supported by protocol
2024-09-25 17:35:02.441 CST [14388] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2024-09-25 17:35:02.443 CST [14395] LOG: database system was interrupted; last known up at 2024-09-25 17:12:10 CST
cp: cannot stat ‘/data1/backups/pg_wal_archive/0000000B.history’: No such file or directory
2024-09-25 17:35:02.485 CST [14395] LOG: starting point-in-time recovery to 2024-09-25 17:16:45+08
2024-09-25 17:35:02.487 CST [14395] LOG: restored log file "0000000A.history" from archive
2024-09-25 17:35:02.501 CST [14395] LOG: restored log file "0000000A000000000000001F" from archive
2024-09-25 17:35:02.515 CST [14395] LOG: redo starts at 0/1F000028
2024-09-25 17:35:02.516 CST [14395] LOG: consistent recovery state reached at 0/1F000100
2024-09-25 17:35:02.516 CST [14388] LOG: database system is ready to accept read only connections
2024-09-25 17:35:02.529 CST [14395] LOG: restored log file "0000000A0000000000000020" from archive
done
server started
[postgres@erp-db-mysql-s17.ys:/data/storage/pgsql/data]$2024-09-25 17:35:02.538 CST [14395] LOG: recovery stopping before commit of transaction 778, time 2024-09-25 17:16:58.116118+08
2024-09-25 17:35:02.538 CST [14395] LOG: pausing at the end of recovery
2024-09-25 17:35:02.538 CST [14395] HINT: Execute pg_wal_replay_resume() to promote.
8 检验数据
select * from foo order by id desc limit 10;
9 恢复可写
写入数据报错
insert into foo values (100014);
yanhaihang=# insert into foo values (100014);
2024-09-25 17:36:48.574 CST [18061] ERROR: cannot execute INSERT in a read-only transaction
2024-09-25 17:36:48.574 CST [18061] STATEMENT: insert into foo values (100014);
ERROR: cannot execute INSERT in a read-only transaction
恢复可写
select pg_wal_replay_resume();
yanhaihang=# select pg_wal_replay_resume();
pg_wal_replay_resume
----------------------
(1 row)
yanhaihang=# 2024-09-25 17:37:52.704 CST [14395] LOG: redo done at 0/20000BD8
2024-09-25 17:37:52.704 CST [14395] LOG: last completed transaction was at log time 2024-09-25 17:14:27.560912+08
cp: cannot stat ‘/data1/backups/pg_wal_archive/0000000B.history’: No such file or directory
2024-09-25 17:37:52.708 CST [14395] LOG: selected new timeline ID: 11
2024-09-25 17:37:52.727 CST [14395] LOG: archive recovery complete
2024-09-25 17:37:52.729 CST [14395] LOG: restored log file "0000000A.history" from archive
2024-09-25 17:37:52.737 CST [14388] LOG: database system is ready to accept connections
常见报错
报错1 使用了错误的恢复配置文件
报错关键字 FATAL: using recovery command file "recovery.conf" is not supported
报错1原因 : 高版本移除了recovery.conf 文件
PostgreSQL12 附录 E. 版本说明
-
将
recovery.conf
设置移动到postgresql.conf中。 (Masao Fujii, Simon Riggs, Abhijit Menon-Sen, Sergei Kornilov)
recovery.conf
不再使用,如果该文件存在,服务器将无法启动。 recovery.signal和standby.signal
文件现在用于切换到非主模式。 trigger_file
设置已更名为promote_trigger_file。 删除了standby_mode
设置。
官方文档参考
http://postgres.cn/docs/13/continuous-archiving.html
报错2 找不到需要的WAL日志
报错关键字
cp: cannot stat ‘/data1/backups/pg_wal_archive/0000000A.history’: No such file or directory
cp: cannot stat ‘/data1/backups/pg_wal_archive/00000009000000000000001C’: No such file or directory
FATAL: recovery ended before configured recovery target was reached
原因:需要的wal 没有被归档?
解决 : 进行一次WAL切换 SELECT pg_switch_wal();
TODO
WAL的切换规则
参考
多种恢复方式 官方文档 中的 19.5.4. 归档恢复
http://postgres.cn/docs/13/runtime-config-wal.html#RUNTIME-CONFIG-WAL-RECOVERY-TARGET
WAL 日志
http://postgres.cn/docs/13/wal.html
基于时间点的恢复 官方文档
http://postgres.cn/docs/13/continuous-archiving.html