Fixing a PG14 archiving failure: archiver failed on wal_lsn

2025/2/25 18:16:29

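Problem description

Last night, WAL files filled the disk on the primary of a repmgr + PG14 primary/standby pair. After deleting expired WAL files on the primary and rebuilding the standby, I checked the replication status this morning. The primary was streaming WAL to the standby normally, but the primary's process list showed one archive failure: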
postgres: archiver failed on 000000010000006F00000086

  • Primary:

walsender repmgr 172.28.32.23(36122) streaming 72/1BAC3A10    (walsender OK)
archiver failed on 000000010000006F00000086    (archiving failed)

  • Standby:

walreceiver streaming 77/9EB6A198    (walreceiver OK)

-- Check the database status on the primary
[root@pgmaster ~]# systemctl status postgres
● postgres.service - PostgreSQL database server
Loaded: loaded (/usr/lib/systemd/system/postgres.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2023-10-12 22:04:08 CST; 13h ago
Process: 3710968 ExecStart=/server/data/pgdb/pgsql/bin/pg_ctl start -D $PGDATA (code=exited, status=0/SUCCESS)
Main PID: 3710970 (postgres)
Tasks: 53 (limit: 201967)
Memory: 19.0G
CGroup: /system.slice/postgres.service
├─ 3710970 /server/data/pgdb/pgsql/bin/postgres -D /server/data/pgdb/data
├─ 3710971 postgres: logger
├─ 3710992 postgres: checkpointer
├─ 3710993 postgres: background writer
├─ 3710994 postgres: walwriter
├─ 3710995 postgres: archiver failed on 000000010000006F00000086
├─ 3710996 postgres: logical replication launcher
├─ 3711001 postgres: top_portal top_portal 172.28.32.18(41438) idle
├─ 3711003 postgres: tj_sjjh dataexchange 172.28.32.28(35406) idle
├─ 3711009 postgres: repmgr repmgr 172.28.32.22(64096) idle
├─ 3711468 postgres: top_portal top_portal 172.28.32.18(41720) idle
├─ 3713807 postgres: top_portal top_portal 172.28.32.20(44492) idle
├─ 3723017 postgres: walsender repmgr 172.28.32.23(36122) streaming 72/1BAC3A10  # WAL sending OK

-- Check the standby status
[root@pgslave ~]# systemctl status postgres
● postgres.service - PostgreSQL database server
Loaded: loaded (/usr/lib/systemd/system/postgres.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2023-10-13 00:12:19 CST; 12h ago
Process: 1931221 ExecStart=/server/data/pgdb/pgsql/bin/pg_ctl start -D $PGDATA (code=exited, status=0/SUCCESS)
Main PID: 1931223 (postgres)
Tasks: 7 (limit: 201967)
Memory: 23.2G
CGroup: /system.slice/postgres.service
├─ 1931223 /server/data/pgdb/pgsql/bin/postgres -D /server/data/pgdb/data
├─ 1931224 postgres: logger
├─ 1931225 postgres: startup recovering 00000001000000770000009E
├─ 1931226 postgres: checkpointer
├─ 1931227 postgres: background writer
├─ 1931230 postgres: walreceiver streaming 77/9EB6A198  # WAL receiving
└─ 1931430 postgres: repmgr repmgr 172.28.32.23(22956) idle

Oct 13 00:12:17 pgslave systemd[1]: Starting PostgreSQL database server...
Oct 13 00:12:17 pgslave pg_ctl[1931221]: waiting for server to start....
Oct 13 00:12:17 pgslave pg_ctl[1931223]: 2023-10-13 00:12:17.497 CST [1931223] LOG:  redirecting log output to logging collector process
Oct 13 00:12:17 pgslave pg_ctl[1931223]: 2023-10-13 00:12:17.497 CST [1931223] HINT:  Future log output will appear in directory "log".
Oct 13 00:12:19 pgslave pg_ctl[1931221]: . done
Oct 13 00:12:19 pgslave pg_ctl[1931221]: server started
Oct 13 00:12:19 pgslave systemd[1]: Started PostgreSQL database server.

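Problem analysis

1. Check the database log

[Image: screenshot of the database log]

2. Check the archive configuration parameters

The parameters are configured correctly, and the permissions on the archive directory are correct too.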

postgres=# show archive_command;
                      archive_command                      
-----------------------------------------------------------
 /usr/bin/lz4 -q -z %p /server/data/pgdb/pg_archive/%f.lz4
(1 row)

postgres=# show archive_mode;
 archive_mode 
--------------
 on
(1 row)

-- Check the permissions on the archive directory
[postgres@pgmaster ~]$ ls -ld /server/data/pgdb/pg_archive
drwxr-x--- 2 postgres postgres 4214784 Oct 13 13:14 /server/data/pgdb/pg_archive

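3. Switch WAL manually

The manual switch succeeded but did not fix the problem: the status was still stuck on the WAL file whose archiving had failed.

-- Archive manually (force a WAL switch)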
top_portal=# select pg_switch_wal();
 pg_switch_wal 
---------------
 72/51C4CFD8
(1 row)

-- Check the database status on the primary
[root@pgmaster ~]# systemctl status postgres
● postgres.service - PostgreSQL database server
Loaded: loaded (/usr/lib/systemd/system/postgres.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2023-10-12 22:04:08 CST; 13h ago
Process: 3710968 ExecStart=/server/data/pgdb/pgsql/bin/pg_ctl start -D $PGDATA (code=exited, status=0/SUCCESS)
Main PID: 3710970 (postgres)
Tasks: 53 (limit: 201967)
Memory: 19.0G
CGroup: /system.slice/postgres.service
├─ 3710970 /server/data/pgdb/pgsql/bin/postgres -D /server/data/pgdb/data
├─ 3710971 postgres: logger
├─ 3710992 postgres: checkpointer
├─ 3710993 postgres: background writer
├─ 3710994 postgres: walwriter
├─ 3710995 postgres: archiver failed on 000000010000006F00000086
├─ 3710996 postgres: logical replication launcher
├─ 3711001 postgres: top_portal top_portal 172.28.32.18(41438) idle
├─ 3711003 postgres: tj_sjjh dataexchange 172.28.32.28(35406) idle
├─ 3711009 postgres: repmgr repmgr 172.28.32.22(64096) idle
├─ 3711468 postgres: top_portal top_portal 172.28.32.18(41720) idle
├─ 3713807 postgres: top_portal top_portal 172.28.32.20(44492) idle
├─ 3723017 postgres: walsender repmgr 172.28.32.23(36122) streaming 72/1BAC3A10  # WAL sending OK

-- Check the current WAL LSN
top_portal=# select pg_current_wal_lsn();
 pg_current_wal_lsn 
--------------------
 72/52638F10
(1 row)

-- Find the WAL file that contains the current LSN
top_portal=# select pg_walfile_name(pg_current_wal_lsn());
     pg_walfile_name      
--------------------------
 000000010000007200000052
(1 row)
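
The file name returned by pg_walfile_name can be cross-checked by hand: a WAL segment name is three 8-digit hex fields (the timeline, the high half of the LSN, and the segment number within it). The following is a minimal sketch, assuming the default 16 MB segment size and timeline 1, which matches the 16777216-byte files and the 00000001 prefix seen in this post. lsn_to_walfile is a hypothetical helper and does not handle an LSN that sits exactly on a segment boundary.

```shell
# Hypothetical helper: map an LSN such as 72/52638F10 to a WAL segment name.
# Assumes 16 MB segments; the timeline is passed in as a decimal number.
lsn_to_walfile() (
  timeline=$1
  lsn=$2
  hi=${lsn%%/*}                      # hex digits before the slash
  lo=${lsn##*/}                      # hex digits after the slash
  seg_size=$((16 * 1024 * 1024))
  printf '%08X%08X%08X\n' "$timeline" "$((0x$hi))" "$((0x$lo / seg_size))"
)

# The LSN returned by pg_current_wal_lsn() above, on timeline 1.
lsn_to_walfile 1 72/52638F10
```

The same calculation applied to the standby's receive position 77/9EB6A198 gives the segment its startup process reports as recovering.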

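-- Check the latest checkpoint. WAL files older than its REDO WAL file are no longer needed for crash recovery (archiving or replication may still need them)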
[postgres@pgmaster ~]$ pg_controldata $PGDATA
pg_control version number:            1300
Catalog version number:               202107181
Database system identifier:           7268852449124462799
Database cluster state:               in production
pg_control last modified:             Fri 13 Oct 2023 10:07:35 AM CST  
Latest checkpoint location:           71/CDD2FF28
Latest checkpoint's REDO location:    71/CDD28F18
Latest checkpoint's REDO WAL file:    0000000100000071000000CD
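
To check whether the failing segment is older than the checkpoint's REDO WAL file, the two names can be compared field by field. This is a rough sketch that assumes both names are on the same timeline; wal_older_than is a hypothetical helper.

```shell
# Hypothetical helper: succeed if WAL segment $1 is older than segment $2.
# Assumes both names share a timeline, so the first 8 hex digits are skipped.
wal_older_than() (
  a_log=$(printf '%s' "$1" | cut -c9-16)
  a_seg=$(printf '%s' "$1" | cut -c17-24)
  b_log=$(printf '%s' "$2" | cut -c9-16)
  b_seg=$(printf '%s' "$2" | cut -c17-24)
  if [ "$((0x$a_log))" -ne "$((0x$b_log))" ]; then
    [ "$((0x$a_log))" -lt "$((0x$b_log))" ]
  else
    [ "$((0x$a_seg))" -lt "$((0x$b_seg))" ]
  fi
)

# Segment from the archiver error versus the REDO WAL file from pg_controldata.
if wal_older_than 000000010000006F00000086 0000000100000071000000CD; then
  echo "failed segment is older than the REDO WAL file"
fi
```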

-- Look for the WAL file named in the error
[postgres@pgmaster pg_wal]$ ls -l 000000010000006F00000086
-rw------- 1 postgres postgres 16777216 Oct 12 21:12 000000010000006F00000086
[postgres@pgmaster pg_wal]$ find /server/data/pgdb/pg_archive -name 000000010000006F00000086*
ls: cannot access '000000010000006F00000086': No such file or directory
[postgres@pgmaster pg_wal]$ find /server -name 000000010000006F00000086*
-rw------- 1 postgres postgres 16777216 Oct 12 21:12 000000010000006F00000086

4. Check the files under $PGDATA/pg_wal/archive_status/

[postgres@pgmaster ~]$ cd /server/data/pgdb/data/pg_wal/archive_status/
[postgres@pgmaster archive_status]$ ls -l *.ready
ls: cannot access '*.ready': No such file or directory

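So no file is waiting to be archived.

In this directory, a .ready file marks a WAL segment that needs archiving but has not been archived yet, and a .done file marks one whose archiving has completed.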
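
The same check can be scripted so that it prints segment names instead of an ls error when nothing is pending. This is a small sketch; pending_wal is a hypothetical helper that takes the archive_status directory as its argument. On the SQL side, the pg_stat_archiver view reports related counters such as failed_count and last_failed_wal.

```shell
# Hypothetical helper: print the WAL segments that still have a .ready marker,
# i.e. segments the archiver has not archived yet. Prints nothing if none exist.
pending_wal() (
  status_dir=$1
  for f in "$status_dir"/*.ready; do
    [ -e "$f" ] || continue          # the glob matched nothing
    basename "$f" .ready
  done
)

# Directory used in this post; adjust to the local $PGDATA.
pending_wal /server/data/pgdb/data/pg_wal/archive_status
```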

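Solution

1. Move the WAL file that failed to archive to /home/postgres as a backup (in production, if disk space allows, never delete it with rm; mv it to a backup location instead).
2. Archive manually: select pg_switch_wal();
3. Check the primary and standby status again.

-- 1. Back up the WAL file that failed to archive to /home/postgres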
[postgres@pgmaster pg_wal]$ mv 000000010000006F00000086 /home/postgres/000000010000006F00000086
[postgres@pgmaster pg_wal]$ ls -l /home/postgres/000000010000006F00000086
-rw------- 1 postgres postgres 16777216 Oct 12 21:12 /home/postgres/000000010000006F00000086

-- 2. Archive manually
postgres=# select pg_switch_wal();
 pg_switch_wal 
---------------
 73/7EF502E0
(1 row)

-- 3. Check the primary status again: it now looks normal
[root@pgmaster data]# systemctl status postgres
● postgres.service - PostgreSQL database server
     Loaded: loaded (/usr/lib/systemd/system/postgres.service; enabled; vendor preset: disabled)
     Active: active (running) since Thu 2023-10-12 22:04:08 CST; 13h ago
    Process: 3710968 ExecStart=/server/data/pgdb/pgsql/bin/pg_ctl start -D $PGDATA (code=exited, status=0/SUCCESS)
   Main PID: 3710970 (postgres)
      Tasks: 50 (limit: 201967)
     Memory: 26.6G
     CGroup: /system.slice/postgres.service
             ├─ 3710970 /server/data/pgdb/pgsql/bin/postgres -D /server/data/pgdb/data
             ├─ 3710971 postgres: logger
             ├─ 3710992 postgres: checkpointer
             ├─ 3710993 postgres: background writer
             ├─ 3710994 postgres: walwriter
             ├─ 3710995 postgres: archiver archiving 000000010000007100000035
             ├─ 3710996 postgres: logical replication launcher
             ├─ 3711001 postgres: top_portal top_portal 172.28.32.18(41438) idle
             ├─ 3711003 postgres: tj_sjjh dataexchange 172.28.32.28(35406) idle
             ├─ 3711009 postgres: repmgr repmgr 172.28.32.22(64096) idle
             ├─ 3711468 postgres: top_portal top_portal 172.28.32.18(41720) idle
             ├─ 3713807 postgres: top_portal top_portal 172.28.32.20(44492) idle
             ├─ 3723017 postgres: walsender repmgr 172.28.32.23(36122) streaming 73/7F000BD0

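Supplement: if $PGDATA/pg_wal/archive_status/ contains a large number of *.ready files

Possible cause: if the database lost power suddenly, the archive command may not have finished, leaving an incomplete file in the archive directory. After the database restarts, archiving can then fail. In that case, delete the incompletely archived file from the archive directory so that the segment is archived again.
I have not run into this scenario yet, so this is untested.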
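
One way to narrow down which archive files might be incomplete in that scenario is to look for segments that still have a .ready marker while a file for them already exists in the archive directory. The sketch below is untested against a real outage. suspect_partial_archives is a hypothetical helper, and it assumes archive files are named <segment>.lz4, as the archive_command in this post produces.

```shell
# Hypothetical helper: list archive files whose segment is still marked .ready,
# meaning an earlier archive attempt may have left a partial file behind.
# Assumes archive files are named <segment>.lz4.
suspect_partial_archives() (
  status_dir=$1
  archive_dir=$2
  for f in "$status_dir"/*.ready; do
    [ -e "$f" ] || continue          # the glob matched nothing
    seg=$(basename "$f" .ready)
    if [ -e "$archive_dir/$seg.lz4" ]; then
      printf '%s\n' "$archive_dir/$seg.lz4"
    fi
  done
)

# Directories used in this post; adjust to the local layout.
suspect_partial_archives \
  /server/data/pgdb/data/pg_wal/archive_status \
  /server/data/pgdb/pg_archive
```

Any path it prints deserves a manual look (for example an integrity test with lz4 -t) before the file is moved or removed.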