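I. Background
I received a disk-space alert. On inspection, the cause was too many backups retained locally. The fix was simple: I manually removed the old backups (they had already been copied to a remote server automatically), and the alert cleared.
While reviewing the backup script, however, I noticed that keep-data-days was set to just 1. So why were there 3 backups on local disk (3 days' worth)?
pg_rman backup -b ${BACKUP_TYPE} -s -C -Z -P --keep-data-days=1 --keep-arclog-files=… (command truncated)
I checked the official pg_rman documentation, but its explanation of this option is too terse to answer the question.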
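II. Reading the Source
1. The puzzling third backup
I checked other servers configured with keep-data-days=1 and found that they all held only the last 2 days of backup files. When I was handling the alert, however, a backup was still running. A reasonable guess is that pg_rman removes expired backups only after the new backup completes, so there are 3 days of files while a backup is in progress and 2 days once it finishes.
To verify this guess, one can simply run another backup, or read the pg_rman backup source.
Below is the do_backup function from backup.c. The call to pgBackupDelete comes after all the backup steps have finished, which matches the guess above.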
int
do_backup(pgBackupOption bkupopt)
{
    parray *backup_list;
    parray *files_database;
    parray *files_arclog;
    parray *files_srvlog;
    int     ret;
    char    path[MAXPGPATH];

    /* repack the necesary options */
    int keep_arclog_files = bkupopt.keep_arclog_files;
    int keep_arclog_days = bkupopt.keep_arclog_days;
    int keep_srvlog_files = bkupopt.keep_srvlog_files;
    int keep_srvlog_days = bkupopt.keep_srvlog_days;
    int keep_data_generations = bkupopt.keep_data_generations;
    int keep_data_days = bkupopt.keep_data_days;
    …
    /*
     * Signal for backup_cleanup() that there may actually be some cleanup
     * for it to do from this point on.
     */
    in_backup = true;

    /* backup data */
    files_database = do_backup_database(backup_list, bkupopt);

    /* backup archived WAL */
    files_arclog = do_backup_arclog(backup_list);

    /* backup serverlog */
    files_srvlog = do_backup_srvlog(backup_list);

    pgut_atexit_pop(backup_cleanup, NULL);

    /* update backup status to DONE */
    current.end_time = time(NULL);
    current.status = BACKUP_STATUS_DONE;
    …
    /* Delete old backup files after all backup operation. */
    pgBackupDelete(keep_data_generations, keep_data_days);
    …
    return 0;
}
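The effect of this ordering can be illustrated with a toy simulation. The sketch below is not pg_rman code: Catalog, purge_before, and run_backup are hypothetical names, days are plain integers, and the expiry rule (drop anything taken before today minus keep_days) is a simplifying assumption. It only shows why the number of backups on disk peaks just before the cleanup step.

```c
#include <stddef.h>

#define MAX_BACKUPS 16

/* Hypothetical stand-in for the backup catalog: one entry per backup day. */
typedef struct
{
    int    days[MAX_BACKUPS];
    size_t count;
} Catalog;

/* Drop every backup taken before threshold_day, keeping the rest in order. */
void
purge_before(Catalog *catalog, int threshold_day)
{
    size_t kept = 0;
    size_t i;

    for (i = 0; i < catalog->count; i++)
    {
        if (catalog->days[i] >= threshold_day)
            catalog->days[kept++] = catalog->days[i];
    }
    catalog->count = kept;
}

/*
 * Simulate one backup run: the new backup is written first, and only then
 * are old backups purged (the same order as in do_backup).
 * Returns the number of backups on disk just before the purge,
 * or 0 if the toy catalog is full.
 */
size_t
run_backup(Catalog *catalog, int today, int keep_days)
{
    size_t peak;

    if (catalog->count >= MAX_BACKUPS)
        return 0;

    catalog->days[catalog->count++] = today;
    peak = catalog->count;
    purge_before(catalog, today - keep_days);
    return peak;
}
```

Calling run_backup once per day with keep_days = 1 leaves two entries in the catalog after each run once it has warmed up, while the returned value (the count just before the purge) reaches three, which is the situation seen during the alert.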
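2. What keep-data-days means
That settles the third backup. One question remains: why does keep-data-days=1 retain 2 days of backup files rather than 1? The relevant code is in the pgBackupDelete function in delete.c: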
/*
 * Delete backups that are older than KEEP_xxx_DAYS, or have more generations
 * than KEEP_xxx_GENERATIONS.
 */
void
pgBackupDelete(int keep_generations, int keep_days)
{
    int     i;
    parray *backup_list;
    int     existed_generations;
    bool    check_generations;
    …
    /* determine whether to check based on the given days */
    if (keep_days == KEEP_INFINITE)
    {
        check_days = false;
        strncpy(days_str, "INFINITE", lengthof(days_str));
    }
    else
    {
        check_days = true;
        snprintf(days_str, lengthof(days_str),
                 "%d", keep_days);
        /*
         * Calculate the threshold day from given keep_days.
         * Any backup taken before this threshold day to be
         * a candidate for deletion.
         */
        tim = current.start_time - (keep_days * 60 * 60 * 24);
        ltm = localtime(&tim);
        ltm->tm_hour = 0;
        ltm->tm_min = 0;
        ltm->tm_sec = 0;
        keep_after = mktime(ltm);
        time2iso(keep_after_timestamp, lengthof(keep_after_timestamp),
                 keep_after);
    }
    …
}
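The key is this comment: Calculate the threshold day from given keep_days. Any backup taken before this threshold day to be a candidate for deletion.
The threshold day is computed as tim = current.start_time - (keep_days * 60 * 60 * 24); the result is then truncated to midnight by zeroing tm_hour, tm_min, and tm_sec.
Take 20230809 as an example. With keep-data-days=1, the threshold day is the backup start time minus 1 day, i.e. 20230808 (at 00:00:00). Only backups taken before the threshold are expired, so the 20230808 backup does not qualify and is not deleted. The 20230807 backup is expired, so it is deleted once the backup completes.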
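The threshold arithmetic can be reproduced in isolation. The following is a minimal sketch, not the pg_rman implementation: keep_after_threshold, local_time_of, and is_expired are hypothetical helpers, and the rule that a backup is expired when its start time is strictly earlier than the threshold is an assumption based on the wording of the quoted comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define SECONDS_PER_DAY (60 * 60 * 24)

/*
 * Hypothetical helper mirroring the quoted arithmetic: subtract keep_days
 * from the backup start time, then truncate to local midnight.
 * Returns (time_t) -1 if the time cannot be converted.
 */
time_t
keep_after_threshold(time_t start_time, int keep_days)
{
    time_t     tim;
    struct tm *converted;
    struct tm  ltm;

    assert(keep_days >= 0);

    tim = start_time - (time_t) keep_days * SECONDS_PER_DAY;
    converted = localtime(&tim);
    if (converted == NULL)
        return (time_t) -1;

    ltm = *converted;       /* copy out of localtime's static buffer */
    ltm.tm_hour = 0;
    ltm.tm_min = 0;
    ltm.tm_sec = 0;
    return mktime(&ltm);
}

/* Hypothetical helper: build a local-time timestamp for the examples. */
time_t
local_time_of(int year, int month, int day, int hour)
{
    struct tm t = {0};

    t.tm_year = year - 1900;
    t.tm_mon = month - 1;
    t.tm_mday = day;
    t.tm_hour = hour;
    t.tm_isdst = -1;        /* let mktime decide whether DST applies */
    return mktime(&t);
}

/* Assumed rule: a backup is expired if it started before the threshold. */
bool
is_expired(time_t backup_start, time_t keep_after)
{
    return backup_start < keep_after;
}
```

For a backup started at local_time_of(2023, 8, 9, 2) with keep_days = 1, keep_after_threshold returns midnight of 20230808 in local time. Under the assumed rule, a backup started on 20230808 is not expired, while one started on 20230807 is.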
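3. Keeping only the current day's backup
With the analysis above, the answer is simple: set keep-data-days=0. The threshold day is then the backup start time minus 0 days, i.e. 20230809, so every backup taken before today is expired and the 20230808 files are deleted once the backup completes. A quick test: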
pg_rman backup -b ${BACKUP_TYPE} -s -C -Z -P --keep-data-days=0 --keep-arclog-files=… (command truncated)
The result matches expectations.