前言:
在通过备份进行数据恢复时,经常会遇到由于备份的不完整,特别是归档日志的缺失,导致虽然数据文件已经顺利恢复,但是数据文件的恢复时间点没有完全一致或者数据文件的fuzzy的状态不一致,从而数据库无法正常的打开,面对这种情况除了继续追增量日志将数据文件恢复到时间点一致之外,我们还可以通过bbed工具修改数据文件强制打开数据库,这种非常规方式在故障恢复时可以派上用场
测试场景:
在Oracle11.2.0.4通过备份在异地恢复数据文件,恢复出来的数据文件checkpoint_change#并不完全一致,fuzzy的状态也不一致,此时,我们无法将数据库正常的打开,即使是resetlogs也不行
通过open resetlogs打开数据库,至少要满足以下条件
1 每个数据文件的resetlog_change#在相同,即同一个incarnation
2 每个数据文件的checkpoint_change#相同
3 每个数据文件的fuzzy状态相同,并fhsta状态为0
而当前的恢复场景,除了条件1满足之外,其他并不满足,需要通过bbed修改数据文件
bbed测试:
首先配置好bbed的配置文件以确保识别到数据文件
$ cat /tmp/bbed.par
blocksize=8192
listfile=/tmp/datafile.list
mode=edit
$ cat /tmp/datafile.list
1 /u01/app/oracle/oradata/dbocs/datafile/system 2306867200
2 /u01/app/oracle/oradata/dbocs/datafile/sysaux 660602880
3 /u01/app/oracle/oradata/dbocs/datafile/undotbs1 508559360
4 /u01/app/oracle/oradata/dbocs/datafile/users 543948800
5 /u01/app/oracle/oradata/dbocs/datafile/undotbs2 104857600
6 /u01/app/oracle/oradata/dbocs/datafile/test1 209715200
开始修改数据文件头里面的checkpoint_change#,这个在数据文件block 1所在的结构kcvfh.kcvfhckp.kcvcpscn.kscnbas(offset 484)里面,当前值为16进制0x0049cfb9换算成10进制就是4837305
通过bbed修改该值,通过使用assign或者modify命令进行修改,assign的优势在于可以使用10进制进行修改,不用转化成16进制,并且不需要考虑操作系统的字节序,因为对于Linux平台字节序是litter,修改值是需要从小端开始
--将scn改成比原来大,要确保数据文件头的scn大于块里面的scn
--原来是4837305,改成10000000
BBED> assign dba 1,1 kcvfh.kcvfhckp.kcvcpscn.kscnbas = 10000000
--修改之后,要重新计算修改后的数据块的 checksum 值
BBED> sum apply
Check value for File 1, Block 1:
current = 0x1a4c, required = 0x1a4c
--验证数据块是否正常
BBED> verify
DBVERIFY - Verification starting
FILE = /u01/app/oracle/oradata/dbocs/datafile/system
BLOCK = 1
DBVERIFY - Verification complete
Total Blocks Examined : 1
Total Blocks Processed (Data) : 0
Total Blocks Failing (Data) : 0
Total Blocks Processed (Index): 0
Total Blocks Failing (Index): 0
Total Blocks Empty : 0
Total Blocks Marked Corrupt : 0
Total Blocks Influx : 0
Message 531 not found; product=RDBMS; facility=BBED
--其他的数据文件2,3,4,5,6重复以上操作
通过bbed修改之后,checkpoint_change#统一为10000000
select file#,name,CHECKPOINT_CHANGE#,RESETLOGS_CHANGE#,FUZZY,CHECKPOINT_COUNT
from v$datafile_header
FILE# NAME CHECKPOINT_CHANGE# RESETLOGS_CHANGE# FUZ CHECKPOINT_COUNT
---------- ------------------------------------------------------------ ------------------ ----------------- --- ----------------
1 /u01/app/oracle/oradata/dbocs/datafile/system 10000000 925702 NO 286
2 /u01/app/oracle/oradata/dbocs/datafile/sysaux 10000000 925702 NO 286
3 /u01/app/oracle/oradata/dbocs/datafile/undotbs1 10000000 925702 YES 207
4 /u01/app/oracle/oradata/dbocs/datafile/users 10000000 925702 NO 285
5 /u01/app/oracle/oradata/dbocs/datafile/undotbs2 10000000 925702 NO 7
6 /u01/app/oracle/oradata/dbocs/datafile/test1 10000000 925702 YES 7
接下来修改数据文件的fuzzy 状态,fuzzy的状态存储在数据文件block 1的kcvfhsta(offset 138)
--使用另一种方式modify修改,将kcvfhsta的值修改为0x0000
BBED> m /x 0000 file 1 block 1 offset 138
--修改之后,要重新计算修改后的数据块的checksum 值
BBED> sum apply
--验证数据块是否正常
BBED> verify
--其他的数据文件2,3,4,5,6重复以上操作
修改完之后,确认fuzzy状态为0
select HXFNM,fhsta from x$kcvfh
HXFNM FHSTA
------------------------------------------------------------ ----------
/u01/app/oracle/oradata/dbocs/datafile/system 0
/u01/app/oracle/oradata/dbocs/datafile/sysaux 0
/u01/app/oracle/oradata/dbocs/datafile/undotbs1 0
/u01/app/oracle/oradata/dbocs/datafile/users 0
/u01/app/oracle/oradata/dbocs/datafile/undotbs2 0
/u01/app/oracle/oradata/dbocs/datafile/test1 0
select file#,name,CHECKPOINT_CHANGE#,RESETLOGS_CHANGE#,FUZZY,CHECKPOINT_COUNT
from v$datafile_header
FILE# NAME CHECKPOINT_CHANGE# RESETLOGS_CHANGE# FUZ CHECKPOINT_COUNT
---------- ------------------------------------------------------------ ------------------ ----------------- --- ----------------
1 /u01/app/oracle/oradata/dbocs/datafile/system 10000000 925702 NO 286
2 /u01/app/oracle/oradata/dbocs/datafile/sysaux 10000000 925702 NO 286
3 /u01/app/oracle/oradata/dbocs/datafile/undotbs1 10000000 925702 NO 207
4 /u01/app/oracle/oradata/dbocs/datafile/users 10000000 925702 NO 285
5 /u01/app/oracle/oradata/dbocs/datafile/undotbs2 10000000 925702 NO 7
6 /u01/app/oracle/oradata/dbocs/datafile/test1 10000000 925702 NO 7
checkpoint_change#以及fuzzy状态修改完之后,已经满足了打开数据库的条件,接下来需要重建一下控制文件
--重建控制文件
STARTUP NOMOUNT
CREATE CONTROLFILE REUSE DATABASE "DBOCS" RESETLOGS ARCHIVELOG
MAXLOGFILES 192
MAXLOGMEMBERS 3
MAXDATAFILES 1024
MAXINSTANCES 32
MAXLOGHISTORY 633
LOGFILE
GROUP 11 '/u01/app/oracle/oradata/dbocs/onlinelog/group_11.256.1036344439' SIZE 500M BLOCKSIZE 512,
GROUP 14 '/u01/app/oracle/oradata/dbocs/onlinelog/group_14.291.1036352287' SIZE 500M BLOCKSIZE 512
-- STANDBY LOGFILE
DATAFILE
'/u01/app/oracle/oradata/dbocs/datafile/system',
'/u01/app/oracle/oradata/dbocs/datafile/sysaux',
'/u01/app/oracle/oradata/dbocs/datafile/undotbs1',
'/u01/app/oracle/oradata/dbocs/datafile/users',
'/u01/app/oracle/oradata/dbocs/datafile/undotbs2',
'/u01/app/oracle/oradata/dbocs/datafile/test1'
CHARACTER SET ZHS16GBK
;
使用open resetlogs打开数据库
--recover database
RECOVER DATABASE USING BACKUP CONTROLFILE;
--open database
ALTER DATABASE OPEN RESETLOGS;
--添加临时文件
ALTER TABLESPACE TEMP ADD TEMPFILE '/u01/app/oracle/oradata/dbocs/tempfile/temp.257.1036073905' size 100M;
打开数据库之后,大概率会出现ORA-00600 [4194]的报错,因为在添加一条新的undo记录的时候,会去比较undo块里面的最大记录数跟redo日志里面的undo最大记录数,如果出现不一致,就会触发ORA-00600[4194]的错误
Block recovery completed at rba 1.73.16, scn 0.10000067
Errors in file /u01/app/oracle/diag/rdbms/dbocs/dbocs/trace/dbocs_mmon_40326.trc (incident=14524):
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/dbocs/dbocs/incident/incdir_14524/dbocs_mmon_40326_i14524.trc
Arg [a] - Maximum Undo record number in Undo block
Arg [b] - Undo record number from Redo block
Since we are adding a new undo record to our undo block, we would expect that the new record number is equal to the maximum record number in the undo block plus one.
Before Oracle can add a new undo record to the undo block it validates that this is correct.
If this validation fails, then an ORA-600 [4194] will be triggered.
数据库打开之后才出现的ORA-00600 [4194]的报错,一般都是undo表空间里面的回滚段问题,不是system的回滚段问题,可以通过重建undo表空间解决
--关闭自动undo段管理
alter system set undo_management = manual scope=spfile;
--10513禁用事务的回滚
alter system set event = '10513 trace name context forever, level 2' scope=spfile;
--重启数据库,以严格模式启动
shutdown immediate
startup restrict
--检查非offline的回滚段,system的回滚段一直都是online的可以不用管,需要确保其他回滚段都是offline,不能出现'PARTLY AVAILABLE' or 'NEEDS RECOVERY'
SQL> select tablespace_name, status, segment_name from dba_rollback_segs where status != 'OFFLINE';
TABLESPACE_NAME STATUS SEGMENT_NAME
------------------------------ ---------------- ------------------------------
SYSTEM ONLINE SYSTEM
--重建undo表空间
SQL> create undo tablespace undo1 datafile size 100M;
SQL> drop tablespace undotbs1 including contents and datafiles;
--设置默认表空间为新的undo表空间以及开启自动undo段管理,删除event参数
alter system set undo_tablespace=undo1 scope=spfile;
alter system set undo_management =auto scope=spfile;
alter system reset event scope=spfile;
--重启
shutdown immediate;
startup