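Background
Some time ago I found a corrupt block in an Oracle 11gR2 database running on 64-bit Linux. This article walks through an online repair with DBMS_REPAIR. Repairing from a backup with RMAN is also possible, but it is not covered here.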
Finding the corrupt block
1. Errors in alert.log
The following errors appeared in alert.log:
DDE: Problem Key 'ORA 1110' was flood controlled (0x1) (no incident)
ORA-01110: data file 8: '/u01/app/oracle/TESTDB/oradata/data/TESTDB_test_data_03.dbf'
Byte offset to file# 8 block# 570051 is 374890496
Incident 1567129 created, dump file:
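As a side note, the byte offset in the alert log can be cross-checked against the block number. The sketch below assumes an 8 KB block size (db_block_size = 8192), which the log does not state. Under that assumption the full offset is larger than the logged value, and the logged value equals the full offset modulo 2^32, which suggests (but does not prove) that the logged field wrapped at 32 bits:

```sql
-- Assumption: the tablespace uses 8 KB blocks (not stated in the alert log).
SELECT 570051 * 8192                    AS full_offset,     -- 4669857792
       MOD(570051 * 8192, POWER(2, 32)) AS wrapped_offset   -- 374890496
FROM   dual;
```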
2. Details from the trace file
The trace file gives a more detailed error description:
/u01/app/oracle/testdb/admin/TESTDB/diag/rdbms/TESTDB/TESTDB/incident/incdir_1567129/TESTDB_ora_5396_i1567129.trc
ORA-01578: ORACLE data block corrupted (file # 8, block # 570051)
ORA-01110: data file 8: '/u01/app/oracle/TESTDB/oradata/data/TESTDB_test_data_03.dbf'
Dump continued from file: /u01/app/oracle/testdb/admin/TESTDB/diag/rdbms/TESTDB/TESTDB/trace/TESTDB_ora_5396.trc
ORA-01578: ORACLE data block corrupted (file # 8, block # 570051)
ORA-01110: data file 8: '/u01/app/oracle/TESTDB/oradata/data/TESTDB_test_data_03.dbf'
Dump for incident 1567129 (ORA 1578)
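3. Querying the segment type of the corrupt block
Next I tried to find out which segment the corrupt block belongs to, to see whether an index or a table segment was affected. The query returned no rows. Note that it tests block_id = 570051, and BLOCK_ID in dba_extents is the first block of each extent, so it only matches if the corrupt block happens to be the first block of an extent: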
SQL> select segment_name, tablespace_name, segment_type, block_id, file_id, bytes
from dba_extents
where block_id = 570051 and file_id = 8;
no rows selected
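Since an extent covers the blocks from BLOCK_ID to BLOCK_ID + BLOCKS - 1, a range test is a more reliable way to look up the owning segment. This is a minimal sketch using the file and block numbers from the error message. If it also returns no rows, the block may not belong to any allocated extent:

```sql
-- Find the extent whose block range contains the corrupt block.
SELECT owner, segment_name, segment_type, tablespace_name, block_id, blocks
FROM   dba_extents
WHERE  file_id = 8
AND    570051 BETWEEN block_id AND block_id + blocks - 1;
```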
4. Running the SQL statement from the log
Running the SQL statement recorded in the log reproduces the error:
SELECT xxxxx FROM APP_CONTROL AC, APP_BILL_PROC BL
WHERE AC.DATA_GROUP IS NOT NULL
AND BL.PROCESS_ID = AC.NXT_PGM_NAME
AND AC.FILE_STATUS IN ('RD', 'IU', 'CN')
GROUP BY xxxxxxx
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 8, block # 570051)
ORA-01110: data file 8: '/u01/app/oracle/TESTDB/oradata/data/TESTDB_test_data_03.dbf'
5. Checking the tables involved
Counting the rows in both tables works without error:
SQL> select count(*) from APP_CONTROL;
COUNT(*)
----------
1613
SQL> select count(*) from APP_BILL_PROC;
COUNT(*)
----------
103
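One possible reason the counts succeed is that the optimizer answers count(*) from an index instead of reading the table blocks. The following is a minimal sketch to test that, assuming the standard FULL hint leads to a full table scan here. If the corrupt block belongs to APP_CONTROL, this version would be expected to raise ORA-01578:

```sql
-- The FULL hint asks the optimizer to scan the table itself, not an index.
SELECT /*+ FULL(ac) */ COUNT(*)
FROM   APP_CONTROL ac;
```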
6. Re-running the SQL statement
Running the SQL statement from the log again still fails:
SELECT xxxxx FROM APP_CONTROL AC, APP_BILL_PROC BL
WHERE AC.DATA_GROUP IS NOT NULL
AND BL.PROCESS_ID = AC.NXT_PGM_NAME
AND AC.FILE_STATUS IN ('RD', 'IU', 'CN')
GROUP BY xxxxxxx
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 8, block # 570051)
ORA-01110: data file 8: '/u01/app/oracle/TESTDB/oradata/data/TESTDB_test_data_03.dbf'
Online repair with DBMS_REPAIR
1. Creating the repair table
DBMS_REPAIR.ADMIN_TABLES creates the repair table REPAIR_TABLE:
SQL> BEGIN
  DBMS_REPAIR.ADMIN_TABLES (
    TABLE_NAME => 'REPAIR_TABLE',
    TABLE_TYPE => dbms_repair.repair_table,
    ACTION     => dbms_repair.create_action,
    TABLESPACE => 'test_DATA');
END;
/
PL/SQL procedure successfully completed.
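When the repair is finished, the same procedure can remove the repair table again. This is a minimal sketch, assuming the table was created as above and its contents are no longer needed. DROP_ACTION is the counterpart of the CREATE_ACTION constant:

```sql
-- Drop the repair table once it is no longer needed.
BEGIN
  DBMS_REPAIR.ADMIN_TABLES (
    TABLE_NAME => 'REPAIR_TABLE',
    TABLE_TYPE => dbms_repair.repair_table,
    ACTION     => dbms_repair.drop_action);
END;
/
```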
2. Describing the repair table
The generated REPAIR_TABLE has the following structure:
SQL> desc repair_table
Name Null? Type
----------------------------------------- -------- ----------------------------
OBJECT_ID NOT NULL NUMBER
TABLESPACE_ID NOT NULL NUMBER
RELATIVE_FILE_ID NOT NULL NUMBER
BLOCK_ID NOT NULL NUMBER
CORRUPT_TYPE NOT NULL NUMBER
SCHEMA_NAME NOT NULL VARCHAR2(30)
OBJECT_NAME NOT NULL VARCHAR2(30)
BASEOBJECT_NAME VARCHAR2(30)
PARTITION_NAME VARCHAR2(30)
CORRUPT_DESCRIPTION VARCHAR2(2000)
REPAIR_DESCRIPTION VARCHAR2(200)
MARKED_CORRUPT NOT NULL VARCHAR2(10)
CHECK_TIMESTAMP NOT NULL DATE
FIX_TIMESTAMP DATE
REFORMAT_TIMESTAMP DATE
3. Checking for corrupt blocks
DBMS_REPAIR.CHECK_OBJECT checks the table and records any corrupt blocks it finds in the repair table:
SQL> set serveroutput on
SQL> DECLARE
  num_corrupt INT;
BEGIN
  num_corrupt := 0;
  DBMS_REPAIR.CHECK_OBJECT (
    SCHEMA_NAME       => 'TSTAPP',
    OBJECT_NAME       => 'APP_CONTROL',
    REPAIR_TABLE_NAME => 'REPAIR_TABLE',
    corrupt_count     => num_corrupt);
  DBMS_OUTPUT.PUT_LINE('number corrupt: ' || TO_CHAR (num_corrupt));
END;
/
number corrupt: 1
PL/SQL procedure successfully completed.
4. Querying the corrupt block records
Query the repair table to confirm the location of the corrupt block:
SQL> select BLOCK_ID, CORRUPT_TYPE, CORRUPT_DESCRIPTION
from REPAIR_TABLE;
BLOCK_ID CORRUPT_TYPE
---------- ------------
CORRUPT_DESCRIPTION
--------------------------------------------------------------------------------
570051 6148
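The repair table holds more than the block number. The query below is a sketch that also lists the object, the file, and whether the block is already marked corrupt. All columns come from the desc output of REPAIR_TABLE shown earlier, and the COLUMN settings are only SQL*Plus formatting:

```sql
-- SQL*Plus display formatting only; does not affect the query result.
COLUMN object_name        FORMAT a20
COLUMN repair_description FORMAT a30

SELECT object_name, relative_file_id, block_id,
       marked_corrupt, repair_description
FROM   repair_table
ORDER  BY relative_file_id, block_id;
```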
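5. Fixing the corrupt block
DBMS_REPAIR.FIX_CORRUPT_BLOCKS marks the blocks recorded in the repair table as software corrupt; it does not restore their contents. In this run it reports a fix count of 0, most likely because the block had already been marked corrupt by the time it was detected: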
SQL> DECLARE
  num_fix INT;
BEGIN
  num_fix := 0;
  DBMS_REPAIR.FIX_CORRUPT_BLOCKS (
    SCHEMA_NAME       => 'TSTAPP',
    OBJECT_NAME       => 'APP_CONTROL',
    OBJECT_TYPE       => dbms_repair.table_object,
    REPAIR_TABLE_NAME => 'REPAIR_TABLE',
    FIX_COUNT         => num_fix);
  DBMS_OUTPUT.PUT_LINE('num fix: ' || to_char(num_fix));
END;
/
num fix: 0
PL/SQL procedure successfully completed.
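6. Skipping the corrupt block
DBMS_REPAIR.SKIP_CORRUPT_BLOCKS makes scans of the table skip blocks that are marked corrupt instead of raising ORA-01578: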
SQL> BEGIN
  DBMS_REPAIR.SKIP_CORRUPT_BLOCKS (
    SCHEMA_NAME => 'TSTAPP',
    OBJECT_NAME => 'APP_CONTROL',
    OBJECT_TYPE => dbms_repair.table_object,
    FLAGS       => dbms_repair.SKIP_FLAG);
END;
/
PL/SQL procedure successfully completed.
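To confirm that the setting took effect, the SKIP_CORRUPT column of DBA_TABLES can be queried; it should show ENABLED for the table. The second statement is a sketch of how to switch the behavior off again with NOSKIP_FLAG, for example after the table has been rebuilt:

```sql
-- Check whether corrupt-block skipping is enabled for the table.
SELECT owner, table_name, skip_corrupt
FROM   dba_tables
WHERE  owner = 'TSTAPP'
AND    table_name = 'APP_CONTROL';

-- Turn skipping off again once it is no longer wanted.
BEGIN
  DBMS_REPAIR.SKIP_CORRUPT_BLOCKS (
    SCHEMA_NAME => 'TSTAPP',
    OBJECT_NAME => 'APP_CONTROL',
    OBJECT_TYPE => dbms_repair.table_object,
    FLAGS       => dbms_repair.noskip_flag);
END;
/
```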
7. Re-running the SQL statement
Run the earlier SQL statement once more to verify the result:
SQL> SELECT xxxxx FROM APP_CONTROL AC, APP_BILL_PROC BL
WHERE AC.DATA_GROUP IS NOT NULL
AND BL.PROCESS_ID = AC.NXT_PGM_NAME
AND AC.FILE_STATUS IN ('RD', 'IU', 'CN')
GROUP BY xxxxxxx;
-- Runs successfully; the error no longer appears
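Once the table skips the corrupt block, indexes on it may still contain entries that point to rows in that block. DBMS_REPAIR.DUMP_ORPHAN_KEYS can report such entries. The following is a sketch only: the index name APP_CONTROL_PK is hypothetical, and the orphan key table is assumed to live in the same tablespace as the repair table:

```sql
SET SERVEROUTPUT ON

-- Create the orphan key table (its name must start with ORPHAN_).
BEGIN
  DBMS_REPAIR.ADMIN_TABLES (
    TABLE_NAME => 'ORPHAN_KEY_TABLE',
    TABLE_TYPE => dbms_repair.orphan_table,
    ACTION     => dbms_repair.create_action,
    TABLESPACE => 'test_DATA');
END;
/

-- Record index entries that point into the corrupt block.
DECLARE
  num_orphans INT := 0;
BEGIN
  DBMS_REPAIR.DUMP_ORPHAN_KEYS (
    SCHEMA_NAME       => 'TSTAPP',
    OBJECT_NAME       => 'APP_CONTROL_PK',   -- hypothetical index name
    OBJECT_TYPE       => dbms_repair.index_object,
    REPAIR_TABLE_NAME => 'REPAIR_TABLE',
    ORPHAN_TABLE_NAME => 'ORPHAN_KEY_TABLE',
    KEY_COUNT         => num_orphans);
  DBMS_OUTPUT.PUT_LINE('orphan key count: ' || TO_CHAR(num_orphans));
END;
/
```

If the count is greater than zero, rebuilding the affected index is the usual next step.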
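Conclusion
Using DBMS_REPAIR, we got the failing query running again online on an Oracle 11gR2 database. Keep in mind what this does: the corrupt block is marked and skipped, not restored, so any rows stored in it are no longer returned. The approach fits cases where the business has to be brought back quickly. A more thorough repair is possible from a backup with RMAN; the exact procedure depends on whether the database runs in ARCHIVELOG or NOARCHIVELOG mode, and I will summarize it when I get the chance.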