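One morning the customer called about a tablespace-out-of-space alert. This environment is inspected regularly and has an alerting system, so we were skeptical from the start. Sure enough, when a colleague tried to extend the tablespace, the operation failed with ORA-15041/ORA-17505, indicating that the ASM disk group was full:
Database alert log: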
Sat Mar 22 09:00:51 2025
LNS: Standby redo logfile selected for thread 1 sequence 96259 for destination LOG_ARCHIVE_DEST_4
Sat Mar 22 09:00:51 2025
Archived Log entry 310787 added for thread 1 sequence 96258 ID 0x730f3701 dest 1:
Sat Mar 22 09:00:53 2025
ORA-1654: unable to extend index MEDIHIS.IDX_YB_CX_RYLJXX_XH by 128 in tablespace TS_MEDIHIS
ORA-1654: unable to extend index MEDIHIS.IDX_YB_CX_RYLJXX_XH by 8192 in tablespace TS_MEDIHIS
ORA-1654: unable to extend index MEDIHIS.IDX_YB_CX_RYLJXX_XH by 128 in tablespace TS_MEDIHIS
Errors in the ASM alert log:
Sat Mar 22 09:22:12 2025
ERROR: ORA-15041 thrown in ARB0 for group number 2
Errors in file /oracle/base/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_359011.trc:
ORA-15041: diskgroup "DATA2" space exhausted
Sat Mar 22 09:22:12 2025
NOTE: stopping process ARB0
NOTE: rebalance interrupted for group 2/0x76a85195 (DATA2)
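When the alert logs are long, a small script can pull out the ORA- codes and the disk groups that ASM reports as full, which makes it easier to line up the database side with the ASM side. This is a minimal Python sketch that only understands the message formats quoted above. The pattern and function names are hypothetical, and in practice the text would be read from the alert log files rather than pasted in.

```python
import re
from collections import Counter

# Excerpts copied from the database and ASM alert logs above.
DB_ALERT = """\
ORA-1654: unable to extend index MEDIHIS.IDX_YB_CX_RYLJXX_XH by 128 in tablespace TS_MEDIHIS
ORA-1654: unable to extend index MEDIHIS.IDX_YB_CX_RYLJXX_XH by 8192 in tablespace TS_MEDIHIS
ORA-1654: unable to extend index MEDIHIS.IDX_YB_CX_RYLJXX_XH by 128 in tablespace TS_MEDIHIS
"""

ASM_ALERT = """\
ERROR: ORA-15041 thrown in ARB0 for group number 2
ORA-15041: diskgroup "DATA2" space exhausted
NOTE: rebalance interrupted for group 2/0x76a85195 (DATA2)
"""

ORA_CODE = re.compile(r"ORA-(\d+)")
SPACE_EXHAUSTED = re.compile(r'ORA-15041: diskgroup "([^"]+)" space exhausted')
REBAL_INTERRUPTED = re.compile(r"rebalance interrupted for group \d+/\S+ \(([^)]+)\)")


def summarize(text):
    """Count ORA- codes; collect disk groups reported full or with an interrupted rebalance."""
    codes = Counter("ORA-" + num for num in ORA_CODE.findall(text))
    full = set(SPACE_EXHAUSTED.findall(text))
    interrupted = set(REBAL_INTERRUPTED.findall(text))
    return codes, full, interrupted


if __name__ == "__main__":
    for label, text in (("database", DB_ALERT), ("ASM", ASM_ALERT)):
        codes, full, interrupted = summarize(text)
        print(label, dict(codes), "full:", sorted(full), "rebalance interrupted:", sorted(interrupted))
```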
Checking the ASM disks revealed an anomaly:
NAME GROUP_NUMBER HEADER_STATU PATH FAILGROUP
-------------------- ------------ ------------ ---------------------------------------- ------------------------------
0 FORMER /dev/mapper/testk1_crs02 == ignore
0 FORMER /dev/mapper/testk2_crs01 == ignore
0 MEMBER /dev/mapper/testk2_crs02
0 MEMBER /dev/mapper/testk2_data02
0 MEMBER /dev/mapper/testk2_data04
0 MEMBER /dev/mapper/testk2_data07
0 MEMBER /dev/mapper/testk2_data09
0 MEMBER /dev/mapper/testk2_data11
0 MEMBER /dev/mapper/testk2_data13
ARCH_0002 1 MEMBER /dev/mapper/testk1_data05 ARCH_0002
ARCH_0001 1 MEMBER /dev/mapper/testk2_data05 K2
DATA2_0004 2 MEMBER /dev/mapper/testk1_data06 L1
DATA2_0005 2 MEMBER /dev/mapper/testk1_data07 L1
DATA2_0006 2 MEMBER /dev/mapper/testk1_data08 L1
DATA2_0007 2 MEMBER /dev/mapper/testk1_data09 L1
DATA2_0013 2 MEMBER /dev/mapper/testk1_data10 L1
DATA2_0014 2 MEMBER /dev/mapper/testk1_data11 L1
DATA2_0015 2 MEMBER /dev/mapper/testk1_data12 L1
DATA2_0016 2 MEMBER /dev/mapper/testk1_data13 L1
DATA2_0017 2 MEMBER /dev/mapper/testk1_data14 L1
DATA2_0008 2 MEMBER /dev/mapper/testk2_data06 L2
DATA2_0010 2 MEMBER /dev/mapper/testk2_data08 L2
DATA2_0012 2 MEMBER /dev/mapper/testk2_data10 L2
DATA2_0001 2 MEMBER /dev/mapper/testk2_data12 L2
DATA2_0003 2 MEMBER /dev/mapper/testk2_data14 L2
_DROPPED_0002_DATA2 2 UNKNOWN L2
_DROPPED_0000_DATA2 2 UNKNOWN L2
_DROPPED_0011_DATA2 2 UNKNOWN L2
_DROPPED_0009_DATA2 2 UNKNOWN L2
DATA_0003 3 MEMBER /dev/mapper/testk1_data01 K1
DATA_0000 3 MEMBER /dev/mapper/testk1_data02 K1
DATA_0001 3 MEMBER /dev/mapper/testk1_data03 K1
DATA_0002 3 MEMBER /dev/mapper/testk1_data04 K1
DATA_0004 3 MEMBER /dev/mapper/testk2_data01 K2
DATA_0006 3 MEMBER /dev/mapper/testk2_data03 K2
_DROPPED_0007_DATA 3 UNKNOWN K2
_DROPPED_0005_DATA 3 UNKNOWN K2
OCR_0000 4 MEMBER /dev/mapper/testk1_crs01 OCR_0000
OCR_0002 4 MEMBER /ocrvote3/ocr/ocrvote3 OCRSERVER3
OCR_0003 4 UNKNOWN OCR_0003
40 rows selected.
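The listing can also be triaged mechanically. Below is a minimal Python sketch, assuming the whitespace-separated output format shown above (NAME is blank for disks that are not in a mounted disk group, and PATH is blank for missing disks). The category labels are illustrative only; they are not Oracle status values.

```python
from collections import defaultdict

# A few rows copied from the listing above (NAME GROUP_NUMBER HEADER_STATUS PATH FAILGROUP).
SAMPLE = """\
0 FORMER /dev/mapper/testk1_crs02
0 MEMBER /dev/mapper/testk2_data02
DATA2_0004 2 MEMBER /dev/mapper/testk1_data06 L1
_DROPPED_0002_DATA2 2 UNKNOWN L2
OCR_0003 4 UNKNOWN OCR_0003
"""


def parse_row(line):
    """Split one row into (name, group, header, path); NAME and PATH may be blank."""
    tokens = line.split()
    name = "" if tokens[0].isdigit() else tokens.pop(0)
    group, header, rest = int(tokens[0]), tokens[1], tokens[2:]
    path = rest[0] if rest and rest[0].startswith("/") else ""
    return name, group, header, path


def classify(name, group, header, path):
    """Illustrative triage labels (not Oracle terminology)."""
    if name.startswith("_DROPPED_"):
        return "force-dropped"  # past disk_repair_time: re-add with ADD DISK ... FORCE
    if group == 0 and header == "MEMBER":
        return "stale-member"   # not mounted, but its header still claims a disk group
    if group == 0 and header == "FORMER":
        return "former"         # dropped cleanly earlier; ignore here
    if not path:
        return "offline"        # belongs to a disk group but has no path
    return "ok"


def triage(text):
    """Group the disks of a listing by triage label (name if present, else path)."""
    report = defaultdict(list)
    for line in text.splitlines():
        if line.strip():
            name, group, header, path = parse_row(line)
            report[classify(name, group, header, path)].append(name or path)
    return dict(report)


if __name__ == "__main__":
    for label, disks in triage(SAMPLE).items():
        print(f"{label:14} {disks}")
```

The stale-member paths are the natural candidates to add back, but it is worth confirming with the storage team which disk group each one belonged to before using FORCE on it.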
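Comparing the logs with the ASM disk status shows that several disks had gone OFFLINE. Together with the ASM alert log and the operating system messages log, this confirmed that a storage fault at that time caused the ASM disks to be taken OFFLINE and then dropped. It most likely also explains the space errors: once the disks were dropped, ASM rebalanced DATA2 onto the remaining disks to restore redundancy, consuming free space until the rebalance process (ARB0) itself failed with ORA-15041.
After the storage side was checked and repaired, how do we add these disks back?
The commands and steps are as follows:
1. Within the disk_repair_time window, the disk status is OFFLINE NORMAL: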
OCR_0000 4 MEMBER ONLINE NORMAL /dev/mapper/sc7k1_crs01 5
OCR_0002 4 MEMBER ONLINE NORMAL /ocrvote3/ocr/ocrvote3 1
OCR_0003 4 UNKNOWN OFFLINE NORMAL 5
A single command brings the disk back online:
SQL> ALTER DISKGROUP OCR ONLINE DISKS IN FAILGROUP OCR_0003;
Diskgroup altered.
SQL> select NAME,GROUP_NUMBER,MOUNT_STATUS,HEADER_STATUS,MODE_STATUS,STATE,PATH,TOTAL_MB from V$ASM_DISK where name like 'OCR%' order by 2,1;
NAME GROUP_NUMBER MOUNT_S HEADER_STATU MODE_ST STATE PATH TOTAL_MB
-------------------- ------------ ------- ------------ ------- -------- ------------------------------ ----------
OCR_0000 4 CACHED MEMBER ONLINE NORMAL /dev/mapper/sc7k1_crs01 5120
OCR_0002 4 CACHED MEMBER ONLINE NORMAL /ocrvote3/ocr/ocrvote3 1024
OCR_0003 4 CACHED MEMBER ONLINE NORMAL /dev/mapper/sc7k2_crs02 5120
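Which of the two procedures applies depends on whether disk_repair_time has run out. The authoritative remaining time is the REPAIR_TIMER column of V$ASM_DISK, and the attribute value is visible in V$ASM_ATTRIBUTE. The Python sketch below only approximates the decision from wall-clock time, assuming the moment the disk went offline is known and that the attribute uses the 'm' (minutes) or 'h' (hours) unit syntax, with hours when no unit is given. The helper names and example timestamps are hypothetical.

```python
import re
from datetime import datetime, timedelta


def parse_repair_time(value: str) -> timedelta:
    """Parse a disk_repair_time value such as '3.6h' or '30m' (no unit means hours)."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([mMhH]?)\s*", value)
    if not m:
        raise ValueError(f"unrecognised disk_repair_time: {value!r}")
    amount, unit = float(m.group(1)), m.group(2).lower()
    return timedelta(minutes=amount) if unit == "m" else timedelta(hours=amount)


def recovery_path(offline_since: datetime, now: datetime, repair_time: str = "3.6h") -> str:
    """Approximate which procedure applies; 3.6h is the ASM default disk_repair_time."""
    if now - offline_since < parse_repair_time(repair_time):
        return "ONLINE"     # step 1: ALTER DISKGROUP ... ONLINE DISKS ...
    return "ADD FORCE"      # step 2: ALTER DISKGROUP ... ADD ... DISK ... FORCE


if __name__ == "__main__":
    went_offline = datetime(2025, 3, 22, 9, 0)  # hypothetical timestamp
    print(recovery_path(went_offline, datetime(2025, 3, 22, 11, 0)))
    print(recovery_path(went_offline, datetime(2025, 3, 22, 13, 0)))
```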
2. Once disk_repair_time has expired, ASM force-drops the disks: they appear with names such as _DROPPED_0005_DATA, header status UNKNOWN, and mode/state OFFLINE FORCING:
_DROPPED_0005_DATA 3 UNKNOWN OFFLINE FORCING 1024
_DROPPED_0007_DATA 3 UNKNOWN OFFLINE FORCING 1024
Trying to bring them online directly fails:
SQL> ALTER DISKGROUP DATA ONLINE DISKS IN FAILGROUP K2;
ALTER DISKGROUP DATA ONLINE DISKS IN FAILGROUP K2
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15281: not all specified disks were brought ONLINE
ORA-15284: ASM terminated ALTER DISKGROUP ONLINE
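Instead, the disks have to be added back to the disk group with the FORCE option. FORCE is required because their headers still mark them as MEMBER of a disk group (compare the group 0 MEMBER entries in the earlier listing):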
SQL> ALTER DISKGROUP DATA ADD FAILGROUP K2 DISK '/dev/mapper/sc7k2_data02' FORCE,'/dev/mapper/sc7k2_data04' FORCE REBALANCE POWER 8;
Diskgroup altered.
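When several disk groups or failgroups need repair, generating the statements avoids typos in long device paths. A minimal Python sketch; the helper names are hypothetical, and it only builds the two statement forms used above as strings, it does not connect to ASM or execute anything.

```python
def online_sql(diskgroup: str, failgroup: str) -> str:
    """Step 1 form: disks still OFFLINE NORMAL inside disk_repair_time."""
    return f"ALTER DISKGROUP {diskgroup} ONLINE DISKS IN FAILGROUP {failgroup};"


def add_force_sql(diskgroup: str, failgroup: str, paths: list[str], power: int = 8) -> str:
    """Step 2 form: re-add force-dropped disks whose headers still say MEMBER."""
    disks = ",".join(f"'{p}' FORCE" for p in paths)
    return f"ALTER DISKGROUP {diskgroup} ADD FAILGROUP {failgroup} DISK {disks} REBALANCE POWER {power};"


if __name__ == "__main__":
    print(online_sql("OCR", "OCR_0003"))
    print(add_force_sql("DATA", "K2", ["/dev/mapper/sc7k2_data02", "/dev/mapper/sc7k2_data04"]))
```

REBALANCE POWER is kept as a parameter because a higher power finishes sooner but competes harder with the database for I/O.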
Verify after adding them back:
SQL> select NAME,GROUP_NUMBER,MOUNT_STATUS,HEADER_STATUS,MODE_STATUS,STATE,PATH,TOTAL_MB from V$ASM_DISK order by 2,1;
NAME GROUP_NUMBER MOUNT_S HEADER_STATU MODE_ST STATE PATH TOTAL_MB
-------------------- ------------ ------- ------------ ------- -------- ------------------------------ ----------
0 CLOSED MEMBER ONLINE NORMAL /dev/mapper/sc7k2_data13 0
0 CLOSED MEMBER ONLINE NORMAL /dev/mapper/sc7k2_data11 0
0 CLOSED MEMBER ONLINE NORMAL /dev/mapper/sc7k2_data09 0
0 CLOSED MEMBER ONLINE NORMAL /dev/mapper/sc7k2_data07 0
0 CLOSED FORMER ONLINE NORMAL /dev/mapper/sc7k2_crs01 0
0 CLOSED FORMER ONLINE NORMAL /dev/mapper/sc7k1_crs02 0
ARCH_0001 1 CACHED MEMBER ONLINE NORMAL /dev/mapper/sc7k2_data05 1048576
ARCH_0002 1 CACHED MEMBER ONLINE NORMAL /dev/mapper/sc7k1_data05 1048576
DATA_0000 3 CACHED MEMBER ONLINE NORMAL /dev/mapper/sc7k1_data02 1048576
DATA_0001 3 CACHED MEMBER ONLINE NORMAL /dev/mapper/sc7k1_data03 1048576
DATA_0002 3 CACHED MEMBER ONLINE NORMAL /dev/mapper/sc7k1_data04 1048576
DATA_0003 3 CACHED MEMBER ONLINE NORMAL /dev/mapper/sc7k1_data01 1048576
DATA_0004 3 CACHED MEMBER ONLINE NORMAL /dev/mapper/sc7k2_data01 1048576
DATA_0006 3 CACHED MEMBER ONLINE NORMAL /dev/mapper/sc7k2_data03 1048576
DATA_0008 3 CACHED MEMBER ONLINE NORMAL /dev/mapper/sc7k2_data02 1048576
DATA_0009 3 CACHED MEMBER ONLINE NORMAL /dev/mapper/sc7k2_data04 1048576
_DROPPED_0005_DATA 3 MISSING UNKNOWN OFFLINE FORCING 1048576
_DROPPED_0007_DATA 3 MISSING UNKNOWN OFFLINE FORCING 1048576
OCR_0000 4 CACHED MEMBER ONLINE NORMAL /dev/mapper/sc7k1_crs01 5120
OCR_0002 4 CACHED MEMBER ONLINE NORMAL /ocrvote3/ocr/ocrvote3 1024
OCR_0003 4 CACHED MEMBER ONLINE NORMAL /dev/mapper/sc7k2_crs02 5120
39 rows selected.
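The re-added disks receive new names (DATA_0008 and DATA_0009), while the old _DROPPED_ entries are still listed as MISSING / OFFLINE FORCING at this point. Next, wait for the ASM rebalance to finish and confirm that database performance is back to normal; that completes the recovery. Rebalance progress can be tracked in GV$ASM_OPERATION: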
SQL> SELECT * FROM gv$asm_operation;
INST_ID GROUP_NUMBER OPERA STATE POWER ACTUAL SOFAR EST_WORK EST_RATE EST_MINUTES ERROR_CODE
---------- ------------ ----- ---------- ---------- ---------- ---------- ---------- ---------- ----------- ------------
2 2 REBAL RUN 8 8 140707 6275009 16777 365
2 3 REBAL WAIT 8
1 2 REBAL WAIT 8
1 3 REBAL RUN 8 8 212238 2202765 18208 109
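EST_MINUTES is essentially the remaining work divided by the rate: SOFAR and EST_WORK count allocation units, and EST_RATE is allocation units moved per minute. The Python sketch below recomputes it for the two RUN rows above (floor division matches the displayed 365 and 109). The rows are copied from the output; the function name is hypothetical.

```python
# (INST_ID, GROUP_NUMBER, SOFAR, EST_WORK, EST_RATE) for the RUN rows above.
RUNNING = [
    (2, 2, 140707, 6275009, 16777),
    (1, 3, 212238, 2202765, 18208),
]


def remaining_minutes(sofar: int, est_work: int, est_rate: int) -> int:
    """Estimated minutes left: allocation units still to move / AUs moved per minute."""
    return (est_work - sofar) // est_rate


if __name__ == "__main__":
    for inst, group, sofar, work, rate in RUNNING:
        mins = remaining_minutes(sofar, work, rate)
        print(f"inst {inst} group {group}: ~{mins} min (~{mins / 60:.1f} h), {sofar / work:.1%} done")
```

The estimate changes as the rate changes, so it is worth re-querying periodically rather than relying on one snapshot; here group 2 (DATA2) still had roughly six hours to go.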