文章目录
- 1.问题描叙
- 2 查因过程
- 2.1 查看磁盘空间状况
- 2.2 检查表空间增长情况
- 2.3 检查$ORACLE_BASE相关文件
- 2.4查看trace file内容
- 3.处理过程
- 3.1数据文件检查
- 3.2 表及索引段检查
- 4.结论:
1.问题描叙
收到check_mk告警,约在15:17左右硬盘空间消耗暴增
2 查因过程
2.1 查看磁盘空间状况
u1很快被耗尽
[oracle@gsierp ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 45G 16G 27G 38% /
/dev/sda1 2.0G 52M 1.8G 3% /boot
tmpfs 16G 16K 16G 1% /dev/shm
/dev/sdd 99G 78G 16G 84% /u3
/dev/sde 99G 30G 64G 33% /u4
/dev/sdb1 197G 196G 114M 100% /u1
/dev/sdc1 74G 68G 2.6G 97% /u2
2.2 检查表空间增长情况
从输出结果来看,并没有异常增长,看来问题不在数据库数据文件
select a.snap_id,a.dbid,a.tablespace_id,a.tablespace_size max_tablespace_size,b.tablespace_size min_tablespace_size,
a.tablespace_usedsize max_tablespace_usedsize,b.tablespace_usedsize min_tablespace_usedsize,
round(a.tablespace_size-b.tablespace_size,2) diff_tablespace_size,
round((a.tablespace_usedsize-b.tablespace_usedsize)*8/1024/1024,2) diff_tablespace_usedsize
from
(select su.snap_id,su.dbid,su.tablespace_id,su.tablespace_size,su.tablespace_usedsize from dba_hist_tbspc_space_usage su
join dba_hist_snapshot sn on su.snap_id=sn.snap_id
join (select MAX(SNAP_ID) MAX_SNAP_ID,MIN(SNAP_ID) MIN_SNAP_ID from dba_hist_snapshot where to_char(begin_interval_time,'YYYY-MM-DD HH24:MI:SS') LIKE '2023-08-29%') msn
on su.snap_id=msn.max_snap_id) a,
(select su.snap_id,su.dbid,su.tablespace_id,su.tablespace_size,su.tablespace_usedsize from dba_hist_tbspc_space_usage su
join dba_hist_snapshot sn on su.snap_id=sn.snap_id
join (select MAX(SNAP_ID) MAX_SNAP_ID,MIN(SNAP_ID) MIN_SNAP_ID from dba_hist_snapshot where to_char(begin_interval_time,'YYYY-MM-DD HH24:MI:SS') LIKE '2023-08-29%') msn
on su.snap_id=msn.min_snap_id) b
where a.dbid=a.dbid and a.tablespace_id=b.tablespace_id
SNAP_ID DBID TABLESPACE_ID MAX_TABLESPACE_SIZE MIN_TABLESPACE_SIZE MAX_TABLESPACE_USEDSIZE MIN_TABLESPACE_USEDSIZE DIFF_TABLESPACE_SIZE DIFF_TABLESPACE_USEDSIZE
---------- ---------- ------------- ------------------- ------------------- ----------------------- ----------------------- -------------------- ------------------------
76920 2922139705 0 92800 92800 92344 92344 0 0
76920 2922139705 1 4971008 4971008 67408 55632 0 .09
76920 2922139705 2 179200 179200 160072 160560 0 0
76920 2922139705 4 145440 145440 144776 144776 0 0
76920 2922139705 5 11308928 11308928 10512584 10512584 0 0
76920 2922139705 6 1876224 1876224 1544152 1544152 0 0
76920 2922139705 8 3055360 3055360 2682048 2682048 0 0
76920 2922139705 9 1462016 1462016 1179416 1179416 0 0
76920 2922139705 11 12390896 12390896 12068792 12022072 0 .36
76920 2922139705 12 1420800 1420800 976688 976688 0 0
76920 2922139705 14 2706560 2706560 2659040 2659040 0 0
SNAP_ID DBID TABLESPACE_ID MAX_TABLESPACE_SIZE MIN_TABLESPACE_SIZE MAX_TABLESPACE_USEDSIZE MIN_TABLESPACE_USEDSIZE DIFF_TABLESPACE_SIZE DIFF_TABLESPACE_USEDSIZE
---------- ---------- ------------- ------------------- ------------------- ----------------------- ----------------------- -------------------- ------------------------
76920 2922139705 15 749440 749440 735440 735440 0 0
76920 2922139705 16 1280 1280 232 232 0 0
76920 2922139705 17 1310720 1310720 0 0 0 0
2.3 检查$ORACLE_BASE相关文件
最终检查到udump目录的大小达27G之多
[oracle@gsierp pyia]$ du -sh $ORACLE_BASE/admin/pyia/*
63M ./adump
20G ./adump.old
364M ./bdump
24K ./cdump
27G ./udump
[oracle@gsierp udump]$ ls -alh --full-time|grep "2023-08-29"
-rw-r----- 1 oracle oinstall 25G 2023-08-29 15:49:52.771222574 +0700 pyia_ora_11160.trc
2.4查看trace file内容
由于文件size太大,先使用head命令查看头部信息
[oracle@gsierp udump]$ head -n 100 pyia_ora_11160.trc |more
/u1/admin/pyia/udump/pyia_ora_11160.trc
Oracle Database 10g Release 10.2.0.5.0 - 64bit Production
ORACLE_HOME = /oracle/app/product/10.2.0
System name: Linux
Node name: gsierp
Release: 2.6.32-300.10.1.el5uek
Version: #1 SMP Wed Feb 22 17:37:40 EST 2012
Machine: x86_64
Instance name: pyia
Redo thread mounted by this instance: 1
Oracle process number: 56
Unix process pid: 11160, image: oracle@gsierp
*** 2023-08-29 15:02:41.341
*** ACTION NAME:() 2023-08-29 15:02:41.341
*** MODULE NAME:(app_sis_sap.exe) 2023-08-29 15:02:41.341
*** SERVICE NAME:(SYS$USERS) 2023-08-29 15:02:41.341
*** SESSION ID:(920.29479) 2023-08-29 15:02:41.341
row 088e32c5.1 continuation at
file# 34 block# 930501 slot 2 not found
**************************************************
KDSTABN_GET: 0 ..... ntab: 1
curSlot: 2 ..... nrows: 15
**************************************************
*** 2023-08-29 15:02:41.341
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []
Current SQL statement for this session:
AND TRIM(NVL(TRIM(A.POSNR), TRIM(SIZE_NO))) = TRIM(DECODE(TRIM(B.POSNR),'000000', B.WO_SIZE
FROM SAP_WO A,BARUM(B.QTY) QTY,03', 'ZP04', 'ZP10', 'ZP14')) = 'REL'
SELEN ALLA.AUFNRIN ('ZP02', 'ZP03', 'ZP04', 'ZP10', 'ZP14')) = 'REL'
AND NOT A.AUART IN ('ZP02', 'ZP03', 'ZP04', 'ZP10ASTNR,'N')) = 'REL'
AND NOTSTR(A.PLNBEZ, 1, 2) = 'FG' AND TRIM(NVL(A.ASTNR,'N')) = 'REL'
AND SUBSTR(A.PLNBEZ, 1, 2) = 'FG' B.PROD_BATCH ='10422479-0010'
/*ERE A.AUFNR = B.AUFNR(+), 'ZP03', 'ZP04', 'ZP10', 'ZP14')
SELECT A.AUFNR,A.WO_SIZE,A.GAMNG,B.QTY BAR_QTY
FROM
(SELECT B.AUFNR, B.WO_SIZE WO_SIZE, SUM(B.GAMNG) GAMNG
FROM ODRD A, SAP_WO B
WHERE A.FACT_ODR_NO = B.PROD_BATCH
AND TRIM(NVL(TRIM(A.POSNR), TRIM(SIZE_NO))) = TRIM(DECODE(TRIM(B.POSNR),'000000', B.WO_SIZE, B.POSNR))
AND B.PROD_BATCH = :AS_PROD_BATCH
AND SUBSTR(B.PLNBEZ, 1, 2) = :AS_KIND AND TRIM(NVL(B.ASTNR,'N')) = 'REL'
AND NOT B.AUART IN ('ZP02', 'ZP03', 'ZP04', 'ZP10', 'ZP14')
GROUP BY B.AUFNR, B.WO_SIZE) A,
(SELECT A.AUFNR,SUM(B.QTY) QTY
FROM SAP_WO A,BARM_1 B
WHERE A.AUFNR = B.AUFNR
AND A.PROD_BATCH = :AS_PROD_BATCH
AND SUBSTR(A.PLNBEZ, 1, 2) = :AS_KIND AND TRIM(NVL(A.ASTNR,'N')) = 'REL'
AND NOT A.AUART IN ('ZP02', 'ZP03', 'ZP04', 'ZP10', 'ZP14')
GROUP BY A.AUFNR
UNION ALL
SELECT A.AUFNR,SUM(B.QTY) QTY
FROM SAP_WO A,BARM_3 B
WHERE A.AUFNR = B.AUFNR
AND A.PROD_BATCH = :AS_PROD_BATCH
AND SUBSTR(A.PLNBEZ, 1, 2) = :AS_KIND AND TRIM(NVL(A.ASTNR,'N')) = 'REL'
AND NOT A.AUART IN ('ZP02', 'ZP03', 'ZP04', 'ZP10', 'ZP14')
GROUP BY A.AUFNR
UNION ALL
SELECT A.AUFNR,SUM(B.QTY) QTY
FROM SAP_WO A,BARM_4 B
WHERE A.AUFNR = B.AUFNR
AND A.PROD_BATCH = :AS_PROD_BATCH
AND SUBSTR(A.PLNBEZ, 1, 2) = :AS_KIND AND TRIM(NVL(A.ASTNR,'N')) = 'REL'
AND NOT A.AUART IN ('ZP02', 'ZP03', 'ZP04', 'ZP10', 'ZP14')
GROUP BY A.AUFNR
UNION ALL
SELECT A.AUFNR,SUM(B.QTY) QTY
FROM SAP_WO A,BARM_5 B
WHERE A.AUFNR = B.AUFNR
AND A.PROD_BATCH = :AS_PROD_BATCH
AND SUBSTR(A.PLNBEZ, 1, 2) = :AS_KIND AND TRIM(NVL(A.ASTNR,'N')) = 'REL'
AND NOT A.AUART IN ('ZP02', 'ZP03', 'ZP04', 'ZP10', 'ZP14')
GROUP BY A.AUFNR) B
WHERE A.AUFNR = B.AUFNR(+)
/*
SELECT B.AUFNR,B.WO_SIZE WO_SIZE,SUM(B.GAMNG) GAMNG,
NVL(SUM((SELECT SUM(QTY) FROM BARM_1 Z WHERE Z.AUFNR=B.AUFNR)),0)+
NVL(SUM((SELECT SUM(QTY) FROM BARM_3 Z WHERE Z.AUFNR=B.AUFNR)),0)+
NVL(SUM((SELECT SUM(QTY) FROM BARM_4 Z WHERE Z.AUFNR=B.AUFNR)),0)+
NVL(SUM((SELECT SUM(QTY) FROM BARM_5 Z WHERE Z.AUFNR=B.AUFNR)),0) BAR_QTY
FROM ODRD A,SAP_WO B
WHERE A.FACT_ODR_NO = B.PROD_BATCH AND
TRIM(NVL(TRIM(A.POSNR),TRIM(SIZE_NO)))=TRIM(DECODE(TRIM(B.POSNR),'000000',B.WO_SIZE,B.POSNR)) AND
B.PROD_BATCH =:AS_PROD_BATCH AND
SUBSTR(B.PLNBEZ,1,2)=:AS_KIND AND
b.AUART NOT IN ('ZP02','ZP03','ZP04','ZP10','ZP14')
GROUP BY B.AUFNR,B.WO_SIZE
*/
从输出看出:
file# 34 block# 930501 slot 2 not found
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []
触发错误的是一个select查询语句,由此推断,select在执行过程中,遇到file# 34 block# 930501 slot 2 not found,导致ORA-00600错误
这很可能是一个物理错误或者逻辑错误
3.处理过程
3.1数据文件检查
先对涉及的file# 34作一个dbv检验,看看是否有物理损坏
[oracle@gsierp ~]$ dbv userid='system/"XXX123OSrr07g4KaFLCCs7c/"' file='/u4/oradata/pyia/sisdata03.dbf' blocksize=8192
DBVERIFY: Release 10.2.0.5.0 - Production on Wed Aug 30 15:58:47 2023
Copyright (c) 1982, 2007, Oracle. All rights reserved.
DBVERIFY - Verification starting : FILE = /u4/oradata/pyia/sisdata03.dbf
DBVERIFY - Verification complete
Total Pages Examined : 1937920
Total Pages Processed (Data) : 965136
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 793739
Total Pages Failing (Index): 0
Total Pages Processed (Other): 172036
Total Pages Processed (Seg) : 354
Total Pages Failing (Seg) : 0
Total Pages Empty : 6655
Total Pages Marked Corrupt : 0
Total Pages Influx : 0
Highest block SCN : 335941877 (2475.335941877)
从检查结果来看,数据文件正常,没有发现问题
3.2 表及索引段检查
先定位出file# 34 block# 930501 slot 2 not found所涉及的表
SQL> select * from dba_extents where file_id=34 and 930501 between block_id and block_id+blocks;
OWNER SEGMENT_NAME PARTITION_NAME SEGMENT_TYPE TABLESPACE_NAME EXTENT_ID FILE_ID BLOCK_ID BYTES BLOCKS RELATIVE_FNO
-------------------- ------------------------------ ------------------------------ ------------------ -------------------- ---------- ---------- ---------- ---------- ---------- ------------
PGDSISSAP SAP_WO TABLE SIS_DATA 2 34 929801 41943040 5120 34
分析涉事的表及其索引,结果也正常
SQL> analyze table pgdsissap.sap_wo validate structure cascade;
Table analyzed.
清空buffer_cache后,做一次全表扫描涉事的表,还是正常
ALTER SYSTEM FLUSH BUFFER_CACHE;
select * from pgdsissap.sap_wo
COUNT(*)
----------
298678
4.结论:
最后,结合整理网上网友观点:
坏块可能出现再内存中,经过一段时间,buffer cache中的坏块内容已经被age out(老化),所以已经无法重现(甚至不用做任何操作)