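While surveying the HIS database environment at a hospital, I found that the Oracle RAC database was not correctly using the multipath devices provided by multipath. Out of responsibility to the customer and our partners, this post describes what I found and offers some suggestions for resolving it.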
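System environment:
Operating system: Linux
Database: Oracle 19c
The shared disks were created with ASMLIB. The relevant information is shown below (DATA01 is used as the example throughout):
Output of multipath -ll: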
DATA01 (3600507680c80833288000000000000ca) dm-3 IBM ,2145
size=500G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=30 status=active
|- 7:0:0:0 sdb 8:16 active ready running
|- 16:0:0:0 sdae 65:224 active ready running
|- 7:0:1:0 sdi 8:128 active ready running
|- 16:0:2:0 sdas 66:192 active ready running
|- 7:0:2:0 sdp 8:240 active ready running
|- 16:0:3:0 sdar 66:176 active ready running
|- 7:0:3:0 sdw 65:96 active ready running
`- 16:0:1:0 sdad 65:208 active ready running
OCR03 (3600507680c80833288000000000000e3) dm-9 IBM ,2145
size=5.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=30 status=active
|- 7:0:0:6 sdh 8:112 active ready running
|- 16:0:0:6 sdak 66:64 active ready running
|- 7:0:1:6 sdo 8:224 active ready running
|- 16:0:2:6 sday 67:32 active ready running
|- 7:0:2:6 sdv 65:80 active ready running
|- 16:0:3:6 sdbe 67:128 active ready running
|- 7:0:3:6 sdac 65:192 active ready running
`- 16:0:1:6 sdaq 66:160 active ready running
OCR02 (3600507680c80833288000000000000e2) dm-8 IBM ,2145
size=5.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=30 status=active
|- 7:0:0:5 sdg 8:96 active ready running
|- 16:0:0:5 sdaj 66:48 active ready running
|- 7:0:1:5 sdn 8:208 active ready running
|- 16:0:2:5 sdax 67:16 active ready running
|- 7:0:2:5 sdu 65:64 active ready running
|- 16:0:3:5 sdbd 67:112 active ready running
|- 7:0:3:5 sdab 65:176 active ready running
`- 16:0:1:5 sdap 66:144 active ready running
OCR01 (3600507680c80833288000000000000e1) dm-7 IBM ,2145
size=5.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=30 status=active
|- 7:0:0:4 sdf 8:80 active ready running
|- 16:0:0:4 sdai 66:32 active ready running
|- 7:0:1:4 sdm 8:192 active ready running
|- 16:0:2:4 sdaw 67:0 active ready running
|- 7:0:2:4 sdt 65:48 active ready running
|- 16:0:3:4 sdbc 67:96 active ready running
|- 7:0:3:4 sdaa 65:160 active ready running
`- 16:0:1:4 sdao 66:128 active ready running
DATA04 (3600507680c80833288000000000000cd) dm-6 IBM ,2145
size=500G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=30 status=active
|- 7:0:0:3 sde 8:64 active ready running
|- 16:0:0:3 sdah 66:16 active ready running
|- 7:0:1:3 sdl 8:176 active ready running
|- 16:0:2:3 sdav 66:240 active ready running
|- 7:0:2:3 sds 65:32 active ready running
|- 16:0:3:3 sdbb 67:80 active ready running
|- 7:0:3:3 sdz 65:144 active ready running
`- 16:0:1:3 sdan 66:112 active ready running
DATA03 (3600507680c80833288000000000000cc) dm-5 IBM ,2145
size=500G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=30 status=active
|- 7:0:0:2 sdd 8:48 active ready running
|- 16:0:0:2 sdag 66:0 active ready running
|- 7:0:1:2 sdk 8:160 active ready running
|- 16:0:2:2 sdau 66:224 active ready running
|- 7:0:2:2 sdr 65:16 active ready running
|- 16:0:3:2 sdba 67:64 active ready running
|- 7:0:3:2 sdy 65:128 active ready running
`- 16:0:1:2 sdam 66:96 active ready running
DATA02 (3600507680c80833288000000000000cb) dm-4 IBM ,2145
size=500G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=30 status=active
|- 7:0:0:1 sdc 8:32 active ready running
|- 16:0:0:1 sdaf 65:240 active ready running
|- 7:0:1:1 sdj 8:144 active ready running
|- 16:0:2:1 sdat 66:208 active ready running
|- 7:0:2:1 sdq 65:0 active ready running
|- 16:0:3:1 sdaz 67:48 active ready running
|- 7:0:3:1 sdx 65:112 active ready running
`- 16:0:1:1 sdal 66:80 active ready running
Contents of the multipath.conf configuration file:
defaults {
polling_interval 10
path_selector "round-robin 0"
path_grouping_policy multibus
uid_attribute ID_SERIAL
prio alua
path_checker readsector0
rr_min_io 100
max_fds 8192
rr_weight priorities
failback immediate
no_path_retry fail
user_friendly_names yes
}
multipaths {
multipath {
wwid 3600507680c80833288000000000000ca
alias DATA01
path_grouping_policy multibus
path_selector "round-robin 0"
failback manual
rr_weight priorities
no_path_retry 5
}
multipath {
wwid 3600507680c80833288000000000000cb
alias DATA02
path_grouping_policy multibus
path_selector "round-robin 0"
failback manual
rr_weight priorities
no_path_retry 5
}
multipath {
wwid 3600507680c80833288000000000000cc
alias DATA03
path_grouping_policy multibus
path_selector "round-robin 0"
failback manual
rr_weight priorities
no_path_retry 5
}
multipath {
wwid 3600507680c80833288000000000000cd
alias DATA04
path_grouping_policy multibus
path_selector "round-robin 0"
failback manual
rr_weight priorities
no_path_retry 5
}
multipath {
wwid 3600507680c80833288000000000000e1
alias OCR01
path_grouping_policy multibus
path_selector "round-robin 0"
failback manual
rr_weight priorities
no_path_retry 5
}
multipath {
wwid 3600507680c80833288000000000000e2
alias OCR02
path_grouping_policy multibus
path_selector "round-robin 0"
failback manual
rr_weight priorities
no_path_retry 5
}
multipath {
wwid 3600507680c80833288000000000000e3
alias OCR03
path_grouping_policy multibus
path_selector "round-robin 0"
failback manual
rr_weight priorities
no_path_retry 5
}
}
blacklist {
wwid 36f4ee08029ce6e002a007008d5711e3b
}
ASM information:
col path for a30
col name for a10
set line 300
SQL> select mode_status,name,state,path from v$asm_disk;
MODE_STATUS NAME STATE PATH
--------------------- ---------- ------------------------ ---------------------------
ONLINE OCR_0000 NORMAL /dev/oracleasm/disks/OCR01
ONLINE OCR_0001 NORMAL /dev/oracleasm/disks/OCR02
ONLINE OCR_0002 NORMAL /dev/oracleasm/disks/OCR03
ONLINE DATA_0003 NORMAL /dev/oracleasm/disks/DATA04
ONLINE DATA_0002 NORMAL /dev/oracleasm/disks/DATA03
ONLINE DATA_0001 NORMAL /dev/oracleasm/disks/DATA02
ONLINE DATA_0000 NORMAL /dev/oracleasm/disks/DATA01
7 rows selected.
SQL> show parameter asm_disk
NAME TYPE VALUE
------------------------------------ --------------------------------- ------------
asm_diskgroups string DATA
asm_diskstring string /dev/oracleasm/disks/*
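As the output shows, all eight sd devices carry the label DATA01. sdb, sdi, sdp, sdw, sdae, sdar, sdas, and sdad are the individual paths, and multipath binds these eight paths into DATA01. The device ASMLIB is currently using is [8, 16], which is sdb.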
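Checking the major and minor numbers to see which device under /dev/ each disk corresponds to shows that the disks under /dev/oracleasm/disks do not map to the /dev/dm-xx devices. This essentially confirms that ASMLIB is not using the multipath devices.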
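The major/minor check described above can be sketched in shell as follows. This is a minimal sketch, not the exact procedure used on the hospital system: the function names (dm_major, classify_dev, check_asmlib_disks) are hypothetical, and it assumes GNU stat and a Linux /proc/devices. Note that a device-mapper major number only shows that a device is a dm device; it does not by itself prove that it is a multipath map.

```shell
#!/bin/sh
# Hypothetical helper: print the major number the kernel assigned to
# device-mapper, read from a /proc/devices-style file.
dm_major() {
    awk '$2 == "device-mapper" { print $1 }' "${1:-/proc/devices}"
}

# Hypothetical helper: classify a decimal "major:minor" pair.
# $1 = major:minor, $2 = device-mapper major number
classify_dev() {
    major=${1%%:*}
    if [ "$major" = "$2" ]; then
        echo device-mapper
    else
        echo not-device-mapper
    fi
}

# Walk the ASMLIB disk directory and report each disk's major:minor.
check_asmlib_disks() {
    dir=${1:-/dev/oracleasm/disks}
    dm=$(dm_major)
    for d in "$dir"/*; do
        [ -b "$d" ] || continue
        # GNU stat prints major (%t) and minor (%T) in hex; convert to decimal.
        maj=$((0x$(stat -L -c '%t' "$d")))
        min=$((0x$(stat -L -c '%T' "$d")))
        echo "$(basename "$d") $maj:$min $(classify_dev "$maj:$min" "$dm")"
    done
}
```

Calling check_asmlib_disks as root on a database node prints one line per ASMLIB disk. On the system described here, DATA01 would be expected to report 8:16 (sdb) rather than a device-mapper major. The same device numbers can also be read with oracleasm querydisk -d DATA01.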
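ASMLIB works by giving a disk a name, such as "DATA01", and writing that name into the header of the disk. At the next boot, /etc/rc.d/init.d/oracleasm start runs automatically and scans the disks, reading back the names written earlier. With multipath in use, several device names under /dev/ correspond to the same physical disk: the /dev/sdxxx devices are the individual paths, and /dev/dm-xx is the single device that merges those paths. Oracle expects ASMLIB to use the /dev/dm-xx device. However, ASMLIB's scan rule is to use the first device it finds: when a later device carries the same name as one already scanned, ASMLIB keeps the earlier device and ignores the later one. So if an sd path device is scanned first, ASMLIB binds to that single path. If that link then goes down, the ASM disk group is very likely to lose the disk with it, which can in turn cause data file corruption or an outage. Oracle's official site also describes this problem: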
Metalink Note: How To Setup ASM & ASMLIB On Native Linux Multipath Mapper disks? [ID 602952.1]
ASMLIB Installation & Configuration On MultiPath Mapper Devices (Step by Step Demo) On RAC Or Standalone Configurations. (Doc ID 1594584.1)
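The "first device scanned wins" rule can be illustrated with a toy model in shell. This is only an illustration of the rule as described in this post, not ASMLIB's actual implementation. The function names and the input format (one "device label" pair per line, in scan order) are assumptions made for the example.

```shell
#!/bin/sh
# Toy model only: each input line is "<device> <ASM label>", in scan order.
# Keep the first device seen for every label and drop later duplicates.
first_seen_per_label() {
    awk '!seen[$2]++ { print $2, $1 }'
}

# Toy model of scanning dm devices first: move dm-* entries to the front
# of the scan list, keeping the relative order of everything else.
scan_dm_first() {
    awk '$1 ~ /^dm-/ { print; next }
         { rest[++n] = $0 }
         END { for (i = 1; i <= n; i++) print rest[i] }'
}

# Scan order in which a single-path device is seen before the dm device.
scan_list='sdb DATA01
sdae DATA01
dm-3 DATA01'

printf '%s\n' "$scan_list" | first_seen_per_label
printf '%s\n' "$scan_list" | scan_dm_first | first_seen_per_label
```

The first pipeline binds DATA01 to sdb, which matches the situation found on this system. The second binds it to dm-3, which is the outcome that reordering the scan is meant to achieve.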
The recommendation is to set ORACLEASM_SCANORDER and ORACLEASM_SCANEXCLUDE in the /etc/sysconfig/oracleasm (oracleasm-_dev_oracleasm) configuration file so that ASMLIB finds the correct device files.
Configure ASMLIB to use multipath (on each node in a RAC environment):
ASMLIB can find the disks through any path, but the multipath device is the one it should use.
Modify /etc/sysconfig/oracleasm:
ORACLEASM_SCANORDER="dm"
ORACLEASM_SCANEXCLUDE="sd"
Note: The Oracle ASMLib configuration file is located at /etc/sysconfig/oracleasm. It is a link to the file /etc/sysconfig/oracleasm-_dev_oracleasm.
Restart ASMLIB (on each node in a RAC environment):
/etc/init.d/oracleasm stop
/etc/init.d/oracleasm start
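The configuration change above could be scripted roughly as follows. This is a sketch under stated assumptions: the helper names (set_sysconfig_value, prefer_multipath_scan) are hypothetical, and it assumes GNU sed. The --follow-symlinks option matters here because /etc/sysconfig/oracleasm is a link, and a plain sed -i would replace the link with a regular file.

```shell
#!/bin/sh
# Hypothetical helper: set KEY="VALUE" in a sysconfig-style file, replacing
# an existing KEY= line or appending one if the key is absent.
set_sysconfig_value() {
    file=$1
    key=$2
    value=$3
    if grep -q "^${key}=" "$file"; then
        # GNU sed: edit the link target in place instead of replacing the link.
        sed -i --follow-symlinks "s|^${key}=.*|${key}=\"${value}\"|" "$file"
    else
        printf '%s="%s"\n' "$key" "$value" >> "$file"
    fi
}

# Back up the config file, then apply the two settings recommended above.
prefer_multipath_scan() {
    cfg=$1
    cp -p "$cfg" "$cfg.bak" || return 1
    set_sysconfig_value "$cfg" ORACLEASM_SCANORDER dm
    set_sysconfig_value "$cfg" ORACLEASM_SCANEXCLUDE sd
}
```

On each node this would be called as prefer_multipath_scan /etc/sysconfig/oracleasm, followed by the restart shown above. Afterwards, oracleasm querydisk -d DATA01 should report the device numbers of the dm device rather than [8, 16].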
References:
http://www.help2ora.com/index.php/2011/08/16/how-to-setup-asm-asmlib-on-native-linux-multipath-mapper-disks/
https://www.oracle.com/linux/technologies/multipath-disks.html
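Where possible, bind the disks with udev instead, since Oracle also recommends udev on RedHat/OEL 5 and later. With the ASMLIB package, every Linux kernel update requires a matching new ASMLIB package, which means ASMLIB takes ongoing time to maintain and may introduce unknown bugs.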
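As an illustration of the udev approach, the sketch below prints one udev rule per multipath device, keyed on the WWIDs from the multipath.conf shown earlier. It is an assumption-laden example rather than a tested configuration for this system: the function name asm_udev_rule, the grid/asmadmin owner and group, the asm/ symlink directory, and the rule file name mentioned afterwards are all hypothetical choices.

```shell
#!/bin/sh
# Print one udev rule for a multipath device identified by its WWID.
# OWNER/GROUP default to grid/asmadmin, which are assumptions; adjust
# them to the local Grid Infrastructure installation.
asm_udev_rule() {
    wwid=$1
    alias=$2
    owner=${3:-grid}
    group=${4:-asmadmin}
    printf 'KERNEL=="dm-*", ENV{DM_UUID}=="mpath-%s", SYMLINK+="asm/%s", OWNER="%s", GROUP="%s", MODE="0660"\n' \
        "$wwid" "$alias" "$owner" "$group"
}

# Example: rules for two of the devices listed in multipath.conf above.
asm_udev_rule 3600507680c80833288000000000000ca DATA01
asm_udev_rule 3600507680c80833288000000000000e1 OCR01
```

The output would typically be saved to a rules file such as /etc/udev/rules.d/99-oracle-asmdevices.rules (a hypothetical name) and activated with udevadm control --reload-rules followed by udevadm trigger. The asm_diskstring parameter would then need to point at the new location, for example /dev/asm/*.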
————————————————————————————
WeChat official account: 徐sir的IT之路
CSDN: https://blog.csdn.net/xxddxhyz?type=blog
墨天轮 (modb.pro): https://www.modb.pro/u/3605
PGFANS: https://www.pgfans.cn/user/home?userId=5568
————————————————————————————