6 - ZooKeeper + Hadoop HA failover mechanism, brief overview of the principle:
HA overview (Hadoop 2.x-style architecture).
1) HA (High Availability) means uninterrupted 7x24 service.
1. ZooKeeper provides the coordination service and delivers notifications.
2. ZKFC (ZKFailoverController) is a ZooKeeper client that runs next to each NameNode; it keeps the NameNode registered in ZooKeeper and manages the NameNode's state.
3. When the active NameNode fails, ZooKeeper notifies the other ZKFC client, which switches its NameNode from standby to active.
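Once the cluster configured below is running, you can see which NameNode ZKFC currently keeps active; a minimal check (using the nn1/nn2 ids defined in the hdfs-site.xml below, run from the Hadoop home directory):
[root@hadoop102 hadoop-3.1.4]# bin/hdfs haadmin -getServiceState nn1
[root@hadoop102 hadoop-3.1.4]# bin/hdfs haadmin -getServiceState nn2
Each command prints either active or standby; after a failover the roles swap.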
HDFS-HA cluster configuration
Cluster plan:
| hadoop102       | hadoop103   | hadoop104   |
| --------------- | ----------- | ----------- |
| NameNode        | NameNode    |             |
| JournalNode     | JournalNode | JournalNode |
| DataNode        | DataNode    | DataNode    |
| ZK              | ZK          | ZK          |
| zkfc            | zkfc        | zkfc        |
| ResourceManager |             |             |
| NodeManager     | NodeManager | NodeManager |
1. Copy the existing Hadoop installation.
[root@hadoop102 module]# mkdir ha
[root@hadoop102 module]# cp -r hadoop-3.1.4/ ha/
[root@hadoop102 hadoop]# pwd
/opt/module/ha/hadoop-3.1.4/etc/hadoop
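If the copied installation already contains data and logs directories from earlier use, it is worth clearing them out of the copy before reconfiguring (a suggested extra step, not in the original notes; paths follow the layout above):
[root@hadoop102 module]# rm -rf ha/hadoop-3.1.4/data ha/hadoop-3.1.4/logs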
2. Configure core-site.xml:
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/module/ha/hadoop-3.1.4/data</value>
</property>
3. Configure hdfs-site.xml:
<property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
</property>
<property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>hadoop102:9000</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>hadoop103:9000</value>
</property>
<property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>hadoop102:50070</value>
</property>
<property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>hadoop103:50070</value>
</property>
<!-- Where the NameNode metadata (edit log) is stored on the JournalNodes -->
<property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop102:8485;hadoop103:8485;hadoop104:8485/mycluster</value>
</property>
<!-- Fencing method: ensures only one NameNode serves at a time (no split-brain) -->
<property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
</property>
<!-- Passwordless ssh key used by the sshfence method -->
<property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
</property>
<!-- JournalNode local storage directory -->
<property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/opt/module/ha/hadoop-3.1.4/data/jn</value>
</property>
<!-- Disable permission checking -->
<property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
</property>
<!-- Proxy provider the client uses to locate the active NameNode -->
<property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Enable automatic failover -->
<property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
</property>
4. Distribute the ha directory to the other nodes:
[root@hadoop102 module]# xsync ha
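xsync is the author's cluster rsync script; to confirm the copy actually landed on the other nodes, a quick optional check is:
[root@hadoop102 module]# ssh hadoop103 ls /opt/module/ha
[root@hadoop102 module]# ssh hadoop104 ls /opt/module/ha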
5. Start the QJM (JournalNode) cluster, which is where the shared edit log is stored:
[root@hadoop102 hadoop-3.1.4]# sbin/hadoop-daemons.sh start journalnode
WARNING: Use of this script to start HDFS daemons is deprecated.
WARNING: Attempting to execute replacement "hdfs --workers --daemon start" instead.
[root@hadoop102 hadoop-3.1.4]# hdfs --workers --daemon start journalnode
[root@hadoop102 zookeeper-3.4.10]# jpsall
=============== 192.168.1.102 ===============
14208 QuorumPeerMain
13941 JournalNode
14255 Jps
=============== 192.168.1.103 ===============
13249 QuorumPeerMain
13174 JournalNode
13287 Jps
=============== 192.168.1.104 ===============
13280 Jps
13251 QuorumPeerMain
13175 JournalNode
6. Format the NameNode:
[root@hadoop102 hadoop-3.1.4]# bin/hdfs namenode -format
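If the format succeeded, the NameNode metadata directory should now exist under hadoop.tmp.dir (with the default dfs.namenode.name.dir of ${hadoop.tmp.dir}/dfs/name); a quick sanity check, assuming the core-site.xml above:
[root@hadoop102 hadoop-3.1.4]# ls data/dfs/name/current
It should contain a VERSION file and an initial fsimage.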
7. Start the NameNode:
[root@hadoop102 hadoop-3.1.4]# hdfs --daemon start namenode
If an error occurs, stop first with [root@hadoop102 hadoop-3.1.4]# hdfs --workers --daemon stop (append the daemon name), fix the problem, and then continue with the remaining steps.
8. On hadoop103, pull (bootstrap) the NameNode metadata from nn1:
[root@hadoop103 hadoop-3.1.4]# bin/hdfs namenode -bootstrapStandby
9. Start the daemons on hadoop103:
[root@hadoop103 hadoop-3.1.4]# sbin/hadoop-daemon.sh start namenode
[root@hadoop103 hadoop-3.1.4]# sbin/hadoop-daemon.sh start datanode
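At this point hadoop103 should be running a NameNode and a DataNode in addition to the JournalNode and QuorumPeerMain started earlier; jps on hadoop103 confirms it:
[root@hadoop103 hadoop-3.1.4]# jps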
10. Manual standby/active switch: transition nn1 to active mode.
[root@hadoop102 hadoop-3.1.4]# bin/hdfs haadmin -transitionToActive nn1
Note: if dfs.ha.automatic-failover.enabled is already set to true, haadmin refuses a manual transition unless --forcemanual is also passed.
Automatic failover test
Modify core-site.xml:
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/module/ha/hadoop-3.1.4/data</value>
</property>
<property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop102:2181,hadoop103:2181,hadoop104:2181</value>
</property>
Modify hdfs-site.xml:
<property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
</property>
<property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>hadoop102:9000</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>hadoop103:9000</value>
</property>
<property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>hadoop102:50070</value>
</property>
<property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>hadoop103:50070</value>
</property>
<!-- Where the NameNode metadata (edit log) is stored on the JournalNodes -->
<property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop102:8485;hadoop103:8485;hadoop104:8485/mycluster</value>
</property>
<!-- Fencing: only one NameNode may serve at a time, so there is never a second active and no split-brain -->
<property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
</property>
<!-- Passwordless ssh login used by the fencing method -->
<property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
</property>
<!-- JournalNode storage directory -->
<property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/opt/module/ha/hadoop-3.1.4/data/jn</value>
</property>
<!-- Disable permission checking -->
<property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
</property>
<!-- Client proxy provider: how clients fail over to the active NameNode -->
<property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Enable automatic failover -->
<property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
</property>
Distribute the etc directory:
[root@hadoop102 hadoop-3.1.4]# xsync etc/
Start the QJM (JournalNode) cluster:
[root@hadoop102 hadoop-3.1.4]# hdfs --workers --daemon start journalnode
Format the NameNode (remember to delete the data and logs directories first; if old metadata is still there, the format stops and asks:
Re-format filesystem in Storage Directory root= /opt/module/hadoop-3.1.4/data/dfs/name; location= null ? (Y or N))
[root@hadoop102 hadoop-3.1.4]# bin/hdfs namenode -format
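To clear the old data on every node before reformatting, something like the following works (a sketch, not in the original notes; the remote paths assume the /opt/module/ha layout used above):
[root@hadoop102 hadoop-3.1.4]# rm -rf data/ logs/
[root@hadoop102 hadoop-3.1.4]# ssh hadoop103 rm -rf /opt/module/ha/hadoop-3.1.4/data /opt/module/ha/hadoop-3.1.4/logs
[root@hadoop102 hadoop-3.1.4]# ssh hadoop104 rm -rf /opt/module/ha/hadoop-3.1.4/data /opt/module/ha/hadoop-3.1.4/logs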
Initialize the HA state in ZooKeeper:
[root@hadoop102 hadoop-3.1.4]# bin/hdfs zkfc -formatZK
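After formatZK, ZooKeeper should contain a /hadoop-ha/mycluster znode (named after the nameservice configured above), which ZKFC later uses for the active/standby election lock; it can be checked from the ZooKeeper CLI:
[root@hadoop102 zookeeper-3.4.10]# bin/zkCli.sh
[zk: localhost:2181(CONNECTED) 0] ls /hadoop-ha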
Synchronize the metadata to the standby NameNode (run on hadoop103):
[root@hadoop103 hadoop-3.1.4]# bin/hdfs namenode -bootstrapStandby
Start the HDFS services:
[root@hadoop102 hadoop-3.1.4]# sbin/start-dfs.sh
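With automatic failover enabled, start-dfs.sh also starts a ZKFC (the DFSZKFailoverController process) on each NameNode host; jpsall (the author's cluster jps script) can be used to confirm every node now runs the daemons planned in the table above:
[root@hadoop102 hadoop-3.1.4]# jpsall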
Start the NameNode on hadoop103 (if it was not already brought up by start-dfs.sh):
[root@hadoop103 hadoop-3.1.4]# sbin/hadoop-daemon.sh start namenode
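To actually exercise automatic failover, kill the active NameNode and check that the standby takes over (a minimal sketch, assuming nn1 on hadoop102 is currently active; <NameNode-pid> stands for whatever pid jps reports):
[root@hadoop102 hadoop-3.1.4]# jps | grep NameNode
[root@hadoop102 hadoop-3.1.4]# kill -9 <NameNode-pid>
[root@hadoop102 hadoop-3.1.4]# bin/hdfs haadmin -getServiceState nn2
Within a few seconds nn2 should report active; restarting the killed NameNode brings it back as standby.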
Learning source: https://space.bilibili.com/302417610/ . If this infringes any rights, contact QQ 3623472230 to request removal.