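Setting Up a Hadoop High-Availability Cluster (Part 1)
- Configuring Hadoop
- hadoop-env.sh
- workers
- core-site.xml
- hdfs-site.xml
- mapred-site.xml
- yarn-site.xml
- /etc/profile
- Copying
- First Cluster Startup
- 1. Start the ZooKeeper cluster first (using the automation script)
- 2. Start the JournalNode on hadoop151, hadoop152, and hadoop153
- 3. Format the NameNode on hadoop151
- 4. Start the NameNode service on hadoop151
- 5. Sync the NameNode metadata to hadoop152
- 6. Start the NameNode service on hadoop152
- 7. Stop all DFS-related services
- 8. Format the HA state in ZooKeeper
- 9. Start DFS
- 10. Start YARN
- Installation Complete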
Configuring Hadoop
After extracting the archive, configure the following six files one by one.
hadoop-env.sh
Line 54 (the exact line number may vary by version):
export JAVA_HOME=/opt/soft/jdk180
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export HDFS_JOURNALNODE_USER=root
export HDFS_ZKFC_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
workers
Enter the hostnames (or IP addresses) of the worker nodes, one per line:
hadoop151
hadoop152
hadoop153
hadoop154
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://gky</value>
<description>Logical nameservice name; must match the value of dfs.nameservices in hdfs-site.xml</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/soft/hadoop313/tmpdata</value>
<description>Base directory on the local disk for Hadoop temporary files</description>
</property>
<property>
<name>hadoop.http.staticuser.user</name>
<value>root</value>
<description>Default user for the web UIs</description>
</property>
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
<description>Hosts from which root may impersonate other users (* means any host)</description>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
<description>Groups whose members root may impersonate (* means any group)</description>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
<description>Buffer size for reading and writing files: 128 KB</description>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop151:2181,hadoop152:2181,hadoop153:2181</value><!-- change to your own hosts -->
<description>ZooKeeper quorum</description>
</property>
<property>
<name>ha.zookeeper.session-timeout.ms</name>
<value>10000</value>
<description>Timeout for Hadoop's ZooKeeper session: 10 s</description>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
<description>Number of replicas kept for each block</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/opt/soft/hadoop313/data/dfs/name</value>
<description>Directory on the NameNode that stores the HDFS namespace metadata</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/opt/soft/hadoop313/data/dfs/data</value>
<description>Directory on each DataNode where block data is physically stored</description>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop151:9869</value>
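<description>HTTP address of the SecondaryNameNode (not used once HA is enabled, because the standby NameNode performs the checkpoints)</description>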
</property>
<property>
<name>dfs.nameservices</name>
<value>gky</value>
<description>Nameservice ID of this HDFS cluster; must match the name used in fs.defaultFS in core-site.xml</description>
</property>
<property>
<name>dfs.ha.namenodes.gky</name>
<value>nn1,nn2</value>
<description>gky is the logical name of the cluster; it maps to the two NameNode IDs nn1 and nn2</description>
</property>
<property>
<name>dfs.namenode.rpc-address.gky.nn1</name>
<value>hadoop151:9000</value>
<description>RPC address of NameNode nn1</description>
</property>
<property>
<name>dfs.namenode.http-address.gky.nn1</name>
<value>hadoop151:9870</value>
<description>HTTP address of NameNode nn1</description>
</property>
<property>
<name>dfs.namenode.rpc-address.gky.nn2</name>
<value>hadoop152:9000</value>
<description>RPC address of NameNode nn2</description>
</property>
<property>
<name>dfs.namenode.http-address.gky.nn2</name>
<value>hadoop152:9870</value>
<description>HTTP address of NameNode nn2</description>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop151:8485;hadoop152:8485;hadoop153:8485/gky</value>
<description>Shared storage location for the NameNode edit log (the JournalNode list)</description>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/opt/soft/hadoop313/data/journaldata</value>
<description>Local directory where each JournalNode stores its data</description>
</property>
<!-- Failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
<description>Enable automatic NameNode failover</description>
</property>
<property>
<name>dfs.client.failover.proxy.provider.gky</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
<description>Implementation that clients use to fail over to the active NameNode</description>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
<description>Fencing method, used to prevent split-brain</description>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
<description>sshfence requires passwordless SSH login; this is the private key it uses</description>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
<description>Disable HDFS permission checking</description>
</property>
<property>
<name>dfs.image.transfer.bandwidthPerSec</name>
<value>1048576</value>
<description>Bandwidth limit for fsimage transfers, in bytes per second (1 MB/s here)</description>
</property>
<property>
<name>dfs.block.scanner.volume.bytes.per.second</name>
<value>1048576</value>
<description>Maximum number of bytes per second the DataNode block scanner reads from each volume (1 MB/s here)</description>
</property>
</configuration>
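Both files insist that the nameservice name in fs.defaultFS and the value of dfs.nameservices agree, so it is worth checking before the first start. The following is a minimal sketch of such a check; the function name nameservice_matches is made up for this post and is not part of Hadoop. It only compares two strings, so it can be tried without a cluster.

```shell
# nameservice_matches DEFAULT_FS NAMESERVICES
# Succeeds when the authority part of DEFAULT_FS (for example hdfs://gky)
# appears in the comma-separated NAMESERVICES list (for example gky).
# Hypothetical helper for illustration, not a Hadoop command.
nameservice_matches() {
  ns_authority="${1#hdfs://}"          # strip the scheme
  ns_authority="${ns_authority%%/*}"   # strip any trailing path
  case ",$2," in
    *",$ns_authority,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# On a configured node the two values can be read with hdfs getconf:
#   nameservice_matches "$(hdfs getconf -confKey fs.defaultFS)" \
#                       "$(hdfs getconf -confKey dfs.nameservices)" \
#     && echo "nameservice is consistent"
```

With the values used in this post, the commented call compares hdfs://gky against gky and should succeed.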
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>Job execution framework: local, classic, or yarn</description>
<final>true</final>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>/opt/soft/hadoop313/etc/hadoop:/opt/soft/hadoop313/share/hadoop/common/lib/*:/opt/soft/hadoop313/share/hadoop/common/*:/opt/soft/hadoop313/share/hadoop/hdfs/*:/opt/soft/hadoop313/share/hadoop/hdfs/lib/*:/opt/soft/hadoop313/share/hadoop/mapreduce/*:/opt/soft/hadoop313/share/hadoop/mapreduce/lib/*:/opt/soft/hadoop313/share/hadoop/yarn/*:/opt/soft/hadoop313/share/hadoop/yarn/lib/*</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop151:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop151:19888</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>1024</value>
<description>Memory, in MB, for each map task</description>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>2048</value>
<description>Memory, in MB, for each reduce task</description>
</property>
</configuration>
yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
<description>Enable ResourceManager high availability</description>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yrcabc</value>
<description>ID of the YARN cluster</description>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
<description>Logical IDs of the ResourceManagers</description>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>hadoop153</value>
<description>Hostname of rm1</description>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>hadoop154</value>
<description>Hostname of rm2</description>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>hadoop153:8088</value>
<description></description>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>hadoop154:8088</value>
<description></description>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>hadoop151:2181,hadoop152:2181,hadoop153:2181</value>
<description>Address of the ZooKeeper cluster</description>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<description>Auxiliary service required to run MapReduce jobs</description>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/opt/soft/hadoop313/tmpdata/yarn/local</value>
<description>Local storage directory of the NodeManager</description>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/opt/soft/hadoop313/tmpdata/yarn/log</value>
<description>Local log directory of the NodeManager</description>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
<description>Physical memory, in MB, that this NodeManager can allocate to containers</description>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>2</value>
<description>Number of vcores that this NodeManager can allocate to containers</description>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>256</value>
<description>Minimum memory, in MB, that the scheduler allocates per container request</description>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
<description>Whether to enable YARN log aggregation</description>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>86400</value>
<description>How long aggregated logs are kept, in seconds (86400 = 1 day)</description>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
<description>Disable the virtual-memory limit check for containers</description>
</property>
<property>
<name>yarn.application.classpath</name>
<value>/opt/soft/hadoop313/etc/hadoop:/opt/soft/hadoop313/share/hadoop/common/lib/*:/opt/soft/hadoop313/share/hadoop/common/*:/opt/soft/hadoop313/share/hadoop/hdfs/*:/opt/soft/hadoop313/share/hadoop/hdfs/lib/*:/opt/soft/hadoop313/share/hadoop/mapreduce/*:/opt/soft/hadoop313/share/hadoop/mapreduce/lib/*:/opt/soft/hadoop313/share/hadoop/yarn/*:/opt/soft/hadoop313/share/hadoop/yarn/lib/*</value>
<description></description>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
<description></description>
</property>
</configuration>
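The memory settings in mapred-site.xml and yarn-site.xml interact: each NodeManager offers 2048 MB, map tasks ask for 1024 MB, and reduce tasks ask for 2048 MB. As a rough back-of-the-envelope sketch, and assuming the scheduler rounds each request up to a multiple of yarn.scheduler.minimum-allocation-mb, the number of task containers that fit on one node can be estimated as below. The helper names round_up and containers_per_node are made up for this post, and the estimate ignores the ApplicationMaster container and vcore limits.

```shell
# round_up REQUEST STEP: round REQUEST up to the next multiple of STEP.
round_up() {
  echo $(( ( ($1 + $2 - 1) / $2 ) * $2 ))
}

# containers_per_node NM_MB REQUEST_MB MIN_ALLOC_MB
# Estimates how many containers of REQUEST_MB fit into NM_MB of NodeManager
# memory (memory only; hypothetical helper, not a YARN command).
containers_per_node() {
  echo $(( $1 / $(round_up "$2" "$3") ))
}

containers_per_node 2048 1024 256   # map tasks: prints 2
containers_per_node 2048 2048 256   # reduce tasks: prints 1
```

So with these values a single reduce task takes up a whole node's worth of container memory, which is fine for a test cluster but worth keeping in mind if jobs appear to wait for resources.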
/etc/profile
#HADOOP_HOME
export HADOOP_HOME=/opt/soft/hadoop313
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/lib
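Copying
Copy the configured files to the other machines. The commands below assume the current directory is /opt/soft. Skip the lines that target the machine you are working on, and afterwards run source /etc/profile on every machine so that the new variables take effect.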
scp -r ./hadoop313/ root@hadoop151:/opt/soft
scp -r ./hadoop313/ root@hadoop152:/opt/soft
scp -r ./hadoop313/ root@hadoop153:/opt/soft
scp -r ./hadoop313/ root@hadoop154:/opt/soft
scp /etc/profile root@hadoop151:/etc
scp /etc/profile root@hadoop152:/etc
scp /etc/profile root@hadoop153:/etc
scp /etc/profile root@hadoop154:/etc
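The same copy can be written as a loop. This is only a sketch under a few assumptions: it is run from hadoop151, so the target list contains just the other three hosts, and the helper names run and distribute are made up for this post. By default it only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
# Hosts to copy to (assumption: the script runs on hadoop151).
HOSTS="hadoop152 hadoop153 hadoop154"
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# distribute: copy the Hadoop directory and /etc/profile to every host.
distribute() {
  for host in $HOSTS; do
    run scp -r /opt/soft/hadoop313 "root@$host:/opt/soft"
    run scp /etc/profile "root@$host:/etc"
  done
}

distribute
```

Printing first makes it easy to check the host list before anything is overwritten on the remote machines.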
First Cluster Startup
1. Start the ZooKeeper cluster first (using the automation script)
2. Start the JournalNode on hadoop151, hadoop152, and hadoop153
hdfs --daemon start journalnode
A script can be used to check that the JournalNode has started on all three machines.
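One way such a check could look is sketched below. The function names has_daemon and check_journalnodes are invented for this post, and the sketch assumes passwordless SSH as root and that jps is on the PATH of non-interactive SSH sessions (if it is not, call it by its full path under the JDK's bin directory).

```shell
# has_daemon JPS_OUTPUT NAME
# Succeeds if the given jps output lists a JVM whose name is exactly NAME.
has_daemon() {
  printf '%s\n' "$1" | awk -v name="$2" '$2 == name { found = 1 } END { exit !found }'
}

# check_journalnodes: report the JournalNode status of the three hosts.
# Needs SSH access to the cluster, so it is only defined here, not called.
check_journalnodes() {
  for host in hadoop151 hadoop152 hadoop153; do
    if has_daemon "$(ssh "root@$host" jps)" JournalNode; then
      echo "$host: JournalNode running"
    else
      echo "$host: JournalNode NOT running"
    fi
  done
}
```

Calling check_journalnodes after step 2 should report all three hosts as running. The same has_daemon helper works for NameNode, DFSZKFailoverController, or any other daemon name that jps prints.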
3. Format the NameNode on hadoop151
hdfs namenode -format
4. Start the NameNode service on hadoop151
hdfs --daemon start namenode
5. Sync the NameNode metadata to hadoop152 (run this on hadoop152)
hdfs namenode -bootstrapStandby
6. Start the NameNode service on hadoop152
hdfs --daemon start namenode
Running jps before and after the start should show that a NameNode process has appeared.
Check the state of the NameNode:
hdfs haadmin -getServiceState nn2
7. Stop all DFS-related services
stop-dfs.sh
8. Format the HA state in ZooKeeper
hdfs zkfc -formatZK
After formatting, you can open the ZooKeeper client and confirm that the /hadoop-ha znode has been created:
zkCli.sh
9. Start DFS
start-dfs.sh
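Check the state of the NameNodes.
If hadoop151 goes down, hadoop152 becomes active. When hadoop151 comes back online, it does not take over again; it rejoins as standby.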
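To make that check repeatable, the states of both NameNodes can be collected and tested in one go. The sketch below uses two helper names invented for this post, namenode_states and ha_is_healthy; only hdfs haadmin -getServiceState comes from Hadoop. It treats the pair as healthy when exactly one NameNode is active and the other is standby.

```shell
# ha_is_healthy: reads "<id> <state>" lines on stdin and succeeds only when
# exactly one entry is active and at least one is standby.
ha_is_healthy() {
  awk '$2 == "active" { a++ } $2 == "standby" { s++ } END { exit !(a == 1 && s >= 1) }'
}

# namenode_states: print "<id> <state>" for nn1 and nn2.
# Needs a running cluster, so it is only defined here, not called.
namenode_states() {
  for nn in nn1 nn2; do
    echo "$nn $(hdfs haadmin -getServiceState "$nn" 2>/dev/null)"
  done
}
```

On the cluster, namenode_states | ha_is_healthy && echo "HA OK" gives a quick yes or no. While one NameNode is stopped during a failover test, its state comes back empty, so the check fails until that NameNode has rejoined as standby.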
10. Start YARN
start-yarn.sh
Check the state of the ResourceManager:
yarn rmadmin -getServiceState rm1
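Here hadoop153 (rm1) is active.
Opening either hadoop153:8088 or hadoop154:8088 in a browser lands on hadoop153:8088, because the standby ResourceManager redirects to the active one.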
Installation Complete
Upload a file and run the wordcount example. If the job succeeds, the installation works.
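A possible way to run that test is sketched below. The file name words.txt and the HDFS paths /input and /output are placeholders chosen for this post, and the examples jar is located with a wildcard because its exact version suffix depends on the installation. The helper local_wordcount, also invented here, counts whitespace-separated words locally so that the job output can be compared against it; it assumes the job runs with a single reducer.

```shell
# local_wordcount FILE: print "<word><TAB><count>" for every
# whitespace-separated word in FILE, sorted by word in byte order.
local_wordcount() {
  tr -s ' \t\r\f' '\n' < "$1" | grep -v '^$' | LC_ALL=C sort | uniq -c |
    awk '{ print $2 "\t" $1 }'
}

# On the cluster (placeholders: words.txt, /input, /output):
#   hdfs dfs -mkdir -p /input
#   hdfs dfs -put words.txt /input
#   hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
#       wordcount /input /output
#   hdfs dfs -cat /output/part-r-00000 > hadoop_counts.txt
#   local_wordcount words.txt | diff - hadoop_counts.txt && echo "wordcount OK"
```

The /output directory must not exist before the job starts, otherwise the job refuses to run.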
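From now on, Hadoop can be started with start-all.sh and stopped with stop-all.sh, and ZooKeeper can be started and stopped with the one-step script. Always start ZooKeeper before Hadoop.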