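1. Hadoop is a distributed-systems infrastructure framework. It mainly addresses two problems: storing massive amounts of data, and analyzing and computing over that data.
HDFS provides storage, YARN provides resource management, and MapReduce provides computation.
2. Installation
Step 1: Adjust the virtual machine's memory; 4 GB is enough.
Step 2: Download the installation package.
URL of the Hadoop installation package: https://mirrors.aliyun.com/apache/hadoop/common/hadoop-3.4.0/hadoop-3.4.0.tar.gz
Command: wget https://mirrors.aliyun.com/apache/hadoop/common/hadoop-3.4.0/hadoop-3.4.0.tar.gz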
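Before extracting a large download, it can be worth checking its checksum. The following is a minimal sketch and not part of the original steps: verify_sha512 is a hypothetical helper name, and the expected digest is assumed to come from the .sha512 file published alongside the release; where you obtain it is left open here.

```shell
#!/bin/sh
# verify_sha512 FILE EXPECTED_HEX
# Hypothetical helper: succeeds only if FILE's SHA-512 digest equals EXPECTED_HEX.
verify_sha512() {
    # sha512sum prints "<hex digest>  <file name>"; keep only the digest.
    actual=$(sha512sum "$1" | awk '{ print $1 }')
    # Accept upper-case hex in the expected value by lower-casing it first.
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    [ "$actual" = "$expected" ]
}

# Example call (replace the placeholder with the published digest):
# verify_sha512 hadoop-3.4.0.tar.gz "<published sha512>" && echo "checksum OK"
```

The function only compares digests; it does not download anything, so it can be run again at any time against the file already on disk.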
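Step 3: Extract the archive. Command: tar -zxvf hadoop-3.4.0.tar.gz -C ./
(The later steps assume that the archive ends up in /export/server/hadoop-3.4.0.)
Unfortunately, the extraction failed because the disk had run out of space, so the next task was to fix that.
Checking disk usage showed that the disk was completely full, so the virtual disk had to be enlarged in VMware.
After deleting the snapshots, I changed the disk size to 50 GB.
Extracting again still failed, and df -h showed no change in capacity, so the cause was most likely the disk partitioning.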
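The df check used above can also be scripted, so that a full disk is noticed before tar fails halfway. This is only a sketch: avail_kb and has_space are hypothetical helper names, and the amount of free space to require is an assumption you would choose yourself.

```shell
#!/bin/sh
# avail_kb PATH
# Print the free space, in KiB, of the file system that holds PATH.
# df -P forces the one-line-per-file-system POSIX format; -k reports KiB.
avail_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# has_space PATH REQUIRED_KB
# Succeed only if at least REQUIRED_KB KiB are free on PATH's file system.
has_space() {
    [ "$(avail_kb "$1")" -ge "$2" ]
}

# Example call (the required amount is an illustrative figure):
# has_space /export/server 2097152 || echo "not enough free space"
```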
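(1) Check which file system is mounted at /; here it is /dev/mapper/centos-root.
(2) Use the mount command to check the file system type of that mount point, that is, the type of /dev/mapper/centos-root.
(3) This shows that the file system type is xfs.
(4) Command: fdisk -l
(5) Command: fdisk /dev/sda
Follow the prompts in order.
(6) Run fdisk -l again; a new partition now appears. It still has to be prepared and added to the root volume, otherwise its space cannot be used.
(7) First reboot the virtual machine's operating system. Command: reboot
(8) First try the lvs command, then create the physical volume. Command: pvcreate /dev/sda3
(9) Add the physical volume to the volume group. Command: vgextend centos /dev/sda3 (centos is the volume group name)
(10) Check how much space is available for extension. Command: vgdisplay
Look for the Free PE value; it is the amount of space that can still be allocated. (I wrote this article after finishing the expansion, so mine shows only 4 MB.)
(11) Extend the logical volume. Command: lvextend -L+16G /dev/mapper/centos-root /dev/sda3
(12) Make the extension take effect. Command: xfs_growfs /dev/mapper/centos-root (the argument is the file system)
(13) Run df -h again; the capacity has been extended successfully!
Extraction now works without problems.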
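The LVM part of the steps above, (8), (9), (11) and (12), can be summarized as a dry-run sketch. It only prints the commands and does not execute them, because they modify the disk layout. plan_lvm_extend is a hypothetical helper name; the device, volume group, and size in the example call are the ones from this walkthrough and would differ on another machine.

```shell
#!/bin/sh
# plan_lvm_extend PARTITION VOLUME_GROUP LOGICAL_VOLUME SIZE
# Dry run: print the commands that grow an XFS root volume, without running them.
plan_lvm_extend() {
    # Turn the new partition into an LVM physical volume.
    printf 'pvcreate %s\n' "$1"
    # Add that physical volume to the existing volume group.
    printf 'vgextend %s %s\n' "$2" "$1"
    # Grow the logical volume by SIZE, taking the space from the new partition.
    printf 'lvextend -L+%s %s %s\n' "$4" "$3" "$1"
    # Grow the XFS file system to fill the enlarged logical volume.
    printf 'xfs_growfs %s\n' "$3"
}

# Prints the four commands used in this walkthrough, one per line.
plan_lvm_extend /dev/sda3 centos /dev/mapper/centos-root 16G
```

Reviewing the printed commands before typing them by hand is a cheap safeguard, since a wrong device name in pvcreate is hard to undo.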
Step 4: Create a symbolic link.
Command: ln -s /export/server/hadoop-3.4.0 /export/server/hadoop
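One benefit of the version-independent link is that it can later be pointed at a different Hadoop directory without touching any configuration that refers to /export/server/hadoop. A minimal sketch of that idea follows; point_link is a hypothetical helper name, and it uses the -f and -n options of ln in addition to the -s used above.

```shell
#!/bin/sh
# point_link TARGET LINK
# Create LINK as a symbolic link to TARGET, replacing LINK if it already exists.
point_link() {
    # -s: symbolic link
    # -f: remove an existing LINK first
    # -n: treat an existing LINK to a directory as a file, so the new link
    #     replaces it instead of being created inside the old target
    ln -sfn "$1" "$2"
}

# Example call with the paths from this walkthrough:
# point_link /export/server/hadoop-3.4.0 /export/server/hadoop
```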
Step 5: Edit the configuration file hadoop-env.sh. Command: vi /export/server/hadoop-3.4.0/etc/hadoop/hadoop-env.sh
# Add the following at the top of the file:
# Java installation path
export JAVA_HOME=/export/server/jdk
# Hadoop installation path
export HADOOP_HOME=/export/server/hadoop
# Hadoop HDFS configuration directory
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
# Hadoop YARN configuration directory
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
# Hadoop YARN log directory
export YARN_LOG_DIR=$HADOOP_HOME/logs/yarn
# Hadoop HDFS log directory
export HADOOP_LOG_DIR=$HADOOP_HOME/logs/hdfs
# Users that start the Hadoop daemons
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
export YARN_PROXYSERVER_USER=root
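A mistyped variable name in hadoop-env.sh does not produce an error; the shell simply exports a variable that Hadoop never reads. A quick check for the expected names can catch that early. The sketch below is an illustration only: missing_exports is a hypothetical helper name, and it assumes each variable is set on its own line in the form export NAME=value.

```shell
#!/bin/sh
# missing_exports FILE NAME...
# Print every NAME that FILE does not set with a line of the form "export NAME=...".
missing_exports() {
    file=$1
    shift
    for name in "$@"; do
        # Allow leading whitespace and any amount of space after "export".
        grep -Eq "^[[:space:]]*export[[:space:]]+$name=" "$file" ||
            printf '%s\n' "$name"
    done
}

# Example call with some of the names from this step:
# missing_exports /export/server/hadoop/etc/hadoop/hadoop-env.sh \
#     JAVA_HOME HADOOP_HOME HADOOP_CONF_DIR HDFS_NAMENODE_USER
```

No output means every listed name was found; any name that is printed is either missing or misspelled in the file.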
Step 6: Edit core-site.xml. Command: vi core-site.xml
Delete everything in the file and add the following:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
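<configuration>
<!-- Which file system to use -->
<property>
<name>fs.defaultFS</name>
<!-- Use the HDFS distributed file system -->
<!-- HDFS address: hdfs://<host name of the HDFS master node>:9000 (9000 is the port chosen here) -->
<!-- This is a pseudo-distributed setup with all nodes on one machine, so the node name is this machine's host name -->
<value>hdfs://wtk:9000</value>
</property>
<!-- Working directory of the Hadoop processes, where files generated at runtime are stored -->
<property>
<name>hadoop.tmp.dir</name>
<!-- The data is stored in tmp/ under the Hadoop installation directory -->
<value>/export/server/hadoop-3.4.0/tmp/</value>
</property>
</configuration>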
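After editing a site file by hand, it can be reassuring to read a value back out and confirm it is what you meant to type. The following is a rough sketch, not a real XML parser: get_prop is a hypothetical helper name, and it assumes the layout used in these notes, where each value element sits on a single line after its name element.

```shell
#!/bin/sh
# get_prop FILE KEY
# Print the <value> that follows <name>KEY</name> in a Hadoop site file.
get_prop() {
    awk -v key="$2" '
        # Remember that the wanted <name> element has been seen.
        index($0, "<name>" key "</name>") { found = 1 }
        # The first <value> element after it holds the setting.
        found && match($0, /<value>[^<]*<\/value>/) {
            # Skip the 7 characters of "<value>" and drop the 8 of "</value>".
            print substr($0, RSTART + 7, RLENGTH - 15)
            exit
        }
    ' "$1"
}

# Example call with the file and key from this step:
# get_prop /export/server/hadoop/etc/hadoop/core-site.xml fs.defaultFS
```

Comment lines between the name and the value, as in the file above, are skipped because they contain no value element.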
Step 7: Edit hdfs-site.xml. Command: vi hdfs-site.xml
Clear the file and add the following:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- dfs.namenode.name.dir: path on the local filesystem where the NameNode stores the namespace and transaction logs persistently. The value for this property did not survive in these notes, so the property is left out here. -->
<property>
<name>dfs.namenode.hosts</name>
<value>wtk,wtk1,wtk2</value>
<description>List of permitted DataNodes</description>
</property>
<!-- Number of HDFS replicas -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
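The replication factor and the list of DataNode hosts are related: HDFS cannot place more replicas of a block than there are DataNodes. The sketch below checks that relationship for values like the ones in this file. count_hosts and replication_ok are hypothetical helper names, and the check assumes that every listed host actually runs a DataNode.

```shell
#!/bin/sh
# count_hosts "HOST1,HOST2,..."
# Print how many non-empty entries a comma-separated host list contains.
count_hosts() {
    printf '%s\n' "$1" | tr ',' '\n' | grep -c .
}

# replication_ok REPLICATION "HOST1,HOST2,..."
# Succeed only if the replication factor does not exceed the number of hosts.
replication_ok() {
    [ "$1" -le "$(count_hosts "$2")" ]
}

# Example call with the values from this step:
# replication_ok 1 "wtk,wtk1,wtk2" && echo "replication factor fits"
```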
Step 8: Edit the configuration file mapred-site.xml
Clear the file and add the following:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
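<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->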
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- Tell the MapReduce framework to use YARN -->
<!-- so that MapReduce jobs run on the resource-scheduling cluster (YARN) -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Step 9: Edit the yarn-site.xml configuration
<?xml version="1.0"?>
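<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->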
<configuration>
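<!-- Host of the YARN cluster's master node; this is a pseudo-distributed setup, so it is the local machine and the value must be a host name that resolves to this machine -->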
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop01</value>
</property>
<!-- Reducers fetch their data through mapreduce_shuffle -->
<!-- NodeManager (worker node) setting -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
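In a single-machine setup, the host in fs.defaultFS and the value of yarn.resourcemanager.hostname both have to refer to the same machine. In these notes core-site.xml uses wtk while yarn-site.xml uses hadoop01, so both names would need to resolve to this host. A small sketch for pulling the host out of the HDFS address so the two can be compared; uri_host is a hypothetical helper name.

```shell
#!/bin/sh
# uri_host URI
# Print the host part of a URI such as hdfs://wtk:9000.
uri_host() {
    rest=${1#*://}      # drop the scheme, e.g. "hdfs://"
    rest=${rest%%/*}    # drop any path after the authority
    printf '%s\n' "${rest%%:*}"   # drop the port
}

# Example comparison with the two values from these notes:
# [ "$(uri_host hdfs://wtk:9000)" = "hadoop01" ] || echo "host names differ"
```

A difference between the two names is not necessarily an error, but it is a hint to check that each of them resolves to this machine, for example in /etc/hosts.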
To be continued tomorrow.