引言
刚接触大数据以及部分接触大数据多年的伙伴可能从来没有自己搭建过一套属于自己的大数据集群,今天就花点时间聊聊怎么快速搭建一套属于自己、且可用于操作、调试的大数据集群
正文
本次搭建的组件都有以下服务以及对应的版本
- hadoop(3.2.4)
- zookeeper(3.9.1)
- kafka(2.13-3.6.1)
组件下载地址
上述组件都是apache旗下的,通过此地址找到对应的版本下载使用即可 https://archive.apache.org/dist/hadoop/common/,但如果下载速度慢的话可以考虑通过这个地址进行加速下载 https://mirrors.tuna.tsinghua.edu.cn/apache/,后面这个地址仅用于学习,请勿用于商用
hadoop
hadoop是大数据最基本的底座,在将安装包解压后修改下 ./etc/hadoop 目录下重要的四个配置内容
core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- 指定 namenode 的通信地址 -->
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<!-- 指定hadoop运行时产生文件的存储路径 -->
<property>
<name>hadoop.tmp.dir</name>
<value>/Users/lin/dev/bigdata/hadoop-3.2.4/temp</value>
</property>
</configuration>
hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/Users/lin/dev/bigdata/hadoop-3.2.4/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/Users/lin/dev/bigdata/hadoop-3.2.4/data/datanode</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>localhost:9001</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.http.address</name>
<value>0.0.0.0:50070</value>
</property>
</configuration>
yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<final>true</final>
<description>The runtime framework for executing MapReduce jobs</description>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=/Users/lin/dev/bigdata/hadoop-3.2.4</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=/Users/lin/dev/bigdata/hadoop-3.2.4</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=/Users/lin/dev/bigdata/hadoop-3.2.4</value>
</property>
</configuration>
改完上面四个配置后,通过./sbin/start-all.sh
指令启动集群,通过访问地址 http://localhost:50070 可看到hdfs服务已经正常启动
接下来简单验证下服务是否正常工作
展示文件目录
创建一个自定义目录data
上传一个本地文件到hadoop集群
通过上述演示已完整的部署本地的Hadoop服务
zookeeper
解压后通过指令bin/zkServer.sh start
启动服务即可,通过指令查询可看到已经启动服务
接下来简单进行验证下,首先通过指令bin/zkCli.sh
进入客户端
kafka
解压kafka安装包后,通过指令nohup bin/kafka-server-start.sh config/server.properties 2>&1 &
进行服务的后台启动。通过linux指令可以看到kafka服务已经正常启动
接下来进行简单验证下
- 创建Topic
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic TestKafkaTopic1
- 消费Topic
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic TestKafkaTopic1 --from-beginning
- 写Topic
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic TestKafkaTopic1
- 查看消费情况
通过上述几步操作能看到我们的kafka服务也正常工作了
小结
以上就是搭建一个简单的本地调试环境的流程,最好是都能手动操作一次,对这几个基础服务都有一定的了解