Set Up a Local Big Data Cluster in Five Minutes


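Introduction

Newcomers to big data, and even some people who have worked with it for years, may never have set up a cluster of their own. This post walks through how to quickly build a local big data cluster that you can actually operate and debug.

Walkthrough

This setup uses the following services and versions: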

hadoop (3.2.4)
zookeeper (3.9.1)
kafka (2.13-3.6.1, i.e. Kafka 3.6.1 built for Scala 2.13)

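Component downloads

All of the components above are Apache projects. Hadoop releases are archived at https://archive.apache.org/dist/hadoop/common/ and the other components sit under the same https://archive.apache.org/dist/ tree; pick the matching version and download it. If the download is slow, the mirror at https://mirrors.tuna.tsinghua.edu.cn/apache/ may be faster. The mirror is intended for study only; do not use it commercially.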
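As a sketch of where the three archives live, the snippet below spells out one download URL per component. The file names and directory layout are written from the usual layout of archive.apache.org and are assumptions, not something this post lists; confirm them in a browser before scripting a download.

```shell
# Assumed layout under the Apache archive; verify each URL before relying on it.
ARCHIVE="https://archive.apache.org/dist"

HADOOP_URL="$ARCHIVE/hadoop/common/hadoop-3.2.4/hadoop-3.2.4.tar.gz"
ZOOKEEPER_URL="$ARCHIVE/zookeeper/zookeeper-3.9.1/apache-zookeeper-3.9.1-bin.tar.gz"
KAFKA_URL="$ARCHIVE/kafka/3.6.1/kafka_2.13-3.6.1.tgz"

# Print the three URLs, one per line (list_urls is a name made up for this sketch).
list_urls() {
  printf '%s\n' "$HADOOP_URL" "$ZOOKEEPER_URL" "$KAFKA_URL"
}

list_urls
```

Once the URLs look right, something like `list_urls | xargs -n 1 curl -fLO` would fetch them into the current directory.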

hadoop

Hadoop is the foundation of the big data stack. After unpacking the archive, edit the four key configuration files under ./etc/hadoop.

core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->
<configuration>
    <!-- Address the NameNode listens on (fs.defaultFS replaces the deprecated fs.default.name) -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <!-- Directory for the files Hadoop generates at runtime -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/Users/lin/dev/bigdata/hadoop-3.2.4/temp</value>
    </property>
</configuration>

hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->
<configuration>
        <property>
                <name>dfs.permissions.enabled</name>
                <value>false</value>
        </property>
        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>        
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>/Users/lin/dev/bigdata/hadoop-3.2.4/data/namenode</value>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>/Users/lin/dev/bigdata/hadoop-3.2.4/data/datanode</value>
        </property>
        <property>
                <name>dfs.namenode.secondary.http-address</name>
                <value>localhost:9001</value>
        </property>
        <property>
                <name>dfs.webhdfs.enabled</name>
                <value>true</value>
        </property>
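        <!-- NameNode web UI; dfs.namenode.http-address replaces the deprecated dfs.http.address.
             Hadoop 3 defaults to port 9870, so this setting keeps the UI on the older 50070. -->
        <property>
                <name>dfs.namenode.http-address</name>
                <value>0.0.0.0:50070</value>
        </property>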
</configuration>

yarn-site.xml

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
</configuration>

mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->
<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
                <final>true</final>
                <description>The runtime framework for executing MapReduce jobs</description>
        </property>
        <property>
                <name>yarn.app.mapreduce.am.env</name>
                <value>HADOOP_MAPRED_HOME=/Users/lin/dev/bigdata/hadoop-3.2.4</value>
        </property>
        <property>
                <name>mapreduce.map.env</name>
                <value>HADOOP_MAPRED_HOME=/Users/lin/dev/bigdata/hadoop-3.2.4</value>
        </property>
        <property>
                <name>mapreduce.reduce.env</name>
                <value>HADOOP_MAPRED_HOME=/Users/lin/dev/bigdata/hadoop-3.2.4</value>
        </property>
</configuration>

With the four files edited, start the cluster with ./sbin/start-all.sh. Opening http://localhost:50070 then shows that the HDFS service is up.
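On a fresh install, the NameNode directory usually has to be formatted once before the first start, and start-all.sh logs in to localhost over ssh, so passwordless ssh to localhost is assumed to be set up already. A minimal sketch, assuming HADOOP_HOME points at the unpacked directory (the default reuses the path from the configs above); the function names are made up for this example:

```shell
# Assumption: HADOOP_HOME is wherever hadoop-3.2.4 was unpacked.
HADOOP_HOME="${HADOOP_HOME:-/Users/lin/dev/bigdata/hadoop-3.2.4}"

# One-time step: initialise the NameNode metadata directory.
# -nonInteractive makes the command abort instead of prompting
# when the directory already holds data.
format_once() {
  "$HADOOP_HOME/bin/hdfs" namenode -format -nonInteractive
}

# Start the HDFS and YARN daemons, then list the running JVMs.
# On a healthy single node you would typically look for NameNode,
# DataNode, SecondaryNameNode, ResourceManager and NodeManager.
start_cluster() {
  "$HADOOP_HOME/sbin/start-all.sh" || return 1
  jps
}
```

Run `format_once` a single time on a new install, then `start_cluster` on every start. Formatting again later would discard the existing HDFS namespace, which is why it is kept in its own function.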

Next, a quick check that the service works:

List the file directory

Create a custom directory named data

Upload a local file to the Hadoop cluster
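The three checks above map onto the standard `hdfs dfs` subcommands. The sketch below is one way to run them in sequence; the function name, the `/data` target and the default sample file `/etc/hosts` are choices made for this example, and HADOOP_HOME is assumed to point at the unpacked directory.

```shell
# Assumption: HADOOP_HOME is wherever hadoop-3.2.4 was unpacked.
HADOOP_HOME="${HADOOP_HOME:-/Users/lin/dev/bigdata/hadoop-3.2.4}"

# Usage: hdfs_smoke_test [local-file]
hdfs_smoke_test() {
  hdfs="$HADOOP_HOME/bin/hdfs"
  sample="${1:-/etc/hosts}"                 # any small local file will do
  "$hdfs" dfs -ls / &&                      # 1. list the root directory
  "$hdfs" dfs -mkdir -p /data &&            # 2. create the custom directory
  "$hdfs" dfs -put -f "$sample" /data/ &&   # 3. upload (overwrite if present)
  "$hdfs" dfs -ls /data                     # confirm the file arrived
}
```

Calling `hdfs_smoke_test ./some-local-file.txt` stops at the first failing step, so the last listing only appears when all three operations succeeded.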

With the steps above, the local Hadoop service is fully deployed.

zookeeper

After unpacking, start the service with bin/zkServer.sh start. Querying it with a command afterwards shows that the service is running.
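One detail that can trip up a first start: zkServer.sh reads conf/zoo.cfg, while the release archive ships only conf/zoo_sample.cfg. A minimal sketch of start plus status check, assuming ZK_HOME points at the unpacked directory (the default path and the function name below are hypothetical):

```shell
# Assumption: ZK_HOME is wherever the ZooKeeper archive was unpacked.
ZK_HOME="${ZK_HOME:-$HOME/dev/bigdata/apache-zookeeper-3.9.1-bin}"

start_zookeeper() {
  # Create zoo.cfg from the bundled sample if it does not exist yet.
  [ -f "$ZK_HOME/conf/zoo.cfg" ] ||
    cp "$ZK_HOME/conf/zoo_sample.cfg" "$ZK_HOME/conf/zoo.cfg" ||
    return 1
  "$ZK_HOME/bin/zkServer.sh" start &&
  # A single local node should report "Mode: standalone" once it is up.
  "$ZK_HOME/bin/zkServer.sh" status
}
```

If the status call runs before the server has finished starting, it may report that the service is not running yet; repeating it a moment later is usually enough.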

Next, a quick verification: first open the client with bin/zkCli.sh.
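Inside the client, a few basic commands are enough to confirm that reads and writes work. The sketch below passes each command to zkCli.sh on the command line instead of typing it at the prompt; it assumes the default client port 2181 from the sample config, and the znode name /demo and the function name are made up for this example.

```shell
# Assumption: ZK_HOME is wherever the ZooKeeper archive was unpacked.
ZK_HOME="${ZK_HOME:-$HOME/dev/bigdata/apache-zookeeper-3.9.1-bin}"

zk_smoke_test() {
  cli="$ZK_HOME/bin/zkCli.sh"
  "$cli" -server localhost:2181 create /demo "hello" &&   # write a znode
  "$cli" -server localhost:2181 get /demo &&              # read it back
  "$cli" -server localhost:2181 delete /demo              # clean up
}
```

The same three commands (`create /demo hello`, `get /demo`, `delete /demo`) can be typed directly at the interactive zkCli.sh prompt.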

kafka

After unpacking the Kafka archive, start the broker in the background:

nohup bin/kafka-server-start.sh config/server.properties 2>&1 &

A Linux process listing then shows that the Kafka service is running.
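Two things are worth keeping in mind here. The stock config/server.properties is the ZooKeeper-mode configuration and points at localhost:2181, so ZooKeeper should already be running. The start script also accepts a -daemon flag as an alternative to nohup. A hedged sketch, assuming KAFKA_HOME points at the unpacked directory (the default path and the function name are hypothetical):

```shell
# Assumption: KAFKA_HOME is wherever the Kafka archive was unpacked.
KAFKA_HOME="${KAFKA_HOME:-$HOME/dev/bigdata/kafka_2.13-3.6.1}"

start_kafka() {
  # -daemon detaches the broker, so no nohup or trailing & is needed.
  "$KAFKA_HOME/bin/kafka-server-start.sh" -daemon \
    "$KAFKA_HOME/config/server.properties" || return 1
  # The broker's main class is kafka.Kafka; grep fails if it is not listed.
  jps -l | grep 'kafka\.Kafka'
}
```

A non-zero exit status from `start_kafka` means the broker process was not found, in which case the files under the Kafka logs directory are the first place to look.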

Next, a quick verification.

Create a topic

bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic TestKafkaTopic1


Consume from the topic

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic TestKafkaTopic1 --from-beginning

Write to the topic

bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic TestKafkaTopic1

Check the consumed messages

After these steps, the Kafka service is confirmed to be working as well.
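The produce and consume steps above can also be condensed into one non-interactive round trip, which is handy as a repeatable check. This is a sketch under assumptions: KAFKA_HOME points at the unpacked directory, the topic already exists, and the function name and default message are invented for the example.

```shell
# Assumption: KAFKA_HOME is wherever the Kafka archive was unpacked.
KAFKA_HOME="${KAFKA_HOME:-$HOME/dev/bigdata/kafka_2.13-3.6.1}"
TOPIC="TestKafkaTopic1"
BROKER="localhost:9092"

# Usage: kafka_round_trip [message]
kafka_round_trip() {
  msg="${1:-hello from the local cluster}"
  # The console producer sends each line it reads from stdin as one message.
  printf '%s\n' "$msg" |
    "$KAFKA_HOME/bin/kafka-console-producer.sh" \
      --bootstrap-server "$BROKER" --topic "$TOPIC" || return 1
  # Read from the start of the topic; stop after one message or 10 seconds.
  "$KAFKA_HOME/bin/kafka-console-consumer.sh" \
    --bootstrap-server "$BROKER" --topic "$TOPIC" \
    --from-beginning --max-messages 1 --timeout-ms 10000
}
```

Because the consumer reads from the beginning, it prints the oldest message in the topic, which is only the one just sent when the topic was empty beforehand.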

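Summary

That is the whole process for setting up a simple local debugging environment. It is worth doing every step by hand at least once to build a working understanding of these basic services.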
