Installing and Configuring Hadoop and Spark on Windows 10

2024/10/6 22:22:57

The following applies to Windows 10 only.

1. Environment setup

(1) Install Java and configure its environment variables

https://www.oracle.com/java/technologies/downloads/#java8-windows

(2) Install Scala

https://www.scala-lang.org/ or https://github.com/lampepfl/dotty/releases/tag/3.2.2

Configure the environment variable: add D:\app\Scala\scala3-3.2.2\bin to Path in the system environment variables.

When that is done, open CMD and run scala to test the installation.
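
As an optional cross-check of steps (1) and (2), the short Python sketch below looks up the java and scala commands on Path. The helper name find_tools is hypothetical and not part of any tool used in this guide; it relies only on the standard library's shutil.which.

```python
import shutil


def find_tools(names):
    """Map each command name to its resolved location on Path, or None if absent."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Open a NEW terminal after editing environment variables, then run this.
    for name, location in find_tools(["java", "scala"]).items():
        print(f"{name}: {location or 'NOT FOUND on Path'}")
```

If either command is reported as not found, the corresponding bin directory is probably missing from Path, or the terminal was opened before the variable was changed.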

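(3) Install Spark

Go to the Spark download page.

Click through to the next page and download the archive spark-3.4.0-bin-hadoop3.tgz.

Extract the archive to a folder of your choice.

Next, configure the environment variables. Add a new system variable whose value is the folder you extracted Spark into. The variable must be named SPARK_HOME; otherwise programs run later will fail because they cannot find it.

Then add the Spark entry to Path in the environment variables (typically %SPARK_HOME%\bin).

Run spark-shell in CMD to check that the installation succeeded.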
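
If spark-shell does not start, a quick sanity check of SPARK_HOME can narrow the problem down. The sketch below is only an illustration: check_spark_home is a hypothetical helper, and it assumes the standard Spark binary layout, in which bin\spark-shell.cmd sits directly under the extracted folder.

```python
import os
from pathlib import Path


def check_spark_home(env=os.environ):
    """Return a list of problems found with SPARK_HOME (an empty list means OK)."""
    home = env.get("SPARK_HOME")
    if not home:
        return ["SPARK_HOME is not set"]
    # Assumption: the Windows launcher lives at <SPARK_HOME>\bin\spark-shell.cmd.
    launcher = Path(home) / "bin" / "spark-shell.cmd"
    if not launcher.is_file():
        return [f"{launcher} not found; SPARK_HOME may point at the wrong folder"]
    return []


if __name__ == "__main__":
    problems = check_spark_home()
    print("SPARK_HOME looks fine" if not problems else "\n".join(problems))
```

A common cause of the second message is a doubled folder level after extraction, so that SPARK_HOME points one directory too high.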

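(4) Install Hadoop

https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.2.4/hadoop-3.2.4.tar.gz

Configure the environment variables: add a new system variable. It must be named HADOOP_HOME, and its value is the directory where you installed Hadoop.

Then add the Hadoop entry to Path (typically %HADOOP_HOME%\bin).

Next, go to the cdarlint/winutils repository and download the winutils build that matches your Hadoop version. My Hadoop version is 3.2.1, so I downloaded the folder for that version.

Download the files in its bin folder, then copy all of them into the bin folder of your Hadoop installation directory. Run hadoop version in CMD to check that the installation succeeded.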
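
To confirm that the winutils files ended up in the right place, something like the following sketch may help. The helper missing_winutils_files and the REQUIRED tuple are assumptions made for illustration: winutils.exe and hadoop.dll are the two files most often needed, but the repository's bin folder contains more, and all of them should be copied as described above.

```python
import os
from pathlib import Path

# Assumption: these two files are the ones Hadoop on Windows most commonly needs.
REQUIRED = ("winutils.exe", "hadoop.dll")


def missing_winutils_files(hadoop_home, required=REQUIRED):
    """Return the names from `required` that are absent from <hadoop_home>\\bin."""
    bin_dir = Path(hadoop_home) / "bin"
    return [name for name in required if not (bin_dir / name).is_file()]


if __name__ == "__main__":
    home = os.environ.get("HADOOP_HOME")
    if home is None:
        print("HADOOP_HOME is not set")
    else:
        print("missing files:", missing_winutils_files(home) or "none")
```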

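(5) Install Python

If Anaconda manages the Python version on your system, add the system environment variable PYSPARK_PYTHON pointing at its interpreter (assuming Anaconda is installed in D:\tools\Anaconda\, just append python.exe to that path).

Then add the corresponding entry to Path.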
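
As an alternative to the system-wide setting, PYSPARK_PYTHON can also be set from inside the script before the SparkContext is created, since PySpark reads it from the process environment. The sketch below is a minimal illustration; ensure_pyspark_python is a hypothetical helper name, and falling back to the running interpreter is a design choice made here, not something the steps above require.

```python
import os
import sys


def ensure_pyspark_python(env=os.environ):
    """Point PYSPARK_PYTHON at the running interpreter unless it is already set."""
    env.setdefault("PYSPARK_PYTHON", sys.executable)
    return env["PYSPARK_PYTHON"]


if __name__ == "__main__":
    # Call this before constructing SparkContext so the workers pick it up.
    print("PySpark workers will use:", ensure_pyspark_python())
```

Because setdefault is used, a value configured in the system environment variables takes precedence over the fallback.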

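(6) Check

Once everything above is installed and configured, Path in the system environment variables should contain the entries added in steps (1) to (5).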
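
This check can also be scripted. The following sketch compares a Path string against a list of expected directories, ignoring case, slash direction, and trailing separators. The function names are hypothetical, and the list of expected directories is something you supply yourself; only the Scala directory in the usage note comes from this guide.

```python
import ntpath


def _norm(entry):
    """Normalise a Windows path for comparison (case, slashes, trailing separator)."""
    return ntpath.normcase(ntpath.normpath(entry.strip().strip('"')))


def missing_path_entries(path_value, expected):
    """Return the directories in `expected` that do not appear in `path_value`."""
    present = {_norm(p) for p in path_value.split(";") if p.strip()}
    return [e for e in expected if _norm(e) not in present]
```

For example, missing_path_entries(os.environ["PATH"], [r"D:\app\Scala\scala3-3.2.2\bin"]) returns an empty list when the Scala directory from step (2) is on Path. Windows expands references such as %SPARK_HOME% before a process sees Path, so pass the expanded directories in the expected list.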

2. Test

The following is the test file input.txt:

word count from Wikipedia the free encyclopedia
the word count is the number of words in a document or passage of text Word counting may be needed when a text
is required to stay within certain numbers of words This may particularly be the case in academia legal
proceedings journalism and advertising Word count is commonly used by translators to determine the price for
the translation job Word counts may also be used to calculate measures of readability and to measure typing
and reading speeds usually in words per minute When converting character counts to words a measure of five or
six characters to a word is generally used Contents Details and variations of definition Software In fiction
In non fiction See also References Sources External links Details and variations of definition
This section does not cite any references or sources Please help improve this section by adding citations to
reliable sources Unsourced material may be challenged and removed
Variations in the operational definitions of how to count the words can occur namely what counts as a word and
which words don't count toward the total However especially since the advent of widespread word processing there
is a broad consensus on these operational definitions and hence the bottom line integer result
The consensus is to accept the text segmentation rules generally found in most word processing software including how
word boundaries are determined which depends on how word dividers are defined The first trait of that definition is that a space any of various whitespace
characters such as a regular word space an em space or a tab character is a word divider Usually a hyphen or a slash is too
Different word counting programs may give varying results depending on the text segmentation rule
details and on whether words outside the main text such as footnotes endnotes or hidden text) are counted But the behavior
of most major word processing applications is broadly similar However during the era when school assignments were done in
handwriting or with typewriters the rules for these definitions often differed from todays consensus
Most importantly many students were drilled on the rule that certain words don't count usually articles namely a an the but
sometimes also others such as conjunctions for example and or but and some prepositions usually to of Hyphenated permanent
compounds such as follow up noun or long term adjective were counted as one word To save the time and effort of counting
word by word often a rule of thumb for the average number of words per line was used such as 10 words per line These rules
have fallen by the wayside in the word processing era the word count feature of such software which follows the text
segmentation rules mentioned earlier is now the standard arbiter because it is largely consistent across documents and
applications and because it is fast effortless and costless already included with the application As for which sections of
a document count toward the total such as footnotes endnotes abstracts reference lists and bibliographies tables figure
captions hidden text the person in charge teacher client can define their choice and users students workers can simply
select or exclude the elements accordingly and watch the word count automatically update Software Modern web browsers
support word counting via extensions via a JavaScript bookmarklet or a script that is hosted in a website Most word
processors can also count words Unix like systems include a program wc specifically for word counting
As explained earlier different word counting programs may give varying results depending on the text segmentation rule
details The exact number of words often is not a strict requirement thus the variation is acceptable
In fiction Novelist Jane Smiley suggests that length is an important quality of the novel However novels can vary
tremendously in length Smiley lists novels as typically being between and words while National Novel Writing Month
requires its novels to be at least words There are no firm rules for example the boundary between a novella and a novel
is arbitrary and a literary work may be difficult to categorise But while the length of a novel is to a large extent up
to its writer lengths may also vary by subgenre many chapter books for children start at a length of about words and a
typical mystery novel might be in the to word range while a thriller could be over words
The Science Fiction and Fantasy Writers of America specifies word lengths for each category of its Nebula award categories
Classification	Word count Novel over words Novella to words Novelette to words Short story under words
In non fiction The acceptable length of an academic dissertation varies greatly dependent predominantly on the subject
Numerous American universities limit Ph.D. dissertations to at most words barring special permission for exceeding this limit

Run the following program with Python. Before running it, install pyspark with pip install pyspark.

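from pyspark import SparkConf  # Spark configuration
from pyspark import SparkContext  # entry point for the RDD API

# Run locally, using all available cores.
conf = SparkConf().setMaster("local[*]")
spark = SparkContext(conf=conf)

# Load the file as an RDD with one element per line.
rdd_init = spark.textFile("input.txt")
rdd_init.collect()

# flatMap maps each line to its words and flattens the results into a single RDD of words.
rdd_flatmap = rdd_init.flatMap(lambda line: line.split(" "))  # returns a PipelinedRDD
rdd_flatmap.collect()

# Pair each word with 1, then sum the counts per word.
kv = rdd_flatmap.map(lambda word: (word, 1))
wordCounts = kv.reduceByKey(lambda a, b: a + b)
# Swap to (count, word) and sort by count in descending order.
wordCounts = wordCounts.map(lambda x: (x[1], x[0])).sortByKey(False)
print(wordCounts.collect())
# Write a single part file; this fails if ./output/ already exists.
wordCounts.coalesce(1).saveAsTextFile("./output/")
spark.stop()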
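
To sanity-check the Spark result without Spark, the same counting can be sketched in plain Python. This is an independent cross-check, not part of the original program; count_words is a hypothetical helper that mirrors the split(" ") used above, so consecutive spaces produce empty-string tokens here just as they do in the Spark job.

```python
from collections import Counter


def count_words(lines):
    """Count space-separated tokens across an iterable of lines."""
    counts = Counter()
    for line in lines:
        # Spark's textFile drops line terminators, so strip them here as well.
        counts.update(line.rstrip("\n").split(" "))
    return counts


if __name__ == "__main__":
    with open("input.txt", encoding="utf-8") as f:
        # Print the ten most frequent tokens as (word, count) pairs.
        print(count_words(f).most_common(10))
```

The most frequent tokens printed here should agree with the first entries of the Spark output, although words with equal counts may appear in a different order.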

If the run succeeds, the output folder will contain the result files; part-00000 holds the counts for all words.
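
The part file can be read back into Python for further inspection. The sketch below assumes each line is the string form of a (count, word) tuple, which is what saveAsTextFile writes for the RDD built above; read_counts is a hypothetical helper name.

```python
import ast
from pathlib import Path


def read_counts(output_dir):
    """Parse every part-* file in `output_dir` into a list of (count, word) tuples."""
    pairs = []
    for part in sorted(Path(output_dir).glob("part-*")):
        for line in part.read_text(encoding="utf-8").splitlines():
            if line.strip():
                # Each line looks like "(17, 'the')"; literal_eval parses it safely.
                pairs.append(ast.literal_eval(line))
    return pairs


if __name__ == "__main__":
    for count, word in read_counts("./output")[:10]:
        print(count, repr(word))
```

ast.literal_eval is used rather than eval so that only Python literals are accepted from the file.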
