24Hibench

news2024/12/23 20:46:16

1. Hibench

官网

​ HiBench is a big data benchmark suite that helps evaluate different big data frameworks in terms of speed, throughput and system resource utilizations. It contains a set of Hadoop, Spark and streaming workloads, including Sort, WordCount, TeraSort, Repartition, Sleep, SQL, PageRank, Nutch indexing, Bayes, Kmeans, NWeight and enhanced DFSIO, etc. It also contains several streaming workloads for Spark Streaming, Flink, Storm and Gearpump.

1.1 workloads

There are totally 29 workloads in HiBench. The workloads are divided into 6 categories which are micro, ml(machine learning), sql, graph, websearch and streaming.

1.2 install maven

首先需要安装maven,并配好环境安装教程

mkdir repo
cd conf
vim settings.xml

修改仓库地址

<localRepository>/opt/module/maven/apache-maven-3.8.6/repo</localRepository>

阿里云镜像文件中已经有了,注释掉其他mirror

  <mirror>
      <id>alimaven</id>
      <name>aliyun maven</name>
     <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
      <mirrorOf>central</mirrorOf>
</mirror>

1.3 bulid

下载zip文件,上传解压后的文件 不要使用7.11 会出现版本问题

参照文档中的,构建HiBench项目,我使用的是全部安装:

#ALL
mvn -Dspark=3.1 -Dscala=2.12 clean package
#SPARK
mvn -Psparkbench -Dspark=3.1 -Dscala=2.12 clean package

image-20221113191059611

如果出现以下错误:

image-20221113171241273

原因是maven没有安装好,没有设置好镜像以及安装仓库,详情见安装教程

build成功

image-20221114134337141

第二次

image-20221214105422318

1.4 configure

hadoop.conf

image-20221122101805886

spark.conf

image-20221122101910798

Input data size

image-20230408154837846

if you chose a real large data size ,you may find the errors:

image-20230608165007024

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

you need to modify the mapred-site.xml, and add the context:

<property>
  <name>mapred.task.timeout</name>
  <value>800000</value>
  <final>true</final>
</property>
cluster mode
vim /opt/module/hibench/HiBench-master/HiBench-master/bin/functions/workload_functions.sh

# 修改run_spark_job 方法

image-20230702113310572

1.5 hadoop example

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/wordcount/prepare/prepare.sh

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/wordcount/hadoop/run.sh

官网地址

运行成功

image-20221122121433468

image-20221122121637126

更详细的介绍

/opt/module/Hibench/HiBench-master/report/wordcount/hadoop

1.6 spark example

准备输入数据

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/wordcount/prepare/prepare.sh

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/terasort/prepare/prepare.sh

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/sort/prepare/prepare.sh

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/ml/kmeans/prepare/prepare.sh

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/ml/bayes/prepare/prepare.sh

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/ml/lr/prepare/prepare.sh

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/websearch/pagerank/prepare/prepare.sh

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/graph/nweight/prepare/prepare.sh

控制台输出

image-20221122102053983

yarn

image-20221122102505411

进入HDFS查看准备的输入数据

image-20221122102227590

准备命令

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/wordcount/prepare/prepare.sh

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/terasort/prepare/prepare.sh

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/sort/prepare/prepare.sh

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/ml/kmeans/prepare/prepare.sh

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/ml/bayes/prepare/prepare.sh

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/ml/lr/prepare/prepare.sh

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/websearch/pagerank/prepare/prepare.sh

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/graph/nweight/prepare/prepare.sh

注意jar文件夹中不要包含其他备用的jar包

运行命令

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/wordcount/spark/run.sh

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/terasort/spark/run.sh

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/sort/spark/run.sh

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/ml/kmeans/spark/run.sh

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/ml/bayes/spark/run.sh

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/ml/lr/spark/run.sh

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/websearch/pagerank/spark/run.sh

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/graph/nweight/spark/run.sh

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

成功!

image-20221214112321741

结果

image-20230610145858128

trouble

网络配置问题

所有任务在yarn上都用的是内网IP

image-20230608180507018

image-20230609113252954

Permission denied

# 进入bin目录
chmod -R +x ./bin/

multi-job

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

Parsing conf: /opt/module/hibench/HiBench-master/HiBench-master/conf/hibench.conf
Parsing conf: /opt/module/hibench/HiBench-master/HiBench-master/conf/spark.conf
Parsing conf: /opt/module/hibench/HiBench-master/HiBench-master/conf/workloads/websearch/pagerank.conf
probe sleep jar: /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.1.3-tests.jar
ERROR, execute cmd: '( /opt/module/hadoop-3.1.3/bin/yarn node -list 2> /dev/null | grep RUNNING )' timedout.
  STDOUT:

  STDERR:

  Please check!
Traceback (most recent call last):
  File "/opt/module/hibench/HiBench-master/HiBench-master/bin/functions/load_config.py", line 685, in <module>
    load_config(conf_root, workload_configFile, workload_folder, patching_config)
  File "/opt/module/hibench/HiBench-master/HiBench-master/bin/functions/load_config.py", line 217, in load_config
    generate_optional_value()
  File "/opt/module/hibench/HiBench-master/HiBench-master/bin/functions/load_config.py", line 613, in generate_optional_value
    probe_masters_slaves_hostnames()
  File "/opt/module/hibench/HiBench-master/HiBench-master/bin/functions/load_config.py", line 549, in probe_masters_slaves_hostnames
    probe_masters_slaves_by_Yarn()
  File "/opt/module/hibench/HiBench-master/HiBench-master/bin/functions/load_config.py", line 500, in probe_masters_slaves_by_Yarn
    assert 0, "Get workers from yarn-site.xml page failed, reason:%s\nplease set `hibench.masters.hostnames` and `hibench.slaves.hostnames` manually" % e
AssertionError: Get workers from yarn-site.xml page failed, reason:( /opt/module/hadoop-3.1.3/bin/yarn node -list 2> /dev/null | grep RUNNING ) executed timedout for 5 seconds
please set `hibench.masters.hostnames` and `hibench.slaves.hostnames` manually
start ScalaSparkPagerank bench
/opt/module/hibench/HiBench-master/HiBench-master/bin/functions/workload_functions.sh: line 38: .: filename argument required
.: usage: . filename [arguments]
/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/websearch/pagerank/spark/run.sh: line 26: OUTPUT_HDFS: unbound variable

原因可能是系统负载过高导致的响应迟钝

image-20230710195619538

2.records

parallelism = 18

有input的stage的任务数是由数据数据的大小决定的,spark.default.parallelism决定的是shuffle后的stage的任务数

LR

内存不够

http://192.168.10.102:3000/d/e9e40733-bb3a-42c8-8704-38ec27cbee3f/spark-perf-dashboard-v04-custom?from=1692258304054&to=1692264023063&orgId=1&var-UserName=jaken&var-ApplicationId=app-20230817154625-0003&var-groupbyInterval=1s

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

image-20230710222144197

http://192.168.10.102:18080/history/app-20230817154625-0003/stages/

image-20230817182124188

image-20230823160549361

Bayes

http://192.168.10.102:3000/d/e9e40733-bb3a-42c8-8704-38ec27cbee3f/spark-perf-dashboard-v04-custom?from=1692265076383&to=1692265140060&orgId=1&var-UserName=jaken&var-ApplicationId=app-20230817173746-0004&var-groupbyInterval=1s

image-20230817182326359

http://192.168.10.102:18080/history/app-20230817173746-0004/jobs/

image-20230817182448207

image-20230817182410877

image-20230823160616907

NWeightGraphX

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

image-20230710222031686

ScalaPageRank

http://192.168.10.102:3000/d/e9e40733-bb3a-42c8-8704-38ec27cbee3f/spark-perf-dashboard-v04-custom?from=1692277885573&to=1692278062264&orgId=1&var-UserName=jaken&var-ApplicationId=app-20230817211121-0012&var-groupbyInterval=1s

image-20230817212034137

http://192.168.10.102:18080/history/app-20230817211121-0012/stages/

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

distinct

image-20230817211940302

flatMap

image-20230817211756998

DenseKMeans

http://192.168.10.102:3000/d/e9e40733-bb3a-42c8-8704-38ec27cbee3f/spark-perf-dashboard-v04-custom?from=1692273890319&to=1692274051643&orgId=1&var-UserName=jaken&var-ApplicationId=app-20230817200442-0009&var-groupbyInterval=1s

image-20230817203320695

http://192.168.10.102:18080/history/app-20230817200442-0009/stages/

image-20230817203354123

image-20230823160320490

map

image-20230817210720121

collect

image-20230817210901877

image-20230817210942909

sort

http://192.168.10.102:3000/d/e9e40733-bb3a-42c8-8704-38ec27cbee3f/spark-perf-dashboard-v04-custom?from=1692276453862&to=1692276644236&orgId=1&var-UserName=jaken&var-ApplicationId=app-20230817204743-0011&var-groupbyInterval=1s

image-20230814105408629

http://192.168.10.102:18080/history/app-20230817204743-0011/stages/

image-20230817205354207

map

image-20230817205420719

reduce

image-20230817205507056

TeraSort

http://192.168.10.102:3000/d/e9e40733-bb3a-42c8-8704-38ec27cbee3f/spark-perf-dashboard-v04-custom?from=1692275803257&to=1692276079162&orgId=1&var-UserName=jaken&var-ApplicationId=app-20230817203719-0010&var-groupbyInterval=1s

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

http://192.168.10.102:18080/history/app-20230817203719-0010/stages/

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

map

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

reduce

image-20230817205732605

WordCount

http://192.168.10.102:3000/d/e9e40733-bb3a-42c8-8704-38ec27cbee3f/spark-perf-dashboard-v04-custom?from=1691983226006&to=1691983545077&orgId=1&var-UserName=jaken&var-ApplicationId=app-20230814112024-0002&var-groupbyInterval=1s

image-20230814112936597

parallelism=20

image-20230817210134565

http://192.168.10.102:18080/history/app-20230814112024-0002/jobs/

map

image-20230814113241553

reduce

image-20230814113216632

hdfs维护

hadoop fs -rm -r -skipTrash /hibench_test/HiBench/

hadoop dfsadmin -safemode leave

var-ApplicationId=app-20230814112024-0002&var-groupbyInterval=1s

[外链图片转存中…(img-eGnFeoml-1696143711685)]

parallelism=20

[外链图片转存中…(img-tqMq87MT-1696143711685)]

http://192.168.10.102:18080/history/app-20230814112024-0002/jobs/

map

[外链图片转存中…(img-ckoZlP4I-1696143711686)]

reduce

[外链图片转存中…(img-yA8CbUjp-1696143711686)]

hdfs维护

hadoop fs -rm -r -skipTrash /hibench_test/HiBench/

hadoop dfsadmin -safemode leave

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/1054385.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

国庆放假作业2

1、select实现服务器并发 #include <myhead.h>#define PORT 7373 #define IP "192.168.1.9"int main(int argc,const char *argv[]) {//创建报式套接字int sfdsocket(AF_INET,SOCK_STREAM,0);if(sfd<0){ERR_MSG("socket error");return -1;}prin…

MATLAB与Python:优势与挑战

本文旨在探讨MATLAB与Python在特定领域内的使用情况&#xff0c;并分析两者之间的优势和挑战。 MATLAB和Python都是流行的编程语言&#xff0c;广泛应用于科学计算、数据分析和机器学习等领域。在某些领域&#xff0c;如航空航天工程、自动化和电子工程嵌入式系统开发等&#…

【分布式事务】

文章目录 解决分布式事务的思路seata四种模式1. XA模式2. AT模式AT模式与XA模式的区别是什么&#xff1f;脏写问题 3. TCC模式事务悬挂和空回滚 4. SAGA模式 四种模式对比口述AT模式与TCC模式高可用 什么是分布式事务&#xff1f; 分布式事务&#xff0c;就是指不是在单个服务或…

Arduino ESP32/ESP8266 +ST7735 1.8“tft中秋小时钟

Arduino ESP32 ST7735 1.8"tft中秋小时钟 &#x1f33c;原作者B站视频&#xff1a; ESP32中秋小时钟&#xff0c;表盘自动切换&#xff0c;代码开源&#xff0c;原图可下载&#xff08;案例应用&#xff09; &#x1f39e;tft ST7735 128160 1.8" 显示效果:(由于原作…

【vue3】wacth监听,监听ref定义的数据,监听reactive定义的数据,详解踩坑点

假期第二篇&#xff0c;对于基础的知识点&#xff0c;我感觉自己还是很薄弱的。 趁着假期&#xff0c;再去复习一遍 之前已经记录了一篇【vue3基础知识点-computed和watch】 今天在学习的过程中发现&#xff0c;之前记录的这一篇果然是很基础的&#xff0c;很多东西都讲的不够…

gcc中-I(大写的i)参数的作用

《gcc -I -L -l区别》是我参考的一篇博客。 gcc中-I参数可以帮助找到头文件的目录&#xff0c;比如在当前目录下有一个名为includeTestCom.c的c文件和名为includeCom的目录。 includeTestCom.c里边的内容如下&#xff1a; #include "good.h" int main(){printf(&q…

自己动手写编译器:实现命令行模块

在前面一系列章节中&#xff0c;我们完成了词法解析的各种算法。包括解析正则表达式字符串&#xff0c;构建 NFA 状态就&#xff0c;从 NFA 转换为 DFA 状态机&#xff0c;最后实现状态机最小化&#xff0c;接下来我们注重词法解析模块的工程化实现&#xff0c;也就是我们将所有…

CCF-CSP真题《202309-1 坐标变换(其一)》思路+python,c++,java满分题解

想查看其他题的真题及题解的同学可以前往查看&#xff1a;CCF-CSP真题附题解大全 试题编号&#xff1a;202309-1试题名称&#xff1a;坐标变换&#xff08;其一&#xff09;时间限制&#xff1a;1.0s内存限制&#xff1a;512.0MB问题描述&#xff1a; 问题描述 对于平面直角坐标…

11链表-迭代与递归

目录 LeetCode之路——206. 反转链表 分析&#xff1a; 解法一&#xff1a;迭代 解法二&#xff1a;递归 LeetCode之路——206. 反转链表 给你单链表的头节点 head &#xff0c;请你反转链表&#xff0c;并返回反转后的链表。 示例 1&#xff1a; 输入&#xff1a;head […

git你学“废”了吗?——git撤销操作指令详解

git你学“废”了吗&#xff1f;——git撤销操作指令详解&#x1f60e; 前言&#x1f64c;撤销的本质撤销修改情况一&#xff1a;撤销工作区的修改方式一&#xff1a;方式二&#xff1a;演示截图&#xff1a; 撤销修改情况二&#xff1a;撤销暂存区和工作区的修改操作截图&#…

【Java 进阶篇】JDBC DriverManager 详解

JDBC&#xff08;Java Database Connectivity&#xff09;是 Java 标准库中用于与数据库进行交互的 API。它允许 Java 应用程序连接到各种不同的数据库管理系统&#xff08;DBMS&#xff09;&#xff0c;执行 SQL 查询和更新操作&#xff0c;以及处理数据库事务。在 JDBC 中&am…

链表经典面试题(一)

面试题 1.反转链表的题目2.反转链表的图文分析3.反转链表的代码实现 1.反转链表的题目 2.反转链表的图文分析 我们在实现反转链表的时候,是将后面的元素变前面&#xff0c;前面的元素变后面&#xff0c;那么我们是否可以理解为&#xff0c;用头插法的思想来完成反转链表呢&…

力扣:116. 填充每个节点的下一个右侧节点指针(Python3)

题目&#xff1a; 给定一个 完美二叉树 &#xff0c;其所有叶子节点都在同一层&#xff0c;每个父节点都有两个子节点。二叉树定义如下&#xff1a; struct Node {int val;Node *left;Node *right;Node *next; } 填充它的每个 next 指针&#xff0c;让这个指针指向其下一个右侧…

计组--总线

一、概念 总线是一组能为多个部件分时共享的公共信息传送线路。 共享是指总线上可以挂接多个部件&#xff0c;各个部件之间互相交换的信息都可以通过这组线路分时共享。 分时是指同一时刻只允许有一个部件向总线发送信息&#xff0c;如果系统中有多个部件&#xff0c;则它们…

qt常用控件1

QLabel QLabel用于显示文本或图像。不提供用户交互功能。标签的视觉外观可以通过多种方式进行配置&#xff0c;并且可用于为另一个小组件指定焦点助记键。 常用API介绍&#xff1a; 获取对应的文本信息&#xff1a; 设置对其方式&#xff1a; 设置能否进行换行 获取及设置标…

mysql面试题9:MySQL中的SQL常见的查询语句有哪些?有哪些对SQL语句优化的方法?

该文章专注于面试,面试只要回答关键点即可,不需要对框架有非常深入的回答,如果你想应付面试,是足够了,抓住关键点 面试官:MySQL中的SQL常见的查询语句有哪些? 常见的SQL查询语句包括: SELECT:用于从一个或多个表中获取数据。 FROM:指定要查询的表名或视图名。 WHER…

ssh爆破分析

1. 2.日志分析 1.系统账号信息 2.确认攻击情况 3.管理员登录情况 4.处理措施

网络基础入门(认识网络 网络传输 概念举例详解)

本篇文章主要是对网络初学的概念进行解释&#xff0c;可以让你对网络有一个大概整体的认知。 文章目录 一、简单认识网络 1、1 什么是网络 1、2 网络分类 二、网络模型 2、1OSI七层模型 2、1、1 简单认识协议 2、1、2 OSI七层模型解释 2、2 TCP/IP五层(或四层)模型 三、网络传…

【生物信息学】计算图网络中节点的中心性指标:聚集系数、介数中心性、度中心性

目录 一、实验介绍 二、实验环境 1. 配置虚拟环境 2. 库版本介绍 3. IDE 三、实验内容 0. 导入必要的工具 1. 生成邻接矩阵simulate_G: 2. 计算节点的聚集系数 CC(G): 3.计算节点的介数中心性 BC(G) 4. 计算节点的度中心性 DC(G) 5. 综合centrality(G) 6. 代…

《 新手》web前端(axios)后端(java-springboot)对接简解

文章目录 <font color red>1.何为前后端对接?2.对接中关于http的关键点2.1. 请求方法2.2. 请求参数设置简解&#xff1a; 3.对接中的跨域(CROS)问题**为什么后端处理跨域尽量在业务之前进行&#xff1f;**3.总结 1.何为前后端对接? “前后端对接” 是指前端和后端两个…