String Compression (Part 2): LZ4

news2024/11/24 5:05:44

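1. LZ4 Compression and Decompression

  This article looks at two LZ4 compression functions. The prototype of the default compression function is:

  int LZ4_compress_default(const char* src, char* dst, int srcSize, int dstCapacity);

  The prototype of the fast compression function is:

  int LZ4_compress_fast (const char* src, char* dst, int srcSize, int dstCapacity, int acceleration);

  The acceleration parameter of the fast compression function ranges over [1, LZ4_ACCELERATION_MAX], where LZ4_ACCELERATION_MAX is 65537. Put simply, the larger the acceleration value, the faster the compression but the lower the compression ratio; the experimental data later in this article illustrates this.

  In addition, when acceleration = 1, LZ4_compress_fast is equivalent to LZ4_compress_default, because LZ4_compress_default uses acceleration = 1 by default.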

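  This article likewise looks at two decompression functions. The prototype of the safe decompression function is:

  int LZ4_decompress_safe (const char* src, char* dst, int compressedSize, int dstCapacity);

  The prototype of the fast decompression function is:

  int LZ4_decompress_fast (const char* src, char* dst, int originalSize);

  The fast decompression function is not recommended. LZ4_decompress_fast has no parameter for the size of the compressed input, so it cannot guard against reading past the end of a malformed input and is considered unsafe; LZ4 recommends LZ4_decompress_safe instead.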

  As before, let us first look at an example of LZ4 compression and decompression.

#include <stdlib.h>
#include <string.h>
#include <lz4.h>
#include <iostream>

using namespace std;

int main()
{
    char peppa_pig_buf[2048] = "Narrator: It is raining today. So, Peppa and George cannot \
    play outside.Peppa: Daddy, it's stopped raining. Can we go out to play?Daddy: Alright, \
    run along you two.Narrator: Peppa loves jumping in muddy puddles.Peppa: I love muddy puddles.\
    Mummy: Peppa. If you jumping in muddy puddles, you must wear your boots.Peppa: Sorry, Mummy.\
    Narrator: George likes to jump in muddy puddles, too.Peppa: George. If you jump in muddy \
    puddles, you must wear your boots.Narrator: Peppa likes to look after her little brother, \
    George.Peppa: George, let's find some more pud dles.Narrator: Peppa and George are having \
    a lot of fun. Peppa has found a lttle puddle. George hasfound a big puddle.Peppa: Look, \
    George. There's a really big puddle.Narrator: George wants to jump into the big puddle first.\
    Peppa: Stop, George. | must check if it's safe for you. Good. It is safe for you. \
    Sorry, George. It'sonly mud.Narrator: Peppa and George love jumping in muddy puddles.\
    Peppa: Come on, George. Let's go and show Daddy.Daddy: Goodness me.Peppa: Daddy. Daddy. \
    Guess what we' ve been doing.Daddy: Let me think... Have you been wa tching television?\
    Peppa: No. No. Daddy.Daddy: Have you just had a bath?Peppa: No. No.Daddy: | know. \
    You've been jumping in muddy puddles.Peppa: Yes. Yes. Daddy. We've been jumping in muddy \
    puddles.Daddy: Ho. Ho. And look at the mess you're in.Peppa: Oooh....Daddy: Oh, well, \
    it's only mud. Let's clean up quickly before Mummy sees the mess.Peppa: Daddy, \
    when we've cleaned up, will you and Mummy Come and play, too?Daddy: Yes, we can all play \
    in the garden.Narrator: Peppa and George are wearing their boots. Mummy and Daddy are \
    wearing their boots.Peppa loves jumping up and down in muddy puddles. Everyone loves jumping \
    up and down inmuddy puddles.Mummy: Oh, Daddy pig, look at the mess you're in. .Peppa: \
    It's only mud.";

    // compress
    // The LZ4 API works with int sizes and signals errors through the
    // return value, so sizes are kept as int rather than size_t.
    const int peppa_pig_text_size = (int)strlen(peppa_pig_buf);
    const int com_space_size = LZ4_compressBound(peppa_pig_text_size);

    char *com_ptr = (char *)malloc(com_space_size);
    if(NULL == com_ptr) {
        cout << "compress malloc failed" << endl;
        return -1;
    }

    // With acceleration = 1 this is equivalent to:
    //const int com_size = LZ4_compress_default(peppa_pig_buf, com_ptr, peppa_pig_text_size, com_space_size);
    const int com_size = LZ4_compress_fast(peppa_pig_buf, com_ptr, peppa_pig_text_size, com_space_size, 1);
    if(com_size <= 0) {
        cout << "compress failed, return value:" << com_size << endl;
        free(com_ptr);
        return -1;
    }
    cout << "peppa pig text size:" << peppa_pig_text_size << endl;
    cout << "compress text size:" << com_size << endl;
    cout << "compress ratio:" << (float)peppa_pig_text_size / (float)com_size << endl << endl;

    // decompress
    char *decom_ptr = (char *)malloc(peppa_pig_text_size);
    if(NULL == decom_ptr) {
        cout << "decompress malloc failed" << endl;
        free(com_ptr);
        return -1;
    }

    // A negative return value means the input was malformed or the
    // destination buffer was too small.
    const int decom_size = LZ4_decompress_safe(com_ptr, decom_ptr, com_size, peppa_pig_text_size);
    if(decom_size < 0) {
        cout << "decompress failed, return value:" << decom_size << endl;
        free(com_ptr);
        free(decom_ptr);
        return -1;
    }
    cout << "decompress text size:" << decom_size << endl;

    // compare the decompressed buffer with the original buffer
    // (decom_ptr is not NUL-terminated, so compare raw bytes)
    if(decom_size != peppa_pig_text_size || memcmp(peppa_pig_buf, decom_ptr, peppa_pig_text_size) != 0) {
        cout << "decompress text is not equal peppa pig text" << endl;
    }

    free(com_ptr);
    free(decom_ptr);
    return 0;
}

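Execution result:

  The results show that the Peppa Pig text is 1848 bytes before compression and 1125 bytes after (ZSTD produced 759 bytes in the previous article), a compression ratio of about 1.6, and that the decompressed length equals the original length. For the same text, this is lower than ZSTD's compression ratio of about 2.4. In terms of compressed size, LZ4 does worse than ZSTD.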

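  Figure 1 shows how the compressed size changes as acceleration increases: the larger the acceleration, the larger the compressed output.

Figure 1

  Figure 2 shows how the compression ratio changes as acceleration increases: the larger the acceleration, the lower the compression ratio.

Figure 2

  Why is this? It is the same trade-off mentioned in the previous article: you cannot have both the fish (performance) and the bear's paw (compression ratio). Gaining compression speed costs compression ratio.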

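2. Exploring LZ4 Compression Performance

  Next, let us explore LZ4's compression performance, and how it varies across acceleration levels.

  The test method is to call LZ4_compress_fast repeatedly on the same text for 10 seconds, using a different acceleration level for each run, and then report the average number of compressions per second for each level. The benchmark code is as follows: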

#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <lz4.h>
#include <iostream>

using namespace std;

// Elapsed seconds between two gettimeofday() samples, including the
// microsecond part so that a run really lasts the requested time.
static double elapsed_seconds(const timeval &st, const timeval &et)
{
    return (et.tv_sec - st.tv_sec) + (et.tv_usec - st.tv_usec) / 1e6;
}

int main()
{
    char peppa_pig_buf[2048] = "Narrator: It is raining today. So, Peppa and George cannot \
    play outside.Peppa: Daddy, it's stopped raining. Can we go out to play?Daddy: Alright, \
    run along you two.Narrator: Peppa loves jumping in muddy puddles.Peppa: I love muddy puddles.\
    Mummy: Peppa. If you jumping in muddy puddles, you must wear your boots.Peppa: Sorry, Mummy.\
    Narrator: George likes to jump in muddy puddles, too.Peppa: George. If you jump in muddy \
    puddles, you must wear your boots.Narrator: Peppa likes to look after her little brother, \
    George.Peppa: George, let's find some more pud dles.Narrator: Peppa and George are having \
    a lot of fun. Peppa has found a lttle puddle. George hasfound a big puddle.Peppa: Look, \
    George. There's a really big puddle.Narrator: George wants to jump into the big puddle first.\
    Peppa: Stop, George. | must check if it's safe for you. Good. It is safe for you. \
    Sorry, George. It'sonly mud.Narrator: Peppa and George love jumping in muddy puddles.\
    Peppa: Come on, George. Let's go and show Daddy.Daddy: Goodness me.Peppa: Daddy. Daddy. \
    Guess what we' ve been doing.Daddy: Let me think... Have you been wa tching television?\
    Peppa: No. No. Daddy.Daddy: Have you just had a bath?Peppa: No. No.Daddy: | know. \
    You've been jumping in muddy puddles.Peppa: Yes. Yes. Daddy. We've been jumping in muddy \
    puddles.Daddy: Ho. Ho. And look at the mess you're in.Peppa: Oooh....Daddy: Oh, well, \
    it's only mud. Let's clean up quickly before Mummy sees the mess.Peppa: Daddy, \
    when we've cleaned up, will you and Mummy Come and play, too?Daddy: Yes, we can all play \
    in the garden.Narrator: Peppa and George are wearing their boots. Mummy and Daddy are \
    wearing their boots.Peppa loves jumping up and down in muddy puddles. Everyone loves jumping \
    up and down inmuddy puddles.Mummy: Oh, Daddy pig, look at the mess you're in. .Peppa: \
    It's only mud.";

    const int test_seconds = 10;
    const int max_acceleration = 6;

    // The LZ4 API works with int sizes and return values.
    const int peppa_pig_text_size = (int)strlen(peppa_pig_buf);
    const int com_space_size = LZ4_compressBound(peppa_pig_text_size);

    timeval st, et;

    // compress performance test: one 10-second run per acceleration level
    for(int acceleration = 1; acceleration <= max_acceleration; ++acceleration) {

        // Reset the counter for every level; otherwise the counts of
        // earlier levels would be added to later ones.
        long cnt = 0;

        gettimeofday(&st, NULL);
        while(1) {

            // Allocation is deliberately inside the timed loop, so the
            // result includes malloc/free cost for each call.
            char *com_ptr = (char *)malloc(com_space_size);
            if(NULL == com_ptr) {
                cout << "compress malloc failed" << endl;
                return -1;
            }

            const int com_size = LZ4_compress_fast(peppa_pig_buf, com_ptr, peppa_pig_text_size, com_space_size, acceleration);
            if(com_size <= 0) {
                cout << "compress failed, return value:" << com_size << endl;
                free(com_ptr);
                return -1;
            }

            free(com_ptr);

            cnt++;
            gettimeofday(&et, NULL);
            if(elapsed_seconds(st, et) >= test_seconds) {
                break;
            }
        }

        cout << "acceleration:" << acceleration << ", compress per second:" << cnt/test_seconds << " times" << endl;
    }

    return 0;
}

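Execution result:

  The results can be summarized in two points. First, with the default acceleration of 1, which is the value LZ4_compress_default uses, the rate is over 200,000 compressions per second. Second, as acceleration increases, the number of compressions per second also increases, at the cost of a lower compression ratio.

  The relationship between acceleration and compression rate is shown in the figure below:

Figure 3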

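3. Exploring LZ4 Decompression Performance

  Next, let us look at LZ4's decompression performance.

  The test method is to first compress the text using LZ4_compress_fast with acceleration = 1, then call the safe decompression function LZ4_decompress_safe repeatedly on the same compressed text for 10 seconds, and finally report the average number of decompressions per second. The benchmark code is as follows: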

#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <lz4.h>
#include <iostream>

using namespace std;

// Elapsed seconds between two gettimeofday() samples, including the
// microsecond part so that the run really lasts the requested time.
static double elapsed_seconds(const timeval &st, const timeval &et)
{
    return (et.tv_sec - st.tv_sec) + (et.tv_usec - st.tv_usec) / 1e6;
}

int main()
{
    char peppa_pig_buf[2048] = "Narrator: It is raining today. So, Peppa and George cannot \
    play outside.Peppa: Daddy, it's stopped raining. Can we go out to play?Daddy: Alright, \
    run along you two.Narrator: Peppa loves jumping in muddy puddles.Peppa: I love muddy puddles.\
    Mummy: Peppa. If you jumping in muddy puddles, you must wear your boots.Peppa: Sorry, Mummy.\
    Narrator: George likes to jump in muddy puddles, too.Peppa: George. If you jump in muddy \
    puddles, you must wear your boots.Narrator: Peppa likes to look after her little brother, \
    George.Peppa: George, let's find some more pud dles.Narrator: Peppa and George are having \
    a lot of fun. Peppa has found a lttle puddle. George hasfound a big puddle.Peppa: Look, \
    George. There's a really big puddle.Narrator: George wants to jump into the big puddle first.\
    Peppa: Stop, George. | must check if it's safe for you. Good. It is safe for you. \
    Sorry, George. It'sonly mud.Narrator: Peppa and George love jumping in muddy puddles.\
    Peppa: Come on, George. Let's go and show Daddy.Daddy: Goodness me.Peppa: Daddy. Daddy. \
    Guess what we' ve been doing.Daddy: Let me think... Have you been wa tching television?\
    Peppa: No. No. Daddy.Daddy: Have you just had a bath?Peppa: No. No.Daddy: | know. \
    You've been jumping in muddy puddles.Peppa: Yes. Yes. Daddy. We've been jumping in muddy \
    puddles.Daddy: Ho. Ho. And look at the mess you're in.Peppa: Oooh....Daddy: Oh, well, \
    it's only mud. Let's clean up quickly before Mummy sees the mess.Peppa: Daddy, \
    when we've cleaned up, will you and Mummy Come and play, too?Daddy: Yes, we can all play \
    in the garden.Narrator: Peppa and George are wearing their boots. Mummy and Daddy are \
    wearing their boots.Peppa loves jumping up and down in muddy puddles. Everyone loves jumping \
    up and down inmuddy puddles.Mummy: Oh, Daddy pig, look at the mess you're in. .Peppa: \
    It's only mud.";

    const int test_seconds = 10;
    long cnt = 0;

    timeval st, et;

    // compress
    // The LZ4 API works with int sizes and return values.
    const int peppa_pig_text_size = (int)strlen(peppa_pig_buf);
    const int com_space_size = LZ4_compressBound(peppa_pig_text_size);

    char *com_ptr = (char *)malloc(com_space_size);
    if(NULL == com_ptr) {
        cout << "compress malloc failed" << endl;
        return -1;
    }

    const int com_size = LZ4_compress_fast(peppa_pig_buf, com_ptr, peppa_pig_text_size, com_space_size, 1);
    if(com_size <= 0) {
        cout << "compress failed, return value:" << com_size << endl;
        free(com_ptr);
        return -1;
    }

    // decompress performance test
    gettimeofday(&st, NULL);
    while(1) {

        // Allocation is deliberately inside the timed loop, so the
        // result includes malloc/free cost for each call.
        char *decom_ptr = (char *)malloc(peppa_pig_text_size);
        if(NULL == decom_ptr) {
            cout << "decompress malloc failed" << endl;
            free(com_ptr);
            return -1;
        }

        // The return value must be a signed int: errors are reported as
        // negative numbers, which an unsigned type would hide.
        const int decom_size = LZ4_decompress_safe(com_ptr, decom_ptr, com_size, peppa_pig_text_size);
        if(decom_size <= 0) {
            cout << "decompress failed, return value:" << decom_size << endl;
            free(com_ptr);
            free(decom_ptr);
            return -1;
        }

        free(decom_ptr);

        cnt++;
        gettimeofday(&et, NULL);
        if(elapsed_seconds(st, et) >= test_seconds) {
            break;
        }
    }

    free(com_ptr);
    cout << "decompress per second:" << cnt/test_seconds << " times" << endl;

    return 0;
}

Execution result:

  The results show that LZ4 decompresses this text about 540,000 times per second, which is a very respectable rate.

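4. LZ4 Compared with ZSTD

  Using the same input text, running the compression, decompression, compression performance, and decompression performance tests with both ZSTD and LZ4 gives the data in Table 1.

Table 1

  Setting aside which algorithm is better, the experimental results show that ZSTD leans toward compression ratio, while LZ4 (acceleration = 1) leans toward compression speed.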

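5. Summary

  Whatever the algorithm, it is hard to achieve both high compression speed and a very high compression ratio. You have to trade one against the other, or find a suitable balance point.

  If the performance is acceptable, choosing ZSTD with its higher compression ratio saves more storage space (compressing on multiple threads through a thread pool can improve performance further). If compression ratio is not a major concern and higher compression speed is the goal, LZ4 is also a good choice.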
