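1. LZ4 Compression and Decompression
LZ4 offers two basic compression functions. The prototype of the default compression function is: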
int LZ4_compress_default(const char* src, char* dst, int srcSize, int dstCapacity);
The prototype of the fast compression function is:
int LZ4_compress_fast (const char* src, char* dst, int srcSize, int dstCapacity, int acceleration);
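The acceleration parameter of the fast compression function accepts values in the range [1, LZ4_ACCELERATION_MAX], where LZ4_ACCELERATION_MAX is 65537. Put simply, the larger the acceleration value, the faster the compression but the lower the compression ratio. Experimental data later in this article illustrates this.
Also, with acceleration = 1, LZ4_compress_fast behaves the same as LZ4_compress_default, because LZ4_compress_default uses acceleration = 1.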
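To make the range concrete: as far as the comments in lz4.h describe it, an out-of-range acceleration is not rejected but replaced, with values below 1 becoming the default (1) and values above the maximum becoming 65537. The sketch below only mirrors that documented behaviour. effective_acceleration and the two constants are hypothetical names for illustration, not part of the LZ4 API.

```cpp
// Local copies of the values documented for LZ4 (assumed here so the
// sketch compiles without lz4.h).
const int kAccelerationDefault = 1;
const int kAccelerationMax = 65537;

// Hypothetical helper: shows how a requested acceleration value would be
// interpreted according to the documented replacement rules.
int effective_acceleration(int requested)
{
    if (requested < 1) {
        return kAccelerationDefault;   // values <= 0 fall back to the default
    }
    if (requested > kAccelerationMax) {
        return kAccelerationMax;       // values above the maximum are capped
    }
    return requested;                  // in-range values are used as given
}
```

For example, effective_acceleration(0) gives 1 and effective_acceleration(100000) gives 65537, so passing a very large value does not buy more speed than the maximum does.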
LZ4 also offers two decompression functions. The prototype of the safe decompression function is:
int LZ4_decompress_safe (const char* src, char* dst, int compressedSize, int dstCapacity);
The prototype of the fast decompression function is:
int LZ4_decompress_fast (const char* src, char* dst, int originalSize);
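The fast decompression function is not recommended. LZ4_decompress_fast takes no compressed-size parameter, so it cannot tell where the input buffer ends and may read past it on malformed input. It is therefore considered unsafe, and LZ4 recommends LZ4_decompress_safe instead.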
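One practical consequence of the safe function's signature: the caller has to supply dstCapacity, and a raw LZ4 block does not record the original size, so that size has to travel alongside the compressed data. The sketch below shows one possible way to do that by placing a small header in front of the block. frame_block and read_original_size are hypothetical helpers, and the 4-byte little-endian header is an assumption of this sketch, not something defined by LZ4.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of LZ4): prepend the original size as a
// 4-byte little-endian header to an already compressed block.
std::vector<char> frame_block(std::uint32_t original_size,
                              const std::vector<char>& compressed)
{
    std::vector<char> framed;
    framed.reserve(4 + compressed.size());
    for (int shift = 0; shift < 32; shift += 8) {
        framed.push_back(static_cast<char>((original_size >> shift) & 0xFF));
    }
    framed.insert(framed.end(), compressed.begin(), compressed.end());
    return framed;
}

// Hypothetical helper: read the header back. Returns false when the
// buffer is too short to contain the 4-byte header.
bool read_original_size(const std::vector<char>& framed,
                        std::uint32_t& original_size)
{
    if (framed.size() < 4) {
        return false;
    }
    original_size = 0;
    for (int i = 0; i < 4; ++i) {
        const unsigned char byte = static_cast<unsigned char>(framed[i]);
        original_size |= static_cast<std::uint32_t>(byte) << (8 * i);
    }
    return true;
}
```

On the receiving side, the value read back would serve as dstCapacity for LZ4_decompress_safe, with the compressed payload starting right after the four header bytes.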
As in the previous article, let's start with an example of LZ4 compression and decompression.
```cpp
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <lz4.h>

using namespace std;

int main()
{
    // Sample text to compress.
    char peppa_pig_buf[2048] =
        "Narrator: It is raining today. So, Peppa and George cannot "
        "play outside.Peppa: Daddy, it's stopped raining. Can we go out to play?Daddy: Alright, "
        "run along you two.Narrator: Peppa loves jumping in muddy puddles.Peppa: I love muddy puddles."
        "Mummy: Peppa. If you jumping in muddy puddles, you must wear your boots.Peppa: Sorry, Mummy."
        "Narrator: George likes to jump in muddy puddles, too.Peppa: George. If you jump in muddy "
        "puddles, you must wear your boots.Narrator: Peppa likes to look after her little brother, "
        "George.Peppa: George, let's find some more pud dles.Narrator: Peppa and George are having "
        "a lot of fun. Peppa has found a lttle puddle. George hasfound a big puddle.Peppa: Look, "
        "George. There's a really big puddle.Narrator: George wants to jump into the big puddle first."
        "Peppa: Stop, George. | must check if it's safe for you. Good. It is safe for you. "
        "Sorry, George. It'sonly mud.Narrator: Peppa and George love jumping in muddy puddles."
        "Peppa: Come on, George. Let's go and show Daddy.Daddy: Goodness me.Peppa: Daddy. Daddy. "
        "Guess what we' ve been doing.Daddy: Let me think... Have you been wa tching television?"
        "Peppa: No. No. Daddy.Daddy: Have you just had a bath?Peppa: No. No.Daddy: | know. "
        "You've been jumping in muddy puddles.Peppa: Yes. Yes. Daddy. We've been jumping in muddy "
        "puddles.Daddy: Ho. Ho. And look at the mess you're in.Peppa: Oooh....Daddy: Oh, well, "
        "it's only mud. Let's clean up quickly before Mummy sees the mess.Peppa: Daddy, "
        "when we've cleaned up, will you and Mummy Come and play, too?Daddy: Yes, we can all play "
        "in the garden.Narrator: Peppa and George are wearing their boots. Mummy and Daddy are "
        "wearing their boots.Peppa loves jumping up and down in muddy puddles. Everyone loves jumping "
        "up and down inmuddy puddles.Mummy: Oh, Daddy pig, look at the mess you're in. .Peppa: "
        "It's only mud.";

    // compress
    // The LZ4 API works with int sizes, so int is used throughout.
    const int peppa_pig_text_size = (int)strlen(peppa_pig_buf);
    // Worst-case compressed size for this input length.
    const int com_space_size = LZ4_compressBound(peppa_pig_text_size);

    char *com_ptr = (char *)malloc(com_space_size);
    if (NULL == com_ptr) {
        cout << "compress malloc failed" << endl;
        return -1;
    }
    memset(com_ptr, 0, com_space_size);

    // Returns the compressed size, or 0 if compression failed.
    //int com_size = LZ4_compress_default(peppa_pig_buf, com_ptr, peppa_pig_text_size, com_space_size);
    int com_size = LZ4_compress_fast(peppa_pig_buf, com_ptr, peppa_pig_text_size, com_space_size, 1);
    if (com_size <= 0) {
        cout << "compress failed, error code:" << com_size << endl;
        free(com_ptr);
        return -1;
    }
    cout << "peppa pig text size:" << peppa_pig_text_size << endl;
    cout << "compress text size:" << com_size << endl;
    cout << "compress ratio:" << (float)peppa_pig_text_size / (float)com_size << endl << endl;

    // decompress
    char *decom_ptr = (char *)malloc(peppa_pig_text_size);
    if (NULL == decom_ptr) {
        cout << "decompress malloc failed" << endl;
        free(com_ptr);
        return -1;
    }

    // Returns the decompressed size, or a negative value on error.
    int decom_size = LZ4_decompress_safe(com_ptr, decom_ptr, com_size, peppa_pig_text_size);
    if (decom_size <= 0) {
        cout << "decompress failed, error code:" << decom_size << endl;
        free(com_ptr);
        free(decom_ptr);
        return -1;
    }
    cout << "decompress text size:" << decom_size << endl;

    // compare the decompressed buffer with the original buffer
    if (decom_size != peppa_pig_text_size ||
        memcmp(peppa_pig_buf, decom_ptr, peppa_pig_text_size) != 0) {
        cout << "decompress text is not equal peppa pig text" << endl;
    }

    free(com_ptr);
    free(decom_ptr);
    return 0;
}
```
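Result:
The output shows that the Peppa Pig text is 1848 bytes long before compression and 1125 bytes after (ZSTD produced 759 bytes in the previous article), a compression ratio of about 1.6, and that the decompressed length equals the original length. On the same text, this is lower than ZSTD's ratio of about 2.4. Judged by the size of the compressed text, LZ4 does worse than ZSTD.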
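Figure 1 shows how the compressed length changes as acceleration increases: the higher the acceleration, the longer the compressed text.
Figure 1
Figure 2 shows how the compression ratio changes as acceleration increases: the higher the acceleration, the lower the compression ratio.
Figure 2
Why is that? It is the same trade-off mentioned in the previous article: you cannot have both the fish (performance) and the bear's paw (compression ratio). The higher compression speed is paid for with a lower compression ratio.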
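2. Exploring LZ4 Compression Performance
Next, let's look at LZ4's compression performance and how it varies with the acceleration level.
The test uses LZ4_compress_fast to compress the same text over and over for 10 seconds. Each run uses a different acceleration level, which gives the average number of compressions per second at each level. The test code is as follows: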
```cpp
#include <cstdlib>
#include <cstring>
#include <sys/time.h>
#include <iostream>
#include <lz4.h>

using namespace std;

int main()
{
    // Sample text to compress.
    char peppa_pig_buf[2048] =
        "Narrator: It is raining today. So, Peppa and George cannot "
        "play outside.Peppa: Daddy, it's stopped raining. Can we go out to play?Daddy: Alright, "
        "run along you two.Narrator: Peppa loves jumping in muddy puddles.Peppa: I love muddy puddles."
        "Mummy: Peppa. If you jumping in muddy puddles, you must wear your boots.Peppa: Sorry, Mummy."
        "Narrator: George likes to jump in muddy puddles, too.Peppa: George. If you jump in muddy "
        "puddles, you must wear your boots.Narrator: Peppa likes to look after her little brother, "
        "George.Peppa: George, let's find some more pud dles.Narrator: Peppa and George are having "
        "a lot of fun. Peppa has found a lttle puddle. George hasfound a big puddle.Peppa: Look, "
        "George. There's a really big puddle.Narrator: George wants to jump into the big puddle first."
        "Peppa: Stop, George. | must check if it's safe for you. Good. It is safe for you. "
        "Sorry, George. It'sonly mud.Narrator: Peppa and George love jumping in muddy puddles."
        "Peppa: Come on, George. Let's go and show Daddy.Daddy: Goodness me.Peppa: Daddy. Daddy. "
        "Guess what we' ve been doing.Daddy: Let me think... Have you been wa tching television?"
        "Peppa: No. No. Daddy.Daddy: Have you just had a bath?Peppa: No. No.Daddy: | know. "
        "You've been jumping in muddy puddles.Peppa: Yes. Yes. Daddy. We've been jumping in muddy "
        "puddles.Daddy: Ho. Ho. And look at the mess you're in.Peppa: Oooh....Daddy: Oh, well, "
        "it's only mud. Let's clean up quickly before Mummy sees the mess.Peppa: Daddy, "
        "when we've cleaned up, will you and Mummy Come and play, too?Daddy: Yes, we can all play "
        "in the garden.Narrator: Peppa and George are wearing their boots. Mummy and Daddy are "
        "wearing their boots.Peppa loves jumping up and down in muddy puddles. Everyone loves jumping "
        "up and down inmuddy puddles.Mummy: Oh, Daddy pig, look at the mess you're in. .Peppa: "
        "It's only mud.";

    // The LZ4 API works with int sizes, so int is used throughout.
    const int peppa_pig_text_size = (int)strlen(peppa_pig_buf);
    const int com_space_size = LZ4_compressBound(peppa_pig_text_size);

    timeval st, et;
    int test_times = 6;
    int acceleration = 1;

    // compress performance test: one 10-second run per acceleration level
    while (test_times >= 1) {
        long cnt = 0;          // reset per level so counts do not accumulate
        double elapsed = 0.0;  // seconds since the start of this run

        gettimeofday(&st, NULL);
        while (1) {
            // Allocation and release are part of the timed loop.
            char *com_ptr = (char *)malloc(com_space_size);
            if (NULL == com_ptr) {
                cout << "compress malloc failed" << endl;
                return -1;
            }

            // Returns the compressed size, or 0 if compression failed.
            int com_size = LZ4_compress_fast(peppa_pig_buf, com_ptr, peppa_pig_text_size, com_space_size, acceleration);
            if (com_size <= 0) {
                cout << "compress failed, error code:" << com_size << endl;
                free(com_ptr);
                return -1;
            }

            free(com_ptr);

            cnt++;
            gettimeofday(&et, NULL);
            // Use microsecond resolution instead of whole seconds.
            elapsed = (et.tv_sec - st.tv_sec) + (et.tv_usec - st.tv_usec) / 1e6;
            if (elapsed >= 10.0) {
                break;
            }
        }

        cout << "acceleration:" << acceleration << ", compress per second:" << (long)(cnt / elapsed) << " times" << endl;

        ++acceleration;
        --test_times;
    }

    return 0;
}
```
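Result:
Two points stand out. First, at the default acceleration of 1, which is what LZ4_compress_default uses, the test reaches more than 200,000 compressions per second. Second, the number of compressions per second rises as acceleration increases, at the cost of a lower compression ratio.
The relationship between acceleration and compression speed is shown in the figure below:
Figure 3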
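The measurement method used in this section (repeat one operation for a fixed time, then divide the number of calls by the time) can be factored into a small reusable helper. The sketch below is one possible way to write it with std::chrono. calls_per_second is a hypothetical name, and the operation to measure is passed in as a callable so the sketch does not depend on LZ4 itself.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: call `op` repeatedly for roughly `seconds` and
// return the average number of calls per second. The counter is local,
// so every measurement starts from zero, and the result is divided by
// the time that actually elapsed rather than by the requested duration.
template <typename Op>
double calls_per_second(Op op, double seconds)
{
    using clock = std::chrono::steady_clock;

    const clock::time_point start = clock::now();
    std::uint64_t calls = 0;
    double elapsed = 0.0;

    do {
        op();
        ++calls;
        elapsed = std::chrono::duration<double>(clock::now() - start).count();
    } while (elapsed < seconds);

    if (elapsed <= 0.0) {
        return 0.0;  // clock did not advance; no meaningful rate
    }
    return static_cast<double>(calls) / elapsed;
}
```

A compression test would pass a lambda that performs one LZ4_compress_fast call at the acceleration level under test, and call the helper once per level with a duration of 10 seconds.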
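3. Exploring LZ4 Decompression Performance
Next, let's look at LZ4's decompression performance.
The test first compresses the text with LZ4_compress_fast at acceleration = 1, then uses the safe decompression function LZ4_decompress_safe to decompress the same data over and over for 10 seconds, which gives the average number of decompressions per second. The test code is as follows: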
```cpp
#include <cstdlib>
#include <cstring>
#include <sys/time.h>
#include <iostream>
#include <lz4.h>

using namespace std;

int main()
{
    // Sample text to compress.
    char peppa_pig_buf[2048] =
        "Narrator: It is raining today. So, Peppa and George cannot "
        "play outside.Peppa: Daddy, it's stopped raining. Can we go out to play?Daddy: Alright, "
        "run along you two.Narrator: Peppa loves jumping in muddy puddles.Peppa: I love muddy puddles."
        "Mummy: Peppa. If you jumping in muddy puddles, you must wear your boots.Peppa: Sorry, Mummy."
        "Narrator: George likes to jump in muddy puddles, too.Peppa: George. If you jump in muddy "
        "puddles, you must wear your boots.Narrator: Peppa likes to look after her little brother, "
        "George.Peppa: George, let's find some more pud dles.Narrator: Peppa and George are having "
        "a lot of fun. Peppa has found a lttle puddle. George hasfound a big puddle.Peppa: Look, "
        "George. There's a really big puddle.Narrator: George wants to jump into the big puddle first."
        "Peppa: Stop, George. | must check if it's safe for you. Good. It is safe for you. "
        "Sorry, George. It'sonly mud.Narrator: Peppa and George love jumping in muddy puddles."
        "Peppa: Come on, George. Let's go and show Daddy.Daddy: Goodness me.Peppa: Daddy. Daddy. "
        "Guess what we' ve been doing.Daddy: Let me think... Have you been wa tching television?"
        "Peppa: No. No. Daddy.Daddy: Have you just had a bath?Peppa: No. No.Daddy: | know. "
        "You've been jumping in muddy puddles.Peppa: Yes. Yes. Daddy. We've been jumping in muddy "
        "puddles.Daddy: Ho. Ho. And look at the mess you're in.Peppa: Oooh....Daddy: Oh, well, "
        "it's only mud. Let's clean up quickly before Mummy sees the mess.Peppa: Daddy, "
        "when we've cleaned up, will you and Mummy Come and play, too?Daddy: Yes, we can all play "
        "in the garden.Narrator: Peppa and George are wearing their boots. Mummy and Daddy are "
        "wearing their boots.Peppa loves jumping up and down in muddy puddles. Everyone loves jumping "
        "up and down inmuddy puddles.Mummy: Oh, Daddy pig, look at the mess you're in. .Peppa: "
        "It's only mud.";

    // compress
    // The LZ4 API works with int sizes, so int is used throughout.
    const int peppa_pig_text_size = (int)strlen(peppa_pig_buf);
    const int com_space_size = LZ4_compressBound(peppa_pig_text_size);

    char *com_ptr = (char *)malloc(com_space_size);
    if (NULL == com_ptr) {
        cout << "compress malloc failed" << endl;
        return -1;
    }

    // Returns the compressed size, or 0 if compression failed.
    int com_size = LZ4_compress_fast(peppa_pig_buf, com_ptr, peppa_pig_text_size, com_space_size, 1);
    if (com_size <= 0) {
        cout << "compress failed, error code:" << com_size << endl;
        free(com_ptr);
        return -1;
    }

    // decompress performance test
    long cnt = 0;
    double elapsed = 0.0;  // seconds since the start of the run
    timeval st, et;

    gettimeofday(&st, NULL);
    while (1) {
        // Allocation and release are part of the timed loop.
        char *decom_ptr = (char *)malloc(peppa_pig_text_size);
        if (NULL == decom_ptr) {
            cout << "decompress malloc failed" << endl;
            free(com_ptr);
            return -1;
        }

        // Returns the decompressed size, or a negative value on error.
        int decom_size = LZ4_decompress_safe(com_ptr, decom_ptr, com_size, peppa_pig_text_size);
        if (decom_size <= 0) {
            cout << "decompress failed, error code:" << decom_size << endl;
            free(com_ptr);
            free(decom_ptr);
            return -1;
        }

        free(decom_ptr);

        cnt++;
        gettimeofday(&et, NULL);
        // Use microsecond resolution instead of whole seconds.
        elapsed = (et.tv_sec - st.tv_sec) + (et.tv_usec - st.tv_usec) / 1e6;
        if (elapsed >= 10.0) {
            break;
        }
    }

    free(com_ptr);
    cout << "decompress per second:" << (long)(cnt / elapsed) << " times" << endl;

    return 0;
}
```
Result:
The result shows LZ4 decompressing roughly 540,000 times per second, which is a very respectable rate.
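4. LZ4 Compared with ZSTD
Running the compression, decompression, compression-performance and decompression-performance tests with both ZSTD and LZ4 on the same input text gives the data in Table 1.
Table 1
Setting aside the question of which algorithm is better, the experiments show that ZSTD leans toward compression ratio, while LZ4 (acceleration = 1) leans toward compression speed.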
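5. Summary
No algorithm can easily deliver both very fast compression and a very high compression ratio. You have to trade one for the other, or find a suitable balance between them.
If the performance is acceptable, choosing ZSTD with its higher compression ratio saves more storage space (multi-threaded compression through a thread pool can improve performance further). If the compression ratio matters less and compression speed matters more, LZ4 is a good choice.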