Snippets
Recommended
For temporary storage:
zstd -z -3 -q --rsyncable
For archival storage:
lzip -6 -b 1048576
For compatibility with "good enough" compression while saving on CPU:
gzip -3 --rsyncable
For compatibility with "almost the best" compression (that gzip is capable of):
gzip -6 --rsyncable
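If these snippets end up in scripts, a small lookup function keeps the choice in one place. The sketch below is only an illustration: the function name and the use-case labels (temporary, archival, compat-fast, compat-best) are made up here, and only the command lines themselves come from the list above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original snippets): map a
# use-case label to one of the recommended command lines above.
recommended_compressor() {
    case "$1" in
        temporary)   echo "zstd -z -3 -q --rsyncable" ;;
        archival)    echo "lzip -6 -b 1048576" ;;
        compat-fast) echo "gzip -3 --rsyncable" ;;
        compat-best) echo "gzip -6 --rsyncable" ;;
        *)           echo "unknown use-case: $1" >&2; return 1 ;;
    esac
}

# Example: show the command line for archival storage.
recommended_compressor archival
```

Since all of these tools read standard input and write standard output when no file is given, something like `$(recommended_compressor temporary) < dump.sql > dump.sql.zst` should work (the file names are placeholders).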
Not recommended
All the snippets below have the following properties:
- have the "best" compression ratio;
- and at the same time, have the "worst" CPU and memory performance;
bzip2 -z -9
zstd -z -9 -q
lzip -9 -b 1048576
xz -z -9 -q -F xz -C sha256
Context
Throughout my career, whenever I was in an operations role, I was faced many times with the following simple task:
Given a large file, composed of "textual" data, compress it with the "best" tool prior to archival.
By "textual" data I mean something like the following:
- an SQL "dump" from a database;
- a "log" file from a service (perhaps collected via syslog);
- a JSON (stream) with various raw "metric" values;
And unfortunately by "best" I mean contradictory requirements like:
- the "largest" compression ratio;
- the "fastest" compression CPU time;
- the "lowest" compression resource consumption;
- the "fastest" decompression time; (although not as important as the others;)
So far I have used the following tools (usually at their -9 level):
- a long time ago I used bzip2;
- then I discovered lzma, which then became xz; (my current default, although I'm not so sure anymore;)
- then I discovered lzip, which (although similar to lzma / xz) has a file format specifically designed for archival and recovery;
- lately I've discovered zstd;
- and sometimes I use gzip --rsyncable for Git storage;
However I have never done a thorough benchmark based on different use-cases, until today... :)
Benchmarking
Benchmarking scenarios
I have applied the above snippets (with various compression levels) on the following real-world scenarios:
- mysql-01.sql -- a MySQL CMS database; (~40 MiB)
- mysql-02.sql -- a MySQL application database; (~600 MiB)
- mysql-03.sql -- a MySQL document store database; (~5 GiB)
- json-01.json -- a JSON structured log with two types of mixed records; (~600 MiB)
- json-02.json -- a JSON structured ClickHouse table data; (~5 GiB)
- bgp-mrt.tsv -- a TSV file containing the entire BGP routing table; (~5 GiB)
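To make the test matrix concrete, here is a dry-run sketch that only prints one command per tool and level, using the levels that appear in the result tables. It is not the set of commands actually used for the benchmark; the function name is made up, and the plain `-c` (write to standard output) invocation is an assumption chosen because all five tools support it.

```shell
#!/bin/sh
# Hypothetical dry run: print (do not execute) one command per tool and
# level. Levels mirror the rows of the result tables; lzip and xz also
# have a level 0 row there.
print_benchmark_matrix() {
    file="$1"
    for tool in gzip bzip2 zstd lzip xz; do
        case "$tool" in
            lzip|xz) levels="0 1 3 6 9" ;;
            *)       levels="1 3 6 9" ;;
        esac
        for level in $levels; do
            echo "$tool -$level -c $file"
        done
    done
}

# Example: list the 22 combinations for one of the input files.
print_benchmark_matrix mysql-01.sql
```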
Benchmarking conclusions
After a quick glance at the numbers I would say that:
- in terms of compression, I see no reason not to use lzip over xz;
- in terms of CPU time, I see no reason to use -9; it consumes lots of memory, and many times more CPU, with little to no gains;
- for xz and lzip my default level will be -6; because it strikes the best compression ratio / CPU / memory balance;
- if CPU time is essential (for example when running on a busy production server), then zstd with -3 yields fast compression while still maintaining good compression rates;
As such my new choices, depending on the use-case are:
- for long term archival -- lzip -6;
- for temporary storage efficiency -- zstd -3;
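One way to put a number on the "-9 is not worth it" argument is to look at how many extra CPU seconds each additional saved MiB costs when moving from one level to the next. The helper below is a sketch with a made-up name; the example figures are the lzip -6 and lzip -9 rows for mysql-03.sql from the results tables (with the thousands separator removed).

```shell
#!/bin/sh
# Hypothetical helper: extra CPU seconds spent per additional MiB saved
# when going from a lower level (size1, time1) to a higher one (size2, time2).
# Usage: marginal_cpu_per_mib SIZE1_MIB TIME1_S SIZE2_MIB TIME2_S
marginal_cpu_per_mib() {
    LC_ALL=C awk -v s1="$1" -v t1="$2" -v s2="$3" -v t2="$4" 'BEGIN {
        gained = s1 - s2               # MiB saved by the higher level
        if (gained <= 0) { print "inf"; exit }
        printf "%.1f\n", (t2 - t1) / gained
    }'
}

# mysql-03.sql: lzip -6 (248 MiB, 829.6 s) versus lzip -9 (224 MiB, 2443.4 s)
marginal_cpu_per_mib 248 829.6 224 2443.4   # prints: 67.2
```

In other words, on that file each of the last 24 MiB costs roughly a minute of CPU time.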
Benchmarking results
About the data
Below are the tables with the actual benchmarking data, whose columns mean:
- size (in MiB) -- the size of the original or compressed file;
- time (in seconds) -- the total CPU time (user-space and kernel); (i.e. not "wall-clock-time";)
- memory (in KiB) -- the amount of actual memory consumed;
- comp% -- the percentage of storage space saved; (i.e. the "compression ratio";)
- comp (in MiB) -- the actual amount of storage space saved;
- comp/s (in MiB/s) -- the number of MiB saved per second of CPU time;
- diff% -- how "bad" the current tool (and level) is compared with the "best" tool in terms of compression;
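As a sanity check, the derived columns can be recomputed from size and time. The sketch below uses a made-up function name and works on the rounded MiB values shown in the tables, so for small files the results may differ slightly from the published figures (which were presumably computed from exact sizes).

```shell
#!/bin/sh
# Hypothetical helper: recompute comp, comp% and comp/s for one row.
# Usage: comp_stats ORIGINAL_MIB COMPRESSED_MIB CPU_SECONDS
comp_stats() {
    LC_ALL=C awk -v orig="$1" -v size="$2" -v secs="$3" 'BEGIN {
        saved = orig - size            # comp   (MiB saved)
        pct   = 100 * saved / orig     # comp%  (share of space saved)
        rate  = saved / secs           # comp/s (MiB saved per CPU second)
        printf "%d %.1f%% %.1f\n", saved, pct, rate
    }'
}

# mysql-02.sql, gzip -1: 663 MiB original, 101 MiB compressed, 6.9 s CPU
comp_stats 663 101 6.9   # prints: 562 84.8% 81.4
```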
Note that:
- diff% is "relative", in the sense that a tool that is -25% "worse" might actually be off by only a few MiB;
- anything with diff% in the range 0 to -5% I would say is "good enough";
- pay special attention to the comp/s (compression speed) and the memory columns, as these impact the load on your server;
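The diff% column appears to be the saved space relative to the best row of the same table, i.e. (comp - best comp) / best comp; that formula is inferred from the numbers, not stated anywhere. A sketch under that assumption, together with the "good enough" threshold (function names are made up):

```shell
#!/bin/sh
# Hypothetical helpers; the diff% formula is inferred from the tables.
# Usage: diff_percent SAVED_MIB BEST_SAVED_MIB
diff_percent() {
    LC_ALL=C awk -v comp="$1" -v best="$2" 'BEGIN {
        printf "%.1f\n", 100 * (comp - best) / best
    }'
}

# Succeeds (exit status 0) when diff% is within the 0 to -5 range.
# Usage: is_good_enough DIFF_PERCENT
is_good_enough() {
    LC_ALL=C awk -v d="$1" 'BEGIN { exit !(d <= 0 && d >= -5) }'
}

# mysql-02.sql: gzip -6 saves 587 MiB, the best row (lzip -9) saves 617 MiB
d=$(diff_percent 587 617)   # -4.9
if is_good_enough "$d"; then echo "gzip:6 is good enough ($d%)"; fi
```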
mysql-01.sql
method | size | time | memory | comp% | comp/s | comp | diff%
none | 37 | ~~ | ~~ | ~~ | ~~ | 35 | ~~
gzip:1 | 11 | 0.5 | 1,628 | 70.3% | 49.1 | 26 | -25.7%
gzip:3 | 10 | 0.7 | 1,748 | 73.0% | 39.1 | 27 | -22.9%
gzip:6 | 9 | 1.4 | 1,632 | 75.7% | 20.7 | 28 | -20.0%
gzip:9 | 9 | 1.8 | 1,620 | 75.7% | 15.9 | 28 | -20.0%
bzip2:1 | 8 | 2.6 | 2,488 | 78.4% | 11.0 | 29 | -17.1%
bzip2:3 | 7 | 2.7 | 3,808 | 81.1% | 11.0 | 30 | -14.3%
bzip2:6 | 7 | 2.9 | 5,492 | 81.1% | 10.4 | 30 | -14.3%
bzip2:9 | 6 | 3.0 | 7,900 | 83.8% | 10.4 | 31 | -11.4%
zstd:1 | 9 | 0.2 | 11,668 | 75.7% | 140.0 | 28 | -20.0%
zstd:3 | 8 | 0.2 | 37,976 | 78.4% | 120.8 | 29 | -17.1%
zstd:6 | 7 | 0.5 | 41,160 | 81.1% | 63.8 | 30 | -14.3%
zstd:9 | 7 | 1.0 | 43,192 | 81.1% | 30.0 | 30 | -14.3%
lzip:0 | 9 | 0.9 | 3,996 | 75.7% | 31.1 | 28 | -20.0%
lzip:1 | 8 | 2.2 | 14,488 | 78.4% | 13.1 | 29 | -17.1%
lzip:3 | 7 | 4.6 | 25,752 | 81.1% | 6.5 | 30 | -14.3%
lzip:6 | 5 | 14.8 | 93,244 | 86.5% | 2.2 | 32 | -8.6%
lzip:9 | 5 | 22.7 | 150,836 | 86.5% | 1.4 | 32 | -8.6%
xz:0 | 9 | 1.6 | 4,532 | 75.7% | 17.7 | 28 | -20.0%
xz:1 | 7 | 1.8 | 10,624 | 81.1% | 17.1 | 30 | -14.3%
xz:3 | 6 | 4.3 | 33,480 | 83.8% | 7.2 | 31 | -11.4%
xz:6 | 4 | 14.2 | 97,332 | 89.2% | 2.3 | 33 | -5.7%
xz:9 | 2 | 13.6 | 398,956 | 94.6% | 2.6 | 35 | 0.0%
mysql-02.sql
method | size | time | memory | comp% | comp/s | comp | diff%
none | 663 | ~~ | ~~ | ~~ | ~~ | 617 | ~~
gzip:1 | 101 | 6.9 | 1,644 | 84.8% | 81.4 | 562 | -8.9%
gzip:3 | 93 | 7.1 | 1,736 | 86.0% | 79.9 | 570 | -7.6%
gzip:6 | 76 | 10.5 | 1,660 | 88.5% | 56.1 | 587 | -4.9%
gzip:9 | 74 | 18.7 | 1,628 | 88.8% | 31.5 | 589 | -4.5%
bzip2:1 | 67 | 54.4 | 2,488 | 89.9% | 11.0 | 596 | -3.4%
bzip2:3 | 56 | 62.9 | 3,724 | 91.6% | 9.7 | 607 | -1.6%
bzip2:6 | 51 | 71.4 | 5,492 | 92.3% | 8.6 | 612 | -0.8%
bzip2:9 | 49 | 77.1 | 7,852 | 92.6% | 8.0 | 614 | -0.5%
zstd:1 | 75 | 2.7 | 10,920 | 88.7% | 221.9 | 588 | -4.7%
zstd:3 | 72 | 3.2 | 36,832 | 89.1% | 182.4 | 591 | -4.2%
zstd:6 | 64 | 5.9 | 41,152 | 90.3% | 101.2 | 599 | -2.9%
zstd:9 | 59 | 10.2 | 43,144 | 91.1% | 59.5 | 604 | -2.1%
lzip:0 | 73 | 10.4 | 3,936 | 89.0% | 56.8 | 590 | -4.4%
lzip:1 | 74 | 30.5 | 14,484 | 88.8% | 19.3 | 589 | -4.5%
lzip:3 | 67 | 46.4 | 25,720 | 89.9% | 12.9 | 596 | -3.4%
lzip:6 | 52 | 122.6 | 93,356 | 92.2% | 5.0 | 611 | -1.0%
lzip:9 | 46 | 528.8 | 248,024 | 93.1% | 1.2 | 617 | 0.0%
xz:0 | 70 | 16.2 | 4,600 | 89.4% | 36.7 | 593 | -3.9%
xz:1 | 63 | 19.3 | 10,620 | 90.5% | 31.1 | 600 | -2.8%
xz:3 | 58 | 36.8 | 33,612 | 91.3% | 16.5 | 605 | -1.9%
xz:6 | 49 | 135.1 | 97,324 | 92.6% | 4.5 | 614 | -0.5%
xz:9 | 48 | 194.5 | 691,240 | 92.8% | 3.2 | 615 | -0.3%
mysql-03.sql
method | size | time | memory | comp% | comp/s | comp | diff%
none | 5,228 | ~~ | ~~ | ~~ | ~~ | 5,007 | ~~
gzip:1 | 1,308 | 68.1 | 1,576 | 75.0% | 57.6 | 3,920 | -21.7%
gzip:3 | 1,168 | 81.7 | 1,568 | 77.7% | 49.7 | 4,060 | -18.9%
gzip:6 | 934 | 152.0 | 1,564 | 82.1% | 28.3 | 4,294 | -14.2%
gzip:9 | 924 | 212.3 | 1,632 | 82.3% | 20.3 | 4,304 | -14.0%
bzip2:1 | 922 | 401.0 | 2,392 | 82.4% | 10.7 | 4,306 | -14.0%
bzip2:3 | 653 | 434.3 | 4,008 | 87.5% | 10.5 | 4,575 | -8.6%
bzip2:6 | 524 | 484.7 | 5,496 | 90.0% | 9.7 | 4,704 | -6.1%
bzip2:9 | 462 | 521.4 | 7,608 | 91.2% | 9.1 | 4,766 | -4.8%
zstd:1 | 528 | 20.4 | 11,748 | 89.9% | 230.8 | 4,700 | -6.1%
zstd:3 | 364 | 22.0 | 37,628 | 93.0% | 221.3 | 4,864 | -2.9%
zstd:6 | 332 | 46.5 | 42,584 | 93.6% | 105.2 | 4,896 | -2.2%
zstd:9 | 300 | 72.0 | 44,584 | 94.3% | 68.5 | 4,928 | -1.6%
lzip:0 | 762 | 96.1 | 3,992 | 85.4% | 46.5 | 4,466 | -10.8%
lzip:1 | 439 | 261.1 | 14,484 | 91.6% | 18.3 | 4,789 | -4.4%
lzip:3 | 323 | 412.4 | 25,764 | 93.8% | 11.9 | 4,905 | -2.0%
lzip:6 | 248 | 829.6 | 93,244 | 95.3% | 6.0 | 4,980 | -0.5%
lzip:9 | 224 | 2,443.4 | 363,656 | 95.7% | 2.0 | 5,004 | -0.1%
xz:0 | 554 | 125.8 | 4,652 | 89.4% | 37.2 | 4,674 | -6.7%
xz:1 | 365 | 130.0 | 10,672 | 93.0% | 37.4 | 4,863 | -2.9%
xz:3 | 282 | 260.6 | 33,480 | 94.6% | 19.0 | 4,946 | -1.2%
xz:6 | 233 | 894.6 | 97,204 | 95.5% | 5.6 | 4,995 | -0.2%
xz:9 | 221 | 1,156.9 | 691,244 | 95.8% | 4.3 | 5,007 | 0.0%
json-01.json
method | size | time | memory | comp% | comp/s | comp | diff%
none | 562 | ~~ | ~~ | ~~ | ~~ | 524 | ~~
gzip:1 | 163 | 8.6 | 1,636 | 71.0% | 46.3 | 399 | -23.9%
gzip:3 | 157 | 8.8 | 1,616 | 72.1% | 46.0 | 405 | -22.7%
gzip:6 | 131 | 12.4 | 1,592 | 76.7% | 34.8 | 431 | -17.7%
gzip:9 | 131 | 14.2 | 1,744 | 76.7% | 30.4 | 431 | -17.7%
bzip2:1 | 139 | 40.7 | 2,432 | 75.3% | 10.4 | 423 | -19.3%
bzip2:3 | 103 | 43.0 | 3,636 | 81.7% | 10.7 | 459 | -12.4%
bzip2:6 | 87 | 46.7 | 5,492 | 84.5% | 10.2 | 475 | -9.4%
bzip2:9 | 80 | 49.3 | 7,608 | 85.8% | 9.8 | 482 | -8.0%
zstd:1 | 99 | 2.4 | 10,860 | 82.4% | 195.4 | 463 | -11.6%
zstd:3 | 83 | 3.1 | 36,972 | 85.2% | 156.5 | 479 | -8.6%
zstd:6 | 76 | 6.2 | 40,240 | 86.5% | 78.8 | 486 | -7.3%
zstd:9 | 70 | 11.1 | 41,168 | 87.5% | 44.3 | 492 | -6.1%
lzip:0 | 110 | 13.4 | 4,060 | 80.4% | 33.9 | 452 | -13.7%
lzip:1 | 84 | 31.4 | 14,448 | 85.1% | 15.2 | 478 | -8.8%
lzip:3 | 74 | 48.0 | 25,712 | 86.8% | 10.2 | 488 | -6.9%
lzip:6 | 53 | 112.7 | 93,304 | 90.6% | 4.5 | 509 | -2.9%
lzip:9 | 47 | 344.5 | 206,340 | 91.6% | 1.5 | 515 | -1.7%
xz:0 | 93 | 19.1 | 4,604 | 83.5% | 24.5 | 469 | -10.5%
xz:1 | 76 | 24.0 | 10,484 | 86.5% | 20.3 | 486 | -7.3%
xz:3 | 65 | 53.9 | 33,540 | 88.4% | 9.2 | 497 | -5.2%
xz:6 | 40 | 131.0 | 97,016 | 92.9% | 4.0 | 522 | -0.4%
xz:9 | 38 | 179.3 | 690,756 | 93.2% | 2.9 | 524 | 0.0%
json-02.json
method | size | time | memory | comp% | comp/s | comp | diff%
none | 5,006 | ~~ | ~~ | ~~ | ~~ | 4,860 | ~~
gzip:1 | 597 | 44.8 | 1,640 | 88.1% | 98.5 | 4,409 | -9.3%
gzip:3 | 496 | 45.2 | 1,556 | 90.1% | 99.8 | 4,510 | -7.2%
gzip:6 | 371 | 59.7 | 1,620 | 92.6% | 77.7 | 4,635 | -4.6%
gzip:9 | 352 | 83.8 | 1,548 | 93.0% | 55.5 | 4,654 | -4.2%
bzip2:1 | 333 | 455.5 | 2,136 | 93.3% | 10.3 | 4,673 | -3.8%
bzip2:3 | 256 | 551.3 | 3,644 | 94.9% | 8.6 | 4,750 | -2.3%
bzip2:6 | 227 | 639.2 | 5,572 | 95.5% | 7.5 | 4,779 | -1.7%
bzip2:9 | 214 | 701.6 | 7,660 | 95.7% | 6.8 | 4,792 | -1.4%
zstd:1 | 306 | 17.8 | 10,584 | 93.9% | 263.9 | 4,700 | -3.3%
zstd:3 | 300 | 20.5 | 36,152 | 94.0% | 229.8 | 4,706 | -3.2%
zstd:6 | 273 | 35.5 | 40,092 | 94.5% | 133.4 | 4,733 | -2.6%
zstd:9 | 237 | 57.4 | 41,592 | 95.3% | 83.1 | 4,769 | -1.9%
lzip:0 | 339 | 62.2 | 3,912 | 93.2% | 75.0 | 4,667 | -4.0%
lzip:1 | 349 | 186.6 | 14,548 | 93.0% | 25.0 | 4,657 | -4.2%
lzip:3 | 304 | 262.1 | 25,764 | 93.9% | 17.9 | 4,702 | -3.3%
lzip:6 | 216 | 730.4 | 93,492 | 95.7% | 6.6 | 4,790 | -1.4%
lzip:9 | 157 | 2,802.4 | 363,680 | 96.9% | 1.7 | 4,849 | -0.2%
xz:0 | 331 | 91.2 | 4,536 | 93.4% | 51.3 | 4,675 | -3.8%
xz:1 | 283 | 100.8 | 10,604 | 94.3% | 46.9 | 4,723 | -2.8%
xz:3 | 251 | 139.8 | 33,476 | 95.0% | 34.0 | 4,755 | -2.2%
xz:6 | 186 | 674.5 | 97,132 | 96.3% | 7.1 | 4,820 | -0.8%
xz:9 | 146 | 814.6 | 691,032 | 97.1% | 6.0 | 4,860 | 0.0%
bgp-mrt.tsv
method | size | time | memory | comp% | comp/s | comp | diff%
none | 37 | ~~ | ~~ | ~~ | ~~ | 35 | ~~
gzip:1 | 11 | 0.5 | 1,628 | 70.3% | 49.1 | 26 | -25.7%
gzip:3 | 10 | 0.7 | 1,748 | 73.0% | 39.1 | 27 | -22.9%
gzip:6 | 9 | 1.4 | 1,632 | 75.7% | 20.7 | 28 | -20.0%
gzip:9 | 9 | 1.8 | 1,620 | 75.7% | 15.9 | 28 | -20.0%
bzip2:1 | 8 | 2.6 | 2,488 | 78.4% | 11.0 | 29 | -17.1%
bzip2:3 | 7 | 2.7 | 3,808 | 81.1% | 11.0 | 30 | -14.3%
bzip2:6 | 7 | 2.9 | 5,492 | 81.1% | 10.4 | 30 | -14.3%
bzip2:9 | 6 | 3.0 | 7,900 | 83.8% | 10.4 | 31 | -11.4%
zstd:1 | 9 | 0.2 | 11,668 | 75.7% | 140.0 | 28 | -20.0%
zstd:3 | 8 | 0.2 | 37,976 | 78.4% | 120.8 | 29 | -17.1%
zstd:6 | 7 | 0.5 | 41,160 | 81.1% | 63.8 | 30 | -14.3%
zstd:9 | 7 | 1.0 | 43,192 | 81.1% | 30.0 | 30 | -14.3%
lzip:0 | 9 | 0.9 | 3,996 | 75.7% | 31.1 | 28 | -20.0%
lzip:1 | 8 | 2.2 | 14,488 | 78.4% | 13.1 | 29 | -17.1%
lzip:3 | 7 | 4.6 | 25,752 | 81.1% | 6.5 | 30 | -14.3%
lzip:6 | 5 | 14.8 | 93,244 | 86.5% | 2.2 | 32 | -8.6%
lzip:9 | 5 | 22.7 | 150,836 | 86.5% | 1.4 | 32 | -8.6%
xz:0 | 9 | 1.6 | 4,532 | 75.7% | 17.7 | 28 | -20.0%
xz:1 | 7 | 1.8 | 10,624 | 81.1% | 17.1 | 30 | -14.3%
xz:3 | 6 | 4.3 | 33,480 | 83.8% | 7.2 | 31 | -11.4%
xz:6 | 4 | 14.2 | 97,332 | 89.2% | 2.3 | 33 | -5.7%
xz:9 | 2 | 13.6 | 398,956 | 94.6% | 2.6 | 35 | 0.0%
Benchmarking snippets
For the commands used in running the benchmark, see commands.txt.