You’ve probably heard of Zlib, but even if you haven’t, you’ve almost certainly used it.
Zlib’s unashamedly 1990s-style website describes the product as A Massively Spiffy Yet Delicately Unobtrusive Compression Library (Also Free, Not to Mention Unencumbered by Patents).
Data compression software (and, of course, the matching code to decompress it later) has always been handy to have around, as anyone who has ever used software such as PKZIP, WinRAR, 7-Zip and any of a great number of archiving tools will attest.
As you can imagine, the primary purpose of data compression is to save space, such as reducing the storage capacity needed for backups or cutting down on the bandwidth used for data transfer.
Often, however, despite the computational overhead added for squashing and expanding the data before and after storing or sending it, compression saves time as well as space, simply by reducing the amount of data that needs to be shuffled back and forth between a fast storage location such as RAM (memory) and a slow one such as disk, tape or network.
An internet standard
Zlib’s compression algorithm is known simply as DEFLATE
.
Although the algorithm was originally part of a patent for use in the commercial compression software PKZIP, it was adopted as an internet standard in RFC 1951 DEFLATE Compressed Data Format Specification version 1.3, published in 1996.
RFC 1951 notes that, despite the patent on PKZIP’s version of the process, “[t]he format can be implemented readily in a manner not covered by patents.”
Ironically, perhaps, Zlib and its companion tool gzip
(which compresses individual files using the Zlib algorithm) are probably best-known to users of Unix/Linux, because they were introduced as a replacement for the once-ubiquitous Unix tool compress
, which performed a similar function with a similar sort of algorithm.
But fears over compress
getting caught up in patent arguments meant that the Unix community sought an unencumbered equivalent that wouldn’t suddenly vanish from use due to legal arguments over its provenance.
Zlib and gzip
, based on the known-and-already-trusted DEFLATE
algorithm, were the answer to this dilemma, and given that Zlib compressed better and faster, as well as being legally unencumbered, meant that it quickly took over the role of compress
.
Zlib still very widely used
Interestingly, numerous modern compression schemes provide significantly better compression, primarily by using lots more memory to keep track of repeated text (e.g. 7z for arbitrary data), or by equipping each end with a substantial dictionary of oft-repeated text strings (e.g. Brotli for compressing web pages).
But Zlib’s DEFLATE
is nevertheless still the most likely compression scheme you’ll find, because it’s been so widely used for so long, and is therefore “built in” to numerous widespread file formats.
For example, Office documents such as DOCX
and XLSX
files are stored in ZIP format, using the DEFLATE
algorithm to save space; so are Java JAR
files and Android APK
files; and PNG
graphics files use Zlib compression internally.
Likewise, a huge number of web pages – probably the majority of pages you’ll visit – are compressed for transmission using Zlib, even though Google’s Brotli algorithm, which usually compresses better, is now widely supported by web servers and browsers alike.
What this means is that many apps you use regularly will include code not only to decompress Zlib data when reading it in, but also to compress to Zlib format when saving or sending data, because DEFLATE
is a sort of lingua franca for compressed data.
And any of these apps – perhaps even most of them, if you’re on Linux or Unix – will use the Zlib implementation of DEFLATE
, often in the form of a shared library (a DLL on Windows) called libz.so
or LIBZ.DLL
.
In a quick test on our Linux laptop, we found that the following apps all loaded the libz.so
shared code into memory, ready for use if needed: the browsers Firefox, Edge, Chromium and Tor; the PDF reader Xpdf; the popular media player VLC; the Word-and-Excel compatible software LibreOffice; the image editor GIMP; and even our favourite HP desktop calculator emulator, Free42.
Surely no bugs left?
With a legacy that long, and with an algorithm that was locked down as an internet standard back in 1996, you’d no doubt assume that Zlib had very few bugs left, and that any serious ones, such as those leading to the sort of memory corruption that could be exploitable for remote code execution (RCE), would have been found by now.
After all, memory corruption errors, when triggered by mistake by randomly weird input, typically give themselves away by causing the app concerned to crash, which in turn typically leaves behind a telltale error trace saying which part of the program provoked the crashworthy behaviour.
And with all sorts of data getting compressed and decompressed all the time, often without you realising it (e.g. between a web server and your browser, where Zlib typically squashes and unsquashes the data invisibly for transfer)…
…you’d expect that even the most esoteric memory mismanagement bugs would be outed in the end, if only by chance.
Well, Google bug-hunter Tavis Ormandy, who has uncovered some truly funky bugs in his storied bug-hunting career, just found a curious and possibly, just possibly, exploitable bug in the Zlib code…
…only to discover, when he came to report it, that it had been reported and patched before, back in 2018, when it was wryly noted as being 13 years old:
This bug was reported by Danilo Ramos of Eideticom, Inc. It has lain in wait 13 years before being found! [2018-04-20]
Somehow, though, the patch committed back on 20 April 2018 never made it into an update to Zlib, for the simple reason that until 2022-03-27, the previous release of the library was published on 2017-01-15, more than five years ago.
As Ormandy noted in his bug report:
Greetings list, I was recently trying to track down a reproducible crash in a compressor. Believe it or not, it really was a bug in zlib-1.2.11 when compressing (not decompressing!) certain inputs.
I reported it upstream, but it turns out the issue has been public since 2018, but the patch never made it into a release. As far as I know, nobody ever assigned it a CVE.
Ormandy’s surprise above is not just that the bug inadvertently went unfixed for all that time, but also that he’d discovered a potentially exploitable buffer overflow in the compression part of the code, rather than the decompressor.
Memory mismanagement can happen anywhere that a programmer is careless with which data gets written where, but in compression software it’s much more common to find this sort of bug in code that expands data from its compressed format, most notably because you can’t reliably determine how much memory space you’ll need to decompress everything safely until you actually try decompressing it.
(Many compression formats include a header that tells you how much space you’ll need to extract the data, but you can’t rely on that information because it could have been understated deliberately by an attacker to confuse your code and crash your decompressor on purpose.)
Astonishingly, if not actually amusingly, the fact that the bug was first investigated in 2018 means that the official bug number for this vulnerability is CVE-2018-25032, even though it was only assigned this week.
What to do?
- If you’re a user or a sysadmin, update to Zlib 1.2.12. Most Unix and Linux distros should provide this update for you. Although this bug is difficult to trigger, there are many places, notably on servers, where untrusted data that was supplied by an outsider gets compressed automatically and then archived, logged or transmitted by the system.
- If you’re a programmer, never assume that the “last bug” must have been found by now. Pay special attention to esoteric or unusual paths through your code that rarely get activated in real life – you may find bugs that time alone was not enough to expose by chance. If your code supports options that are not only rarely used but also not strictly needed, consider removing those parts altogether. You can’t trigger a bug in code that isn’t there.
- If you’re a product manager, never assume, just because a patch was produced, that it must have made it into the shipping code. Make sure that your software doesn’t have bugs that have been patched yet never actually fixed!
The price of freedom from bugs, as Mr Jefferson might have said, is eternal vigilance.