Pick a random person, and ask them these two questions:
Q1. Have you heard of Apache?
Q2. If so, can you name an Apache product?
We’re willing to wager that you will get one of two replies:
A1. No. A2. (Not applicable.)
A1. Yes. A2. Log4j.
Two weeks ago, however, we’d suggest that very few people had heard of Log4j, and even amongst those cognoscenti, few would have been particularly interested in it.
Until a cluster of potentially catastrophic bugs – originally implemented as features, on the grounds that less is never more – were revealed under the bug-brand Log4Shell, the Log4j programming library was merely one of those many components that got sucked into and used by thousands, perhaps even hundreds of thousands, of Java applications and utilities.
Log4j was just “part of the supply chain” that came bundled into more back-end servers and cloud-based services than anyone actually realised until now.
Many sysadmins, IT staff and cybersecurity teams have spent the past two weeks eradicating this programmatic plague from their demesnes. (Yes, that’s a real word. It’s pronounced domains, but the archaic spelling avoids implying a Windows network.)
Don’t forget “the other Apache”
Rewind to the oh-so-recent pre-Log4j era and we suggest that you’d get a different pair of answers, namely:
A1. Yes. A2. Apache’s a web server, isn’t it? (Actually, it’s a software foundation that makes a web server, amongst much else.)
A1. Yes. A2. Apache makes httpd, probably still the world’s most prevalent web server.
With more than 3000 files totalling close to a million lines of source code, Apache httpd is a large and capable server, with myriad combinations of modules and options making it both powerful and dangerous at the same time.
Fortunately, the open source httpd product receives constant attention from its developers, getting regular updates that bring new features along with critical security patches.
So, in all the excitement about Apache Log4j, don’t forget that:
- You almost certainly have Apache httpd in your network somewhere. Just like Log4j, httpd has a habit of getting itself quietly included into software projects, for example as part of an internal service that works so well that it rarely draws attention to itself, or as a component built unobtrusively into a product or service you sell that isn’t predominantly thought of as “containing a web server”.
- Apache just published an httpd update that fixes two CVE-numbered security bugs. These bugs might not be exposed in your configuration, because they are part of optional run-time modules that you might not actually be using. But if you are using these modules, whether you realise it or not, you could be at risk of server crashes, data leakage, or even remote code execution.
What got fixed?
The two CVE-numbered flaws are listed in Apache’s own changelog as follows:
- CVE-2021-44790: Possible buffer overflow when parsing multipart content in mod_lua of Apache HTTP Server 2.4.51 and earlier.
- CVE-2021-44224: Possible NULL dereference or SSRF in forward proxy configurations in Apache HTTP Server 2.4.51 and earlier.
The good news about the first bug is that Apache itself warns that the mod_lua server extension (which allows you to adapt the behaviour of httpd using Lua scripts instead of having to write modules in C):
…holds a great deal of power over httpd, which is both a strength and a potential security risk. It is not recommended that you use this module on a server that is shared with users you do not trust, as it can be abused to change the internal workings of httpd.
However, as Log4j has taught us, potentially exploitable bugs even on non-public servers can be troublesome if those bugs can be triggered by untrusted user data passed along by other internet-facing servers at your network edge.
And CVE-2021-44790 doesn’t involve sneaking any untrusted add-on Lua scripts into the configuration.
Instead, it involves simply tricking the “preprocessor” that prepares untrusted user content to be passed to trusted Lua scripts, so the attack does not depend on bugs or flaws in any of the add-on scripts you may have written yourself.
Multipart message splitting
Simply put, the CVE-2021-44790 bug exists in the code that deconstructs multipart messages, common in web form uploads, that typically look something like this:
Content-Type: multipart/form-data; boundary=VILC2R2IHFHLZZ
--VILC2R2IHFHLZZ
Content-Disposition: form-data; name="name"
                          <--blank line denotes start of first data item
Paul Ducklin
--VILC2R2IHFHLZZ          <--double-dash-plus-boundary denotes end
Content-Disposition: form-data; name="phone"
                          <--blank line denotes start of second data item
555-555-5555
--VILC2R2IHFHLZZ--        <--double-dash-plus-boundary denotes end
Technically, each multipart component consists of the data after the end of each fully blank line (see above), and before each boundary line, which consists of two dashes (hyphens) followed by the unique boundary marker text.
In case you are wondering, the extra double-dash at the end of the very last line above signals the final item in the list.
A blank line in the raw data appears as two CRLF (carriage return plus line feed) pairs, or the ASCII codes (13,10,13,10), denoted in C by the text string "\r\n\r\n".
This parsing is handled very crudely by code that we’ve simplified like this:
for (start = findnext(start,boundarytext); start != NULL; start = end) {
    crlf = findnext(start,"\r\n\r\n");
    if (!crlf) break;
    end = findnext(crlf,boundarytext);
    len = end - crlf - 8;
    buff = memalloc(len+1);
    memcpy(buff,crlf+4,len);
    [. . .]
}
Don’t worry if you don’t know C – this code is impenetrable and poorly-documented enough even if you do. (The original is much more complex and harder to follow; we have stripped it to its basics here.)
Loosely speaking, it looks for a double-CRLF string, denoting the next blank line; from there, it finds the next occurrence of the boundary marker text (VILC2R2IHFHLZZ in our example above).
It then assumes that the data it needs to extract consists of everything between those two marker points, denoted by the memory addresses (pointers in C jargon) crlf and end, minus 8 bytes.
The code makes no effort to explain the meaning of that “minus 8”, nor yet the “plus 4” two lines later, though it’s a good immediate guess that crlf+4 is there to skip past the 4 bytes that make up the CRLFCRLF string itself. (The blank line is a separator, and isn’t part of the data to be used.)
Here’s where the “8” comes from:
- 4 bytes taken up by the CRLFCRLF characters at the start, which are not part of the data itself.
- 2 bytes of the CRLF at the end of the last line of data, not included.
- 2 bytes used by the dashes (--) that denote the start of the boundary line, not included.
As you can see, the code allocates enough memory for the data between the exact start of the line after the CRLFCRLF separator and the exact end of the line before the boundary marker…
…plus an extra 1 byte (len+1) to ensure a NUL character (a zero byte) at the end of the buffer to act as the terminator that text strings require in C.
The code then uses memcpy() to copy the relevant data out of the incoming message into that new memory buffer, in which it will be presented to the Lua script that is about to run.
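To make the arithmetic concrete, here is a minimal sketch that applies the same pointer calculation to the first item of the sample form shown earlier. It is our own illustration, not code from httpd: the standard strstr() stands in for the simplified findnext(), and the names naive_part_len() and sample_part are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not from httpd: repeats the simplified length
   calculation for the first part found after `start`.
   strstr() stands in for findnext().
   Returns 1 and stores the result in *len, or 0 if a marker is missing. */
int naive_part_len(const char *start, const char *boundarytext, ptrdiff_t *len)
{
    const char *crlf = strstr(start, "\r\n\r\n");   /* the blank line      */
    if (!crlf) return 0;
    const char *end = strstr(crlf, boundarytext);   /* next boundary text  */
    if (!end) return 0;
    /* 4 bytes of CRLFCRLF + 2 bytes of CRLF + 2 bytes of "--" = 8 */
    *len = end - crlf - 8;
    return 1;
}

/* First item of the sample form, starting just after its boundary line. */
static const char sample_part[] =
    "Content-Disposition: form-data; name=\"name\"\r\n"
    "\r\n"
    "Paul Ducklin\r\n"
    "--VILC2R2IHFHLZZ\r\n";
```

Calling naive_part_len(sample_part, "VILC2R2IHFHLZZ", &len) finds the two markers 20 bytes apart, so len comes out as 12, which is exactly the length of the text Paul Ducklin.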
What if there aren’t 8 bytes?
You’ve probably figured out the problem: What if there aren’t 8 extra bytes to remove? What if the CRLF at the end of the last line of data, or the -- at the start of the next line, aren’t there at all?
What if there aren’t 8 bytes altogether between the CRLFCRLF and the boundary text?
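The consequence can be sketched with the arithmetic alone. This is an illustration under our own assumptions rather than httpd’s code: the helper names naive_len() and as_copy_size() are hypothetical, and gap stands for the value of end - crlf.

```c
#include <stddef.h>

/* Hypothetical helper: the unchecked calculation, where `gap` is the
   distance end - crlf between the blank line and the boundary text. */
ptrdiff_t naive_len(ptrdiff_t gap)
{
    return gap - 8;
}

/* memcpy() takes its byte count as a size_t, which is unsigned, so a
   negative length is converted to an enormous positive count. */
size_t as_copy_size(ptrdiff_t len)
{
    return (size_t)len;
}
```

If the boundary text follows the blank line directly, gap is 4 and naive_len(4) is -4. The buffer size len+1 then makes no sense, and as_copy_size(-4) is SIZE_MAX - 3, a copy length far larger than any buffer the code could have allocated.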
This bug would have been much more obvious if the code were more clearly constructed or commented, and would almost certainly have been avoided if the CRLF-- separator between the blank line and the boundary text had been referred to explicitly by the programmer, and tested for explicitly.
That bug was patched by adding a check to make sure that the final buffer size calculation doesn’t come out too small, by adding one line before the memory allocation attempt:
if (end - crlf <= 8) break;
This tests that the buffer length can’t end up negative, though we still think that an explicit check for a correct data terminator, in the same way that there’s an explicit check for CRLFCRLF, would make for clearer code, and we’d insert a comment referring the reader to a helpful internet RFC about multipart messages, e.g. RFC 2045.
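As a rough idea of what such an explicit check might look like, here is a minimal sketch. It is not the fix that Apache shipped: the function name extract_part() is hypothetical, and the standard strstr() and malloc() stand in for the simplified findnext() and memalloc().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not from httpd: copies out the data of the first
   part found after `start`, or returns NULL if the part is malformed.
   The caller must free() the result. */
char *extract_part(const char *start, const char *boundarytext, size_t *out_len)
{
    static const char blank[] = "\r\n\r\n";  /* ends the part headers       */
    static const char term[]  = "\r\n--";    /* must precede boundary text  */

    const char *crlf = strstr(start, blank);
    if (!crlf) return NULL;
    const char *data = crlf + strlen(blank); /* first byte of the data      */

    const char *end = strstr(data, boundarytext);
    if (!end) return NULL;

    /* Explicit terminator check: there must be room for CRLF-- between
       the start of the data and the boundary text, and those 4 bytes
       must really be CRLF--. */
    size_t tlen = strlen(term);
    if ((size_t)(end - data) < tlen) return NULL;
    if (memcmp(end - tlen, term, tlen) != 0) return NULL;

    size_t len = (size_t)(end - data) - tlen;
    char *buff = malloc(len + 1);            /* +1 for the NUL terminator   */
    if (!buff) return NULL;
    memcpy(buff, data, len);
    buff[len] = '\0';
    if (out_len) *out_len = len;
    return buff;
}
```

Because the separators are named and tested for, the length can never go negative, and a reader can see where each of the 8 bytes comes from. Unlike the one-line patch, this sketch accepts a part whose data is empty; whether that is desirable is a design choice.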
Proxy problems
Dealing with CVE-2021-44224 involved numerous code changes, the most obvious being a correction in a file of utility code used by the httpd proxy module.
The fact that there are more than 5000 lines of C in proxy_util.c alone, which is support code for just one of many httpd modules, is testament to the overall size and complexity of the Apache HTTP Server.
The code we’re referring to above was changed from this…
url = ap_proxy_de_socketfy(p, url);
…to code that verifies that the function called actually did find a URL string to work with:
url = ap_proxy_de_socketfy(p, url);
if (!url) {
    return NULL;
}
Before the “if no url” error-check forced the code to give up early, the program would plough on even if url were NULL, and try to access the memory via the url variable.
Reading from or writing to a NULL pointer is “undefined” according to the C standard, which means you must take care never to do either.
Indeed, on almost all modern operating systems, the value used for NULL, usually zero, is chosen so that any attempt to access that address, whether by reading or writing, will not only fail but be trapped by the operating system, which will then typically kill off the offending process to prevent dangerous or unintended side effects.
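The general pattern of checking a result before using it can be sketched like this. Both functions are hypothetical stand-ins written for illustration only; after_pipe() is not the real ap_proxy_de_socketfy(), it is simply a helper that can legitimately return NULL.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a helper that can fail: returns the text
   after the first '|' in `url`, or NULL if there is no '|'. */
const char *after_pipe(const char *url)
{
    const char *bar = url ? strchr(url, '|') : NULL;
    return bar ? bar + 1 : NULL;
}

/* The caller checks the result before using it, in the same spirit as
   the patched code above. Returns 0 if no usable URL was found. */
size_t backend_url_length(const char *url)
{
    url = after_pipe(url);
    if (!url) {
        return 0;            /* give up early instead of touching NULL */
    }
    return strlen(url);      /* safe: url is known to be non-NULL here */
}
```

Without the if (!url) test, the call to strlen() would read through a NULL pointer whenever the input had no '|' in it, which is exactly the sort of crash the operating system trap described above would produce.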
What to do?
- If you are using Apache httpd anywhere, update to 2.4.52 as soon as you can.
- If you can’t patch, check whether your configuration is at risk. There are many bug fixes beyond these two CVEs, so you should patch sooner rather than later. But you may decide to defer patching until a more convenient time if you aren’t loading either the Lua scripting or the proxy module.
- If you are a coder, don’t forget to check for errors. If there’s a chance to spot mistakes before you make them worse, for example by verifying that you really have enough memory to play with, or checking that the string you are looking for really is there, take it!
- If you are a coder, assume that someone else will need to understand your code in the future. Write helpful and useful comments, on the grounds that those who cannot remember the past are condemned to repeat it.