Java programmers love string interpolation features.
If you’re not a coder, you’re probably confused by the word “interpolation” here, because it’s been borrowed as programming jargon where it’s not a very good linguistic fit…
…but the idea is simple, very powerful, and sometimes spectacularly dangerous.
In other programming ecosystems it’s often known simply as string substitution, where string is shorthand for a bunch of characters, usually meant for displaying or printing out, and substitution means exactly what it says.
For example, in the Bash command shell, if you run the command:
$ echo USER
…you will get the output:
USER
But if you write:
$ echo ${USER}
…you will get something like this instead:
duck
…because the magic character sequence ${USER} means to look in the environment (a memory-based collection of data values typically storing the computer name, current username, TEMP directory, command path and so on), retrieve the value of the variable USER (by convention, the current user’s login name), and use that instead.
Similarly, the command:
$ echo cat /etc/passwd
…prints out exactly what’s on the command line, thus producing:
cat /etc/passwd
…while the very similar-looking command:
$ echo $(cat /etc/passwd)
…contains a magic $(...) sequence, with round brackets instead of squiggly ones, which means to execute the text inside the brackets as a system command, collect up the output, and write that out as a continuous chunk of text instead.
In this case, you’ll get back a slightly garbled dump of the username file (despite the name, no password data is stored in /etc/passwd any more), something like this:
root:x:0:0::/root:/bin/bash bin:x:1:1:bin:/bin:/bin/false daemon:x:2:2:daemon:/sbin:/bin/false adm:x:3:4:adm:/var/log:/bin/false lp:x:4:7:lp:/var/spool/lpd:/bin/false [...TRUNCATED...]
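For Java readers, the variable-substitution idea shown in the shell examples above can be sketched in a few lines. This is a toy illustration only, not how Bash or any particular library does it: the class name SimpleSubstitution, the substitute() method and the sample value "duck" are made up for this example, and unknown names are simply replaced with nothing.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy illustration only: replaces ${NAME} with values from a supplied map,
// loosely mimicking shell-style variable substitution.
class SimpleSubstitution {
    // Matches ${NAME}, where NAME is made of letters, digits or underscores.
    private static final Pattern VAR =
        Pattern.compile("\\$\\{([A-Za-z_][A-Za-z0-9_]*)\\}");

    static String substitute(String text, Map<String, String> vars) {
        Matcher m = VAR.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // Unknown names are replaced with an empty string in this sketch.
            String value = vars.getOrDefault(m.group(1), "");
            // quoteReplacement stops '$' and '\' in the value being reinterpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical values; a real program might consult System.getenv() instead.
        Map<String, String> vars = Map.of("USER", "duck");
        System.out.println(substitute("echo USER", vars));    // prints: echo USER
        System.out.println(substitute("echo ${USER}", vars)); // prints: echo duck
    }
}
```

Note that the text without the ${...} wrapper passes through untouched, while the wrapped version is swapped for the looked-up value, which is exactly the difference between the two echo commands.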
The risks of untrusted input
As you can imagine, allowing untrusted input, such as data submitted in a web form or content extracted from an email, to be processed by a part of your program that performs substitution or interpolation can be a cybersecurity nightmare.
If you aren’t careful, simply preparing a text message to be printed out to a logfile could trigger a whole load of unwanted side-effects in your app.
These could include, at increasing levels of danger:
- Accidentally leaking data that was only ever supposed to be in memory. Any string interpolation that extracts data from environment variables and then writes it to disk without permission could put you in trouble with your local data security regulators. In the Log4Shell incident, for example, attackers made a habit of trying to access environment variables such as AWS_ACCESS_KEY_ID, which contain cryptographic secrets that aren’t supposed to get logged or sent anywhere except to specific servers as a proof of authentication.
- Triggering internet connections to external servers and services. Even if all an attacker can do is to trick you into looking up the IP number of a servername using DNS, you’ve nevertheless just been coerced into “calling home” to a DNS server that the attacker controls, thus potentially leaking information about the internal structure of your network.
- Executing arbitrary system commands picked by someone outside your network. If the string interpolation lets attackers trick your server into running a command of their choice, then you have created an RCE hole, short for remote code execution, which typically means the attackers can exfiltrate data, implant malware or otherwise mess with the cybersecurity configuration on your server at will.
As you no doubt remember from Log4Shell, unnecessary “features” in an Apache programming library called Log4J (Logging For Java) suddenly made all these scenarios possible on any server where an unpatched version of Log4J was installed.
Not just internet-facing servers
Worse, problems such as the Log4Shell bug aren’t neatly confined only to servers that are directly at your network edge, such as your web servers.
When Log4Shell hit, the initial reaction from lots of organisations was to say, “We don’t have any Java-based web servers, because we only use Java in our internal business logic, so we think we’re immune to this one.”
But any server to which user data was ultimately forwarded for processing – even secure servers that were off-limits to connections from outsiders – could be affected if that server [A] had an unpatched version of Log4J installed, and [B] kept logs of data that originated from outside.
A user who pretended their name was ${env:USER}, for example, would typically get logged by the Log4J code under the name of the server account doing the processing, if the app didn’t take the precaution of checking for dangerous characters in the input data first.
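The sort of precautionary check mentioned above might look something like the following sketch. It assumes that the only marker worth worrying about is the ${ sequence used in these examples; the class name InterpolationGuard, its method names and the placeholder text are all invented for illustration, and a real application would need to decide for itself whether to reject, escape or merely flag such input.

```java
// A minimal sketch, assuming "${" is the only interpolation marker in play.
// Other libraries may recognise different markers, so treat this as a
// starting point and not as a complete defence.
class InterpolationGuard {
    // Returns true if the text contains the "${" sequence that starts a lookup.
    static boolean looksLikeInterpolation(String input) {
        return input != null && input.contains("${");
    }

    // Returns the input unchanged if it looks harmless, or a fixed placeholder
    // if it contains an interpolation marker, so the raw text never reaches
    // code that might expand it.
    static String safeForLogging(String input) {
        return looksLikeInterpolation(input) ? "[suspicious input withheld]" : input;
    }

    public static void main(String[] args) {
        System.out.println(safeForLogging("duck"));        // prints: duck
        System.out.println(safeForLogging("${env:USER}")); // prints: [suspicious input withheld]
    }
}
```

Checking for the opening marker alone, instead of trying to recognise specific lookup names, also catches nested tricks in which one ${...} expression is hidden inside another.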
Sadly, history repeated itself in July 2022, when an open source Java toolkit called Apache Commons Configuration turned out to have similar string interpolation dangers.
Third time unlucky
And history is repeating itself again in October 2022, with a third Java source code library called Apache Commons Text picking up a CVE for reckless string interpolation behaviour.
This time, the bug is denoted as follows:
CVE-2022-42889: Apache Commons Text prior to 1.10.0 allows RCE when applied to untrusted input due to insecure interpolation defaults.
Commons Text is a general-purpose text manipulation toolkit, described simply as “a library focused on algorithms working on strings”.
Even if you are a programmer who hasn’t knowingly chosen to use it yourself, you may have inherited it as a dependency – part of the software supply chain – from other components you are using.
And even if you don’t code in Java, or aren’t a programmer at all, you may have one or more applications on your own computer, or installed on your backend business servers, that include components written in Java.
What went wrong?
The Commons Text toolkit includes a handy Java component known as a StringSubstitutor object, created with a Java command like this:
StringSubstitutor interp = StringSubstitutor.createInterpolator();
Once you’ve created an interpolator, you can use it to rewrite input data in handy ways, like this:
String str = "You have-> ${java:version}";
String rep = interp.replace(str);

Example output: You have-> Java version 19

String str = "You are-> ${env:USER}";
String rep = interp.replace(str);

Example output: You are-> duck
The replace() function processes its input string as if it’s a kind of simple software program in its own right, copying the characters one-by-one except for a variety of special embedded ${...} commands that are very similar to the ones used in Log4J.
Examples from the documentation (derived directly from the source code file StringSubstitutor.java) include:
Programming function    Example
--------------------    ----------------------------------
Base64 Decoder:         ${base64Decoder:SGVsbG9Xb3JsZCE=}
Base64 Encoder:         ${base64Encoder:HelloWorld!}
Java Constant:          ${const:java.awt.event.KeyEvent.VK_ESCAPE}
Date:                   ${date:yyyy-MM-dd}
DNS:                    ${dns:address|apache.org}
Environment Variable:   ${env:USERNAME}
File Content:           ${file:UTF-8:src/test/resources/document.properties}
Java:                   ${java:version}
Script:                 ${script:javascript:3 + 4}
URL Content (HTTP):     ${url:UTF-8:http://www.apache.org}
URL Content (HTTPS):    ${url:UTF-8:https://www.apache.org}
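To make the ${prefix:argument} idea concrete, here is a toy model of prefix-based dispatch written with nothing but the standard Java library. It is emphatically not the Commons Text implementation: the class name ToyInterpolator is invented, nesting and escaping are ignored, and only the two Base64 functions from the table are wired up. Any prefix that hasn’t been registered is copied through unchanged, which is a design choice made for this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of prefix-based lookups; NOT the Commons Text implementation.
class ToyInterpolator {
    // Matches ${prefix:argument} with no nesting and no escaping.
    private static final Pattern LOOKUP =
        Pattern.compile("\\$\\{([A-Za-z0-9]+):([^}]*)\\}");

    // Only harmless, purely local functions are registered in this sketch.
    private static final Map<String, Function<String, String>> LOOKUPS = Map.of(
        "base64Encoder",
        s -> Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8)),
        "base64Decoder",
        s -> new String(Base64.getDecoder().decode(s), StandardCharsets.UTF_8)
    );

    static String replace(String text) {
        Matcher m = LOOKUP.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Function<String, String> fn = LOOKUPS.get(m.group(1));
            // Unregistered prefixes (dns, url, script and so on) are copied
            // through unchanged instead of being acted on.
            String value = (fn == null) ? m.group() : fn.apply(m.group(2));
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replace("${base64Encoder:HelloWorld!}"));      // prints: SGVsbG9Xb3JsZCE=
        System.out.println(replace("${base64Decoder:SGVsbG9Xb3JsZCE=}")); // prints: HelloWorld!
        System.out.println(replace("${script:javascript:6*7}"));          // printed unchanged
    }
}
```

The point to notice is that the set of registered prefixes decides what a piece of input text is able to make the program do, so a function that isn’t in the table can’t be triggered by anything an outsider types in.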
The dns, script and url functions are particularly dangerous, because they could lead to untrusted data, received from outside your network but processed or logged on one of the business logic servers inside your network, doing the following:
dns: Look up a server name and replace the ${...} string with the value returned. If attackers use a domain name they themselves own and control, then this lookup will terminate at a DNS server of their choosing. (The owner of a domain name is, in fact, obliged to provide what's known as definitive DNS data for that domain.)

url: Look up a server name, connect to it using HTTP or HTTPS, and use what's sent back instead of the string ${...}. The danger posed by this behaviour depends on what the replacement string is used for.

script: Run a command of the attacker's choosing. We were only able to get this function to work with older versions of Java, because there's no longer a JavaScript engine built into Java itself. But many companies and apps still use old-but-still-supported Java versions such as 1.8 (JDK 8) and 11.0 (JDK 11), on which the dangerous ${script:javascript:...} remote code execution interpolation trick works just fine.

-----
String str = "DNS lookup-> ${dns:address|nakedsecurity.sophos.com}";
String rep = interp.replace(str);

Output: DNS lookup-> 192.0.66.227
-----
String str = "Stuff sucked from web-> ---BEGIN---${url:UTF8:https://example.com}---END---";
String rep = interp.replace(str);

Output: Stuff sucked from web-> ---BEGIN---<!doctype html>
<html>
<head>
    <title>Example Domain</title>
    . . .
</head>
<body>
<div>
    <h1>Example Domain</h1>
    [. . .]
</div>
</body>
</html>---END---
-----
String str = "Run some code-> ${script:javascript:6*7}";
String rep = interp.replace(str);

Output: Run some code-> 42
What to do?
- Update to Commons Text 1.10.0. In this version, the dns, url and script functions have been turned off by default. You can enable them again if you want or need them, but they won’t work unless you explicitly turn them on in your code.
- Sanitise your inputs. Wherever you accept and process untrusted data, especially in Java code, where string interpolation is widely supported and offered as a “feature” in many third-party libraries, make sure you look for and filter out potentially dangerous character sequences from the input first, or take care not to pass that data into string interpolation functions.
- Search your network for Commons Text software that you didn’t know you had. Searching for files with names that match the pattern commons-text*.jar (the * means “anything can match here”) is a good start. The suffix .jar is short for java archive, which is how Java libraries are delivered and installed; the prefix commons-text denotes the Apache Commons Text software components, and the text in the middle covered by the so-called wildcard * denotes the version number you’ve got. You want commons-text-1.10.0.jar or later.
- Track the latest news on this issue. Exploiting this bug on vulnerable servers doesn’t seem to be quite as easy as it was with Log4Shell. But we suspect, if attacks are found that cause trouble for specific Java applications, that the bad news of how to do so will travel fast. You can keep up-to-date by keeping your eye on this @sophosxops Twitter thread:
Sophos X-Ops is following reports of a new vulnerability affecting Apache CVE-2022-42889 affects versions 1.5-1.9, released between 2018-2022. https://t.co/niaeqL2Sr9 1/7
— Sophos X-Ops (@SophosXOps) October 17, 2022
Don’t forget that you may find multiple copies of the Commons Text component on each computer you search, because many Java apps bring their own versions of libraries, and of Java itself, in order to keep precise control over what code they actually use.
That’s good for reliability, and avoids what’s known in Windows as DLL hell or dependency disaster, but not quite as good when it comes to updating, because you can’t simply update a single, centrally managed system file and thus patch the entire computer at once.
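If you’d like to automate the file search suggested above, a sketch along the following lines may help. It assumes the library is stored under its conventional name of commons-text-<version>.jar, and it treats anything below version 1.10 as outdated; the class name CommonsTextFinder and its methods are invented for this example. It won’t spot copies that have been renamed or bundled inside other archives, and it may stop with an error if it hits a directory it isn’t allowed to read.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// A minimal sketch: walks a directory tree and reports Commons Text JAR files
// whose file names suggest a version earlier than 1.10.
class CommonsTextFinder {
    // Assumes the conventional name commons-text-<major>.<minor>[.<patch>]...jar
    private static final Pattern JAR =
        Pattern.compile("commons-text-(\\d+)\\.(\\d+)(?:\\.\\d+)?.*\\.jar");

    // True if the file name looks like Commons Text older than 1.10.
    static boolean isOutdated(String fileName) {
        Matcher m = JAR.matcher(fileName);
        if (!m.matches()) {
            return false; // not a recognisable Commons Text JAR name
        }
        int major = Integer.parseInt(m.group(1));
        int minor = Integer.parseInt(m.group(2));
        return major < 1 || (major == 1 && minor < 10);
    }

    // Collects every regular file under root whose name looks outdated.
    static List<Path> findOutdated(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> isOutdated(p.getFileName().toString()))
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Starts from the directory given on the command line, or the current one.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        findOutdated(root).forEach(p -> System.out.println("Outdated: " + p));
    }
}
```

Because apps often carry their own private copies, it’s worth running a search like this across every application directory, not just the places where system-wide libraries usually live.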