A bit of a brouhaha erupted at the end of last week – it wasn’t quite an argument between Twitter and Firefox, but it did get confusing pretty quickly.
The issue had to do with how long your browser might hang on to local copies of private data such as direct messages, even after they’d actually been posted.
Twitter published an blog article tagged “Privacy” that stated:
We recently learned that the way Mozilla Firefox stores cached data may have resulted in non-public information being inadvertently stored in the browser’s cache. This means that if you accessed Twitter from a shared or public computer via Mozilla Firefox and took actions like downloading your Twitter data archive or sending or receiving media via Direct Message, this information may have been stored in the browser’s cache even after you logged out of Twitter.
We’re guessing that this problem was submitted to Twitter as a bug report, presumably by someone who just happened to look through their Firefox cache files and was surprised to see what showed up there.
In computer science jargon, the word cache means rather the opposite of what it means in the military or to pirates.
In the piratical context, a cache might a secret store of important supplies that the pirates could get at if they were stranded or encountered an emergency, such as gold coins and weapons buried in an unlikely location in case armed attackers took them unawares.
In computing, a cache is a place to store items you expect to need again soon in a place that’s easy and quick to access – precisely so that you don’t need to go and find your hiding place and dig out the data from last time you stashed it.
For example CPUs (processors) have a small amount of super-high-speed internal data storage to cache the values in memory (RAM) that you’ve used most recently, because CPUs are faster than RAM chips; disk drives have RAM chips to cache recently used disk sectors, because RAM is faster than disk, even if it’s solid state flash storage…
…and browsers have a cache of local disk files to hold onto recently used web content, because reading from disk is generally much faster than going out across the internet all over again.
So you would expect your browser’s cache to contain plenty of hints about what you’ve been up to lately, but not everything you’ve seen or sent.
What gets cached?
We started Firefox with a totally empty cache, browsed to twitter.com, and then grabbed a copy of the files Firefox had chosen to keep for later in its cache directory. (We used Linux, where the files can generally be found in the directory /home/[yourname]/.cache/mozilla/firefox/[uniqueID]-default/cache2/entries
)
After loading just one page, we already had 42 web items cached, including images, JavaScript files, web certificate checks and Twitter data.
A sample of filename and types is shown here, where the filenames are computer-generated 20-byte numbers turned into hexadecimal characters:
782EEDFE807FB787884BC52467159DF43C35D8DF: image/jpeg 91CAAE3C18091A616AB712DEDE722A8DFB081C97: image/jpeg [. . 13 more images . .] 7E5141AFAB454D5000F81591D6DE2FBE2BC278BF: text/javascript 7FF939CD17A49227D6AA500DC3AF0AEC0216A117: text/javascript [. . 19 more JavaScript files . .] F7F250DC5A7B70C65829D50FA26D6FF48336584E: text/json 991BE611B1027BD55542203536CCB91E8F5ACF60: binary/certificate-status BB9B09D4FC25316CB73AC20F2A9622383F9402BA: binary/certificate-status [. . 2 more certificate checks . .] E715BD4129016B499FBD4666D453A665EBD3EBBC: binary/twitter-data
The JSON file looks like a list of more than 1700 Twitter ad campaigns, judging by the first few entries:
campaignName:"DisneyBlackWidow_Emoji_YelenaBelova", hashtag:"" campaignName:"100Thieves_2020", hashtag:"100T" campaignName:"100Thieves_2020", hashtag:"100Thieves", campaignName:"100Thieves_2020", hashtag:"100WIN" campaignName:"AoyamaOfficial_2020_Spring", hashtag:"17文字のありがとう" [. . 1727 more items . .]
Sometimes, browsers know implicitly not to cache pages – for example, because they are one-offs and therefore won’t be needed again – but the only way that they can know explicitly not to cache content is if the website serving up the data says so.
A server can inform a browser what sort of caching it may do by including the HTTP header Cache-Control
in its reply along with the data, for example like this:
$ curl -I https://nakedsecurity.sophos.com HTTP/2 200 date: Mon, 06 Apr 2020 23:18:21 GMT content-type: text/html; charset=UTF-8 content-length: 59419 cache-control: max-age=300, must-revalidate strict-transport-security: max-age=31536000
Here, nakedsecurity.sophos.com is telling the program downloading our main page that for the next five minutes (max-age=300 seconds), it can re-use the content of this reply before checking back to see if it’s changed (must-revalidate).
But if a reply should not be cached, either because to do so would be waste of time or a needless risk to privacy, the server should state:
cache-control: no-store
What went wrong with Twitter?
Why did Firefox cache data that Twitter surely didn’t want it to, as stated by Twitter in the blog post quoted above?
The answer is a curious one – according to Firefox, Twitter forgot to send the no-store
instruction to forbid caching explicitly:
In this case, Twitter did not include a ‘no-store’ directive for direct messages. The content of direct messages is sensitive and so should not have been stored in the browser cache.
Apparently, having set an old-fashioned header to say Pragma: no-cache
, which doesn’t quite mean the same thing, Twitter observed that other browsers didn’t save the replies anyway.
From this, Twitter seems to have inferred – understandably if incorrectly – that it had done enough to prevent any web client from holding onto data such as private messages.
This inference was not, however, correct for Firefox (and, we assume, derived products such as Tor Browser).
Inquisitive users might indeed trip over old copies of private messages in the cache that they’d reasonably have assumed wouldn’t be there.
As far as we can tell, the issue has been sorted out amicably, with Twitter now unambiguously telling Firefox to no-store
the offending data, and Firefox accordingly not storing it.
What to do?
There’s not a big risk here, unless perhaps you share your computer account and your browser with someone else to manage two different Twitter accounts.
But if you’re sharing a computer account and your browser with someone else, what’s left in your Firefox cache by mistake is probably the least of your worries.
We recommend creating separate accounts with different passwords to keep your digital lives apart – not because you don’t trust the other person but simply to prevent accidents and misunderstandings.
There’s also a tiny risk that if you were infected by malware or hacked after using Twitter, a crook might, just might, be able to recover data from your Firefox cache that Twitter ought to have suppressed from being there.
But any data left in your browser cache increases the risk of malware figuring out what you’ve been up to do even after you did it, so regularly clearing the data your browser keeps about you is a great idea.
If you don’t mind logging back into all your websites every time you exit and reload Firefox (it’s a small hassle for a big security reward), we recommend telling Firefox to clean up every time you exit.
Go to Preferences > Privacy & Security > History > Clear history when Firefox closes > [Settings…] to decide what to clear every time:
Lastly, if you’re a web developer, and you are serving up data you know you don’t want the recipient to hang onto, don’t leave them to guess – use the Cache-Control: no-store
header and make your requirements crystal clear.