On Contact Tracing and Privacy

Sun, 2020 Aug 9

Lately, many companies and governments have been creating and encouraging the use of "contact tracing" smartphone apps that can alert you if you come into contact with someone infected with COVID-19.

These apps all claim to respect your privacy, by virtue of not tracking your location and/or reporting anything back to a central server. Unfortunately, this is only half true. While these apps (as far as I know) do not monitor your location or report back to anyone, they do not, and by definition cannot, truly respect your privacy.

How They Work

All of these apps that I know of operate on the same basic principle:

The app generates a unique, random ID.
The app continuously broadcasts this ID via Bluetooth.
The app records which other unique IDs it sees broadcast.
If one of these IDs appears on a list of known infected users (or that user has flagged themselves as infected; unsure if any apps support this), the app alerts you.
Some apps might also allow you to submit a list of the IDs you encountered to a public health authority for analysis of the virus' spread. (Unsure if any actually implement this.)
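The steps above can be sketched in a few lines of Python. This is only a rough model of the principle, not the code of any real app: the TracingApp class, its method names, and the 16-byte ID length are all made up for illustration, and the Bluetooth broadcast is replaced by a plain function call.

```python
import secrets


class TracingApp:
    """Hypothetical model of a contact tracing app (not any real app's API)."""

    def __init__(self):
        # Step 1: generate a unique, random ID (16 random bytes, hex-encoded).
        self.my_id = secrets.token_hex(16)
        # Step 3 storage: every ID this phone has seen broadcast.
        self.seen_ids = set()

    def broadcast(self):
        # Step 2: a real app would send this over Bluetooth, continuously.
        return self.my_id

    def receive(self, other_id):
        # Step 3: record the IDs seen nearby, ignoring our own.
        if other_id != self.my_id:
            self.seen_ids.add(other_id)

    def check_exposure(self, infected_ids):
        # Step 4: compare the log against a list of known infected IDs.
        return self.seen_ids & set(infected_ids)


alice, bob, carol = TracingApp(), TracingApp(), TracingApp()
alice.receive(bob.broadcast())      # Alice and Bob cross paths
infected = {bob.my_id}              # Bob later tests positive

alice_matches = alice.check_exposure(infected)   # contains Bob's ID
carol_matches = carol.check_exposure(infected)   # empty: Carol never met Bob
```

Note that everything here hinges on my_id staying the same long enough to be matched later, which is the root of the first problem below.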

These apps do not (according to their developers):

Access, record, or monitor your location, audio, video, etc.
Report anything back to a central server without you explicitly telling them to.
Transmit anything besides a unique ID.

So, no problem, right? Well...

Problem #1: Unique IDs

Every WiFi and Bluetooth device has a MAC Address - a unique ID, included in every packet so that the receiver knows who to respond to. Your phone is constantly transmitting packets containing these IDs in order to try to connect to your WiFi network (or to communicate with it, if already connected), find/communicate with your Bluetooth devices, etc.

In 2014, Apple realized that these unique IDs posed a privacy concern, since they can be used to track people, and began randomizing them. Others followed suit, and now all major operating systems implement this randomization.

The tracking methodology is simple: set up a couple of receivers, and listen for these unique IDs. By correlating who's nearby with what IDs are being broadcast, you can easily determine which ID belongs to which person (or at least, which device). This information can be exploited in all sorts of ways:

Monitoring for known IDs to know when someone is nearby
Keeping tabs on where someone goes and when
Stalking someone by following their signal
Building a profile of someone's behaviour patterns, shopping habits, etc. by monitoring where they spend their time
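To show how little work this takes, here is a rough sketch of the correlation step. The sightings list is invented data, and build_trails is a hypothetical helper; it assumes each receiver logs a (time, location, ID) tuple for every broadcast it hears.

```python
from collections import defaultdict


def build_trails(sightings):
    """Group receiver logs by broadcast ID to get each device's movements."""
    trails = defaultdict(list)
    # Sorting the tuples orders them by time first, so each trail is chronological.
    for timestamp, location, device_id in sorted(sightings):
        trails[device_id].append((timestamp, location))
    return dict(trails)


# Made-up logs from three receivers: (time as HHMM, receiver location, ID heard).
sightings = [
    (1230, "pharmacy", "id-A"),
    (900, "cafe", "id-A"),
    (905, "cafe", "id-B"),
    (1800, "gym", "id-A"),
]

trails = build_trails(sightings)
# trails["id-A"] now lists everywhere that device was heard, in order.
```

Once a single sighting ties an ID to a face, the rest of the trail comes for free.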

Today, most devices choose a random MAC address, and change it every so often, to foil this sort of tracking. But, constantly broadcasting a unique ID to everyone around you is exactly how these contact tracing apps work. Though they have good intentions, they reopen a previously closed vector for abuse.
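For comparison, picking a random MAC address looks roughly like the sketch below. The random_mac function is a hypothetical illustration, not how any particular operating system does it; the only firm detail is the bit layout, where a "locally administered" unicast address has the second-lowest bit of the first byte set and the lowest bit clear.

```python
import secrets


def random_mac():
    """Return a random locally administered, unicast MAC address as text."""
    octets = bytearray(secrets.token_bytes(6))
    # Set the "locally administered" bit (0x02), clear the multicast bit (0x01).
    octets[0] = (octets[0] | 0x02) & 0xFE
    return ":".join(f"{byte:02x}" for byte in octets)


# A device would call this again every so often to rotate its address.
old_address = random_mac()
new_address = random_mac()
```

Because nothing links new_address to old_address, a listener has no way to stitch the two sightings together. A contact tracing ID that stays fixed gives that link right back.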

Problem #2: Trust

This, really, is a problem that all apps face, but the nature of these makes them an especially juicy target. Simply put, when you use an app, you need to trust:

That its developers haven't put anything malicious into it
That they won't add something malicious in the future (especially with automatic updates)
That they won't sell out to someone who will add something malicious
That they, and the place you downloaded the app from, won't get hacked and have malicious code snuck in
That the app itself can't be tricked by a third party into behaving maliciously

You may (or may not) trust your local government to make an app that doesn't do anything nasty on purpose, but do you trust them to make one that can't be hacked? To store the app and its source code on a secure system? To not be tricked into adding some "helpful" feature by someone with ulterior motives?

So, how do we fix it?

Well, I'm not entirely sure.

What immediately comes to mind is that the apps shouldn't transmit a unique ID. They could simply not transmit at all if the user isn't infected, or transmit some type of digital certificate of test results. But these approaches have problems.

The main benefit of these apps isn't just to act as a COVID-19 radar. They keep a log of who you've been near (which IDs they've seen recently), and can alert you after the fact if one of the people in that log is found to be infected. Without the unique IDs, this can't work.

A partial solution could be public-key cryptography. Pack unique ID + random nonsense into a message, encrypt it with the server's public key, and broadcast that. Because of the random nonsense, every broadcast looks different, so the receiver's device can't tell which messages belong to which ID. It can only report which ones it has seen back to the server, which then decrypts the messages with its private key and informs the user if any belong to infected people.
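Here is a toy sketch of that idea, to show the shape of it and nothing more. It uses textbook RSA with a key built from two publicly known primes, so it is deliberately NOT secure; a real app would need a vetted cryptography library and a proper padding scheme. The function names, the 64-bit nonce size, and the use of small integer IDs are all assumptions of mine. (It needs Python 3.8 or later for the modular inverse.)

```python
import secrets

# Toy key pair for the server. Both primes are well-known Mersenne primes,
# so this key is NOT secret and NOT secure. Illustration only.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent, known only to the server

NONCE_BITS = 64                      # amount of "random nonsense" per broadcast


def make_broadcast(user_id):
    """Phone side: pack ID + fresh random bits, encrypt with the public key."""
    nonce = secrets.randbits(NONCE_BITS)
    message = (user_id << NONCE_BITS) | nonce   # must stay smaller than N
    return pow(message, E, N)


def server_decrypt(ciphertext):
    """Server side: decrypt, then throw away the random bits to get the ID."""
    message = pow(ciphertext, D, N)
    return message >> NONCE_BITS


first = make_broadcast(123456)
second = make_broadcast(123456)
# first and second differ, but the server recovers the same ID from both.
```

Anyone listening sees two unrelated-looking numbers, while the server, holding D, can map both back to the same user.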

Of course, that solution has its drawbacks: it requires you to check in (surveillance!!11!), increases the amount of storage required (every received broadcast must be stored, even though many will be redundant), and if not done correctly, still might not actually solve the problem. (Cryptography is very difficult to get right, but very easy to get subtly wrong, such that it seems to work fine, but has a fatal flaw.)

Perhaps the best solution is to just be aware that these apps can't truly respect your privacy while serving their purpose, know the risk they pose, and decide whether you're willing to take that risk.