It’s been nearly three years since the beginning of Edward Snowden’s historic leaks. A growing number of journalists across a constellation of American news organizations are taking up security trainings to arm themselves against surveillance. One of the most popular tools for securing communications is Pretty Good Privacy (PGP) email encryption. Anecdotally, folks in Western journalism and security communities assume newsrooms are adopting PGP. I wanted to see if there’s any truth to that, so I gathered data on the 10 most highly ranked news sites in the United States to learn when people began using PGP. Perhaps more importantly, as I investigated, I realized that the data are fundamentally unreliable. So let’s talk about how to get public PGP data, and why we should probably be skeptical about it.
Where does one find data on PGP adoption?
I used MIT’s PGP keyserver to collect data related to news organizations. I looked for public keys associated with 10 news organizations, along with the dates those keys were registered. I chose the 10 most highly trafficked news sites on Alexa “Top Sites”, a tool that tracks unique monthly visits by domain, and attempted to find the domains those news orgs have used for email. I would have gotten different results with different organizations, so if you’re interested in trying it out with other orgs, you can see how I did it or use my code here.
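To give a feel for what the raw keyserver data looks like, here is a minimal sketch of parsing a keyserver's machine-readable index listing. It is not the code I used for this piece. It assumes the colon-delimited format described in the HKP draft (conventionally requested with `op=index&options=mr` on a `/pks/lookup` URL), where `pub` lines carry a key ID and creation timestamp and `uid` lines carry the name and email. The function name `parse_mr_index` and the sample response are made up for illustration.

```python
from datetime import datetime, timezone
from urllib.parse import unquote


def parse_mr_index(text):
    """Parse HKP machine-readable index output into a list of key records."""
    keys = []
    for line in text.splitlines():
        fields = line.strip().split(":")
        if fields[0] == "pub" and len(fields) >= 5:
            created = fields[4]
            keys.append({
                "keyid": fields[1],
                # Creation time is seconds since the Unix epoch; it may be blank.
                "created": (
                    datetime.fromtimestamp(int(created), tz=timezone.utc).date()
                    if created else None
                ),
                "uids": [],
            })
        elif fields[0] == "uid" and len(fields) >= 2 and keys:
            # User IDs are percent-encoded strings such as "Name <address>".
            keys[-1]["uids"].append(unquote(fields[1]))
    return keys


# Invented sample response, not real keyserver data.
SAMPLE = """info:1:2
pub:0123456789ABCDEF:1:2048:1375574400::
uid:Pat Example <pat@news.example>:1375574400::
pub:FEDCBA9876543210:1:4096:1438646400::
uid:Sam Sample <sam@news.example>:1438646400::
uid:Sam Sample <sam@personal.example>:1438646400::
"""

for key in parse_mr_index(SAMPLE):
    print(key["keyid"], key["created"], key["uids"])
```

Note that the creation timestamp is whatever the key itself claims, which matters for the reliability problems discussed later.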
I’m interested in registration dates for unique public key IDs associated with news orgs. I also gathered names and emails in order to identify and manually remove users that appeared more than once. I only included the very first appearance of each user. I also threw out a few that were made for testing purposes (e.g., “email@example.com”), those that have clearly fake domains (e.g., “fake.whitehouse.gov”) and those that appeared to be bots (e.g., “firstname.lastname@example.org”). In practice, this is just a lot of eyeballing and manual deletion. After some old-fashioned data cleaning by hand and correctly formatting the data, we’re ready to plot it to check out interesting trends. You can check out my resulting dataset and code.
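I did most of this cleaning by hand, but the "first appearance only" rule is easy to sketch in code. The example below is an illustration, not my actual pipeline: the record layout, the `JUNK_DOMAINS` set, and the `first_appearances` helper are all assumptions, and matching users by lowercased email is a simplification of the name-and-email eyeballing described above.

```python
from datetime import date

# Placeholder domains to discard; this set is illustrative, not exhaustive.
JUNK_DOMAINS = {"example.com", "example.org"}


def first_appearances(records):
    """Keep the earliest record per email address, skipping junk domains."""
    earliest = {}
    for rec in records:
        email = rec["email"].strip().lower()
        domain = email.rpartition("@")[2]
        if domain in JUNK_DOMAINS:
            continue
        if email not in earliest or rec["created"] < earliest[email]["created"]:
            earliest[email] = {**rec, "email": email}
    return sorted(earliest.values(), key=lambda r: r["created"])


# Invented records for illustration only.
records = [
    {"email": "pat@news.example", "created": date(2014, 3, 1)},
    {"email": "Pat@news.example", "created": date(2013, 8, 4)},
    {"email": "test@example.com", "created": date(2012, 1, 1)},
    {"email": "sam@news.example", "created": date(2015, 8, 4)},
]
cleaned = first_appearances(records)
for rec in cleaned:
    print(rec["email"], rec["created"])
```

A rule like this will not catch the same person registering under two different addresses, which is why the manual pass still matters.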
In the interest of privacy and simplicity, I chose not to report the keys, emails, or people’s names in my dataset. Of course, anyone can find them by looking at the keyserver.
10 News Orgs and PGP key registration
Here’s what our data look like. Some of these news organizations are fairly young, so journalists at each organization have had unequal time to try out PGP. Many organizations saw an uptick in key registrations quite recently, and it looks like the years since the Snowden disclosures have seen more activity. For example, before 2013, years averaged roughly 11 new registrations. In 2013, that number jumped to 65. In 2014, it held steady at 63 new registrations. In 2015, registrations jumped again, to 127. As of March 2016, we’ve seen 40 new registrations. If the momentum continues, 2016 could see more key registrations than ever.
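The yearly tallies above come down to counting registration dates by calendar year. A minimal sketch of that step, assuming the cleaned data is a list of `date` objects, might look like this. The helper names and the sample dates are invented for illustration and are not the dataset behind the numbers in this post.

```python
from collections import Counter
from datetime import date


def registrations_per_year(dates):
    """Count key registrations by calendar year."""
    return Counter(d.year for d in dates)


def mean_before(counts, cutoff_year):
    """Average yearly count over years before cutoff_year that have data."""
    earlier = [n for year, n in counts.items() if year < cutoff_year]
    return sum(earlier) / len(earlier) if earlier else 0.0


# Invented dates for illustration only.
dates = [
    date(2011, 5, 2), date(2012, 1, 9), date(2012, 11, 30),
    date(2013, 6, 7), date(2013, 6, 8), date(2013, 9, 1), date(2013, 12, 12),
]
counts = registrations_per_year(dates)
print(sorted(counts.items()))
print(mean_before(counts, 2013))
```

One caveat with this sketch: years with zero registrations never appear in the counter, so the average only covers years that have at least one key.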
Despite being a fairly new organization founded in 2006, BuzzFeed appears to have pulled ahead in PGP key registrations, with 121 in total, dwarfing all other organizations except the New York Times, which has 200. That said, the New York Times’s sheer number of employees likely drives up its count of PGP users.
You may remember CNET in the 90s and early 2000s for technology news and reviews. Apparently they were also big PGP users before it was cool. Given their brand of tech journalism, this makes a lot of sense. Their adoption appears to have slowed after the mid-2000s.
The rest of our lineup, a mix of digital native and traditional news publications, tends not to have tried out PGP until much later, and in smaller numbers. The Washington Post, one of the organizations with early access to the Snowden docs, appears to have registered many of its keys in the days immediately following the first publications on PRISM and Verizon’s involvement in mass surveillance of Americans’ phone metadata.
Also interesting: Looking at the keyserver data, some days have a lot more traffic than others. For example, it looks like someone held a security training or a cryptoparty at BuzzFeed on August 4, 2015, when 9 keys were registered on the same day. You can find many possible dates of cryptoparties at news orgs in this way.
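Spotting those bursts is a matter of counting registrations per organization per day and flagging the days that cross some threshold. Here is a rough sketch under assumed names: the record layout, the `busy_days` helper, and the default threshold of 5 are arbitrary choices for illustration, and the sample records only mimic the BuzzFeed example above rather than reproduce real keyserver entries.

```python
from collections import Counter
from datetime import date


def busy_days(records, threshold=5):
    """Return (org, day, count) for days with at least `threshold` new keys."""
    per_day = Counter((rec["org"], rec["created"]) for rec in records)
    hits = [(org, day, n) for (org, day), n in per_day.items() if n >= threshold]
    # Largest bursts first, then by organization and date.
    return sorted(hits, key=lambda h: (-h[2], h[0], h[1]))


# Invented records: nine keys for one org on one day, plus a stray key.
records = [{"org": "buzzfeed.com", "created": date(2015, 8, 4)} for _ in range(9)]
records.append({"org": "buzzfeed.com", "created": date(2015, 8, 20)})

print(busy_days(records))
```

A cluster like this is only a hint. It cannot tell you whether the burst was a training, a cryptoparty, or something else entirely.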
Now, there are a lot of fascinating insights that can bubble to the surface by exploring PGP keyserver data, if we trust it. The problem is that it’s not clear if we should.
The unreliability of PGP keyserver data
Here’s why you might want to be skeptical about the accuracy and representativeness of PGP keyserver data.
- We might not capture all the people we are interested in. If someone uses a different email address for PGP (e.g., a personal Gmail) instead of their professional account, we didn’t capture them here. Likewise, not everyone posts their key to the keyserver, and there’s no way to know how many people are choosing *not* to post their keys. Still, I think this is likely to be less of an issue for an analysis of news organizations, just because journalists want to be accessible to potential sources.
- Because people can use multiple emails with the same public key, it’s sometimes difficult to be sure there is a one-to-one relationship between a key and a user. For example, I found a small number of entries that had two different reporters assigned to one public key. (Luckily for me, they resolved themselves while I was cleaning my data, but it could complicate analysis in the future.)
- We have no way of knowing when any particular journalist started using PGP. Maybe it was before they started using it at the news organization captured in our data.
- Importantly, it’s trivially easy to forge the date when a key was created. In late 2015, Motherboard demonstrated that anyone can forge registration dates, and it’s unclear how often this happens in practice.
- Here’s the kicker: Without asking each person individually, we have no way to guarantee that the journalist actually posted their own public key. What do I mean by that?
Let’s say that you want to register a key for an email address. Using GPG Tools, you can do this pretty painlessly. In fact, you don’t even need to own the address. For example, if we want to register “email@example.com” we can do that pretty quickly.
(Coincidentally, imaginarynews.com is an actual webpage featuring imaginary news.)
There is no shortage of holes in verifying users’ identities on a platform designed for finding and verifying users’ identities. So, for researchers, journalists, and anyone else who wants to conduct similar work: the data are fascinating. Still, we should be skeptical of PGP keyserver data.