2020 census data may not be as anonymous as expected

It’s census time in the US once again, and ads are running across online and offline media to encourage everyone selected for the census to respond. The constitutional goal of the census is to ensure that congressional districts are properly balanced, thereby ensuring adequate representation. The modern census has expanded beyond that simple goal to also inform how federal resources are allocated within each district. Since the census collects significant sensitive data on citizens, it is by definition intended to be anonymous.

Having been selected to fill out a census, I decided to do so online, only second-guessing that decision when my browser informed me that the official website was using third-party cookies. For those who aren’t aware, third-party cookies are commonly used by ad networks to target their ads. The census is run by the US government, so why would there be any reason to place ads within a census form?

Given that I speak regularly on matters of cybersecurity and digital privacy, this seemed like an opportune time to dig into why something intended to be anonymous was in reality sharing information with third parties. After all, with most websites seeking consent to place cookies on your browser, it probably makes sense to understand what each category really means and what the real risks are.

As you can see from the screenshot below, my blocker found several pieces of third-party content that I wanted to look into.

Since modern websites aren’t built from scratch, my first order of business was to determine the core underlying platform. This is important, as that platform likely has a number of tracking features to ensure proper operation or performance. Those same tracking features will show up in the list of third-party cookies but are mostly harmless. In the case of the 2020 census, the web platform turns out to be the Adobe Experience Platform, which in turn explains any references to adobe.com and demdex.net. These cookies, along with first-party cookies, can be classified as functional in nature.

The next most common classification covers cookies relating to the overall performance of the website. These seek to track what the user experience looks like, and as someone who works in this space, I know just how hard it is to balance the user impact of such tools against the desired performance data. In the case of the 2020 census, the site uses a service called Boomerang, which is offered by Akamai and references a website at go-mpulse.net. The website itself is hosted within AWS, which in turn allows for additional first-party performance metrics to be gathered, all of which helps ensure proper uptime.

With these functional and performance cookies addressed, we can safely turn our attention to those from doubleclick.net, tapad.com, addthis.com, skimresources.com, ads-twitter.com, and facebook.net, among others. The big question we need to ask is, why would anyone running a website like the one for the census intentionally want to track visitors using ad networks? To answer this question, we first need to determine if this was simply the result of including the Adobe framework (i.e., does a framework nominally used for commercial websites simply assume ad networks are required?).
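The triage described so far can be sketched in a few lines of Python. This is only an illustration of the reasoning, not a tool used in the analysis: the category table contains just the domains named in this article (with the performance domain written as Akamai's go-mpulse.net), and the function name `classify_host` is made up for the example.

```python
# Category table built only from domains named in this article.
# The grouping mirrors the article's reasoning; it is not an official list.
CATEGORIES = {
    "functional": ("adobe.com", "demdex.net"),
    "performance": ("go-mpulse.net",),
    "advertising": (
        "doubleclick.net", "tapad.com", "addthis.com",
        "skimresources.com", "ads-twitter.com", "facebook.net",
    ),
}


def classify_host(host: str) -> str:
    """Return the category for a hostname, or 'unknown' if it is not listed."""
    host = host.lower().rstrip(".")
    for category, domains in CATEGORIES.items():
        for domain in domains:
            # Match the domain itself or any subdomain of it.
            if host == domain or host.endswith("." + domain):
                return category
    return "unknown"


if __name__ == "__main__":
    # Hypothetical hostnames, chosen only to exercise each branch.
    for h in ("dpm.demdex.net", "stats.g.doubleclick.net", "example.org"):
        print(h, "->", classify_host(h))
```

Matching on the full domain or a dot-prefixed suffix avoids false positives from lookalike names; anything that comes back as "unknown" is what deserves a closer manual look.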

A good starting point for this analysis is to inspect the underlying page representation in the browser. This is very different from looking at the page source, as modern dynamic websites often run client-side scripts to determine the final appearance of the web page. You can see the underlying web layout for the 2020 census home page below, and rather than immediately answering questions, it introduces some new ones — specifically, what is the relationship with Bing and Snapchat?
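For readers who want to repeat this kind of inspection, here is a minimal sketch using only the Python standard library. It assumes you have saved the rendered page from the browser's developer tools (not the raw page source, which would miss anything injected by client-side scripts). The helper names and every hostname in the sample are placeholders, not the actual census markup.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ResourceCollector(HTMLParser):
    """Collect the hosts of externally loaded scripts, images, and iframes."""

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag not in ("script", "img", "iframe"):
            return
        src = dict(attrs).get("src")
        if not src:
            return
        host = urlparse(src).hostname  # None for relative URLs
        if host:
            self.hosts.add(host.lower())


def third_party_hosts(html: str, site_domain: str) -> list:
    """Return sorted hosts that are neither the site domain nor a subdomain of it."""
    collector = ResourceCollector()
    collector.feed(html)
    return sorted(
        h for h in collector.hosts
        if h != site_domain and not h.endswith("." + site_domain)
    )


if __name__ == "__main__":
    # Invented snapshot of a rendered page; all hosts are placeholders.
    snapshot = (
        '<script src="/static/app.js"></script>'
        '<script src="https://cdn.tracker.example/pixel.js"></script>'
        '<img src="https://www.census.example/logo.png">'
    )
    print(third_party_hosts(snapshot, "census.example"))
```

A list like this only shows which hosts the page pulls content from; it says nothing about what each host does with the request, which is why the individual script files still need to be read.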

If we remain focused on the original question of tracking cookies, there is an obvious file to investigate: federated-analytics-min.js. This file does indeed represent tracking capabilities, but based on its version info and the lack of Google Analytics, this might be legacy code.

Continuing with the analysis, we eventually arrive at launch-03ad6712691b.min.js, which is hosted by Adobe. This file represents the configuration for the Adobe Experience framework of the 2020 census. After de-obfuscating the file and extracting the configuration JSON, we see evidence that the file was generated on March 12, 2020, for production use. We can also now explain the Bing, Twitter, and Snapchat code and cookies, which are conditionally present on all pages except for “jobs.”
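Pulling a configuration object out of a minified script can be approximated as below. This is a sketch under loud assumptions: it presumes the configuration is embedded as a strictly JSON-compatible object right after a known marker string, which real minified files may not guarantee. The marker `window.config=`, the field names, and the sample contents are all invented for illustration and are not taken from the actual Adobe file.

```python
import json


def extract_json_object(text: str, marker: str) -> dict:
    """Parse the first {...} object that follows `marker` in `text`."""
    start = text.index("{", text.index(marker) + len(marker))
    depth = 0
    in_string = False
    escaped = False
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            # Inside a string literal, braces do not affect nesting depth.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return json.loads(text[start:i + 1])
    raise ValueError("unbalanced braces after marker")


if __name__ == "__main__":
    # Invented sample; the marker and keys are hypothetical.
    sample = (
        'window.config={"buildInfo":{"environment":"production",'
        '"buildDate":"2020-03-12"},"note":"a } inside a string"};init();'
    )
    config = extract_json_object(sample, "window.config=")
    print(config["buildInfo"])
```

Tracking string state while counting braces matters because minified configuration frequently embeds code or selectors as string values; a naive brace count would cut the object short.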

This evidence points to the presence of ad network cookies being the result of intentional configuration of the 2020 census site. Considering this, we need to ask why the site authors would include tracking cookies on a website intended to anonymously collect data.

One possible answer is that the core requirements didn’t include anonymity for visitors as a functional requirement. Another possible answer is that the authors were accustomed to tracking all visitors by default and a detailed code review wasn’t made by those writing the specifications. One last possible explanation is that, in their attempts to ensure the largest response rate possible, the authors are in effect attempting to target census ads to only those who haven’t completed their census. Considering the physical mail for the census includes the message “YOUR RESPONSE IS REQUIRED BY LAW,” perhaps the authors took ensuring compliance as paramount, allowing that priority to override any potential privacy concerns.

These three scenarios highlight the challenges organizations face when designing for privacy. If we assume the tracking cookies are intentionally placed to ensure maximum response rates, one unintended outcome is that the respective ad networks and social media platforms will then be able to know roughly who responded. When combined with other data sources, these third parties might then be able to identify individual respondents. Such data mining is at the core of how ad networks function, and the use of personal information for unknown purposes is exactly what regulations like GDPR in Europe and the recently enacted California Consumer Privacy Act are meant to address.

These laws, and many others, were created based on the realization that when companies have access to data, they will find novel ways to use it independent of the original reasons it was collected. This is why we see a proliferation of cookie consent messages on websites and why designing for privacy requires placing the rights and expectations of the consumers front and center.

While we might rightly be concerned about the federal government tracking our census activity, there are two simple ways to reduce the risk: fill out the old-school paper form or go full incognito in your browser before even going to the census website.

Tim Mackey is principal strategist at the Synopsys Cyber Security Research Center.