How one researcher harvested data from 50 million people — and Facebook was designed to help

Cambridge Analytica used the data to build detailed profiles on voters in the run-up to the 2016 U.S. presidential election and the U.K.'s Brexit campaign.

About 270,000 people installed a data-harvesting Facebook app, which was portrayed as part of an online personality quiz that participants were paid to take. (Benoit Tessier/Reuters)

In hindsight, it's strange that for years, anyone installing a Facebook app could give that app's developer access not only to their own personal information, but to the personal information of all their friends. Where your friends lived, worked, and went to school, not to mention their interests and the pages they had liked, were all fair game.

The feature was well-intentioned, of course. One app, Job Fusion, used the functionality to show users job openings at the places their friends worked. The video-sharing app Vine let users see which of their Facebook friends also used the app, until Facebook cut off its competitor's access.

Even dating apps used such information to connect users with friends of their friends who had similar interests.

It wasn't until 2015 that the feature was finally removed. By then, it was clear that access to such a large trove of information could be abused, and privacy concerns about data leakage were rife. Going forward, developers could only access information about friends who had also consented to using their app.
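The difference between the two access rules can be sketched in a few lines of Python. This is a toy model for illustration only, not Facebook's actual API: the `User` class, the `visible_profiles` function, and the `friends_must_consent` flag are all hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    """Toy stand-in for a Facebook account (hypothetical, for illustration)."""
    name: str
    friends: set = field(default_factory=set)
    installed_apps: set = field(default_factory=set)


def visible_profiles(users, installer, app, friends_must_consent):
    """Return the names whose profile data `app` could read through one installer."""
    visible = {installer}  # the installer consented directly
    for friend in users[installer].friends:
        # Old rule: every friend is exposed.
        # New rule: only friends who installed the app themselves.
        if not friends_must_consent or app in users[friend].installed_apps:
            visible.add(friend)
    return visible


users = {
    "alice": User("alice", friends={"bob", "carol"}, installed_apps={"quiz"}),
    "bob": User("bob", friends={"alice"}, installed_apps={"quiz"}),
    "carol": User("carol", friends={"alice"}),
}

before_2015 = visible_profiles(users, "alice", "quiz", friends_must_consent=False)
after_2015 = visible_profiles(users, "alice", "quiz", friends_must_consent=True)
```

In this example, Carol never installed the quiz, yet her profile is visible under the old rule because Alice did. Under the new rule, only Alice and Bob are visible.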

But, notably, developers didn't have to delete the information they had already collected. This weekend, in a pair of reports by the Observer and the New York Times, we learned what one developer did with the data it retained.

Using an innocuous-looking quiz and an accompanying Facebook app, a researcher named Aleksandr Kogan collected data from a staggering 50 million Facebook users on behalf of data analytics company Cambridge Analytica. The firm is reported to have used all that data to build detailed profiles on the personalities of American and British voters, for use by Republican political candidates during the 2016 U.S. presidential election and Brexit's Vote Leave campaign.

And as outrageous as that may sound, Facebook was operating exactly as it was designed to at that time — a design that left millions of its users unwittingly exposed.

How the harvesting happened

According to both newspapers, about 270,000 people installed Kogan's Facebook app, which was portrayed as part of an online personality quiz that participants were paid to take. The Observer estimated that each user who installed Kogan's app granted him access to profile information from at least 160 of their friends. All told, he was able to harvest data on 50 million Facebook users in a matter of weeks.
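The reported figures can be checked with some back-of-the-envelope arithmetic. The sketch below uses only the numbers cited above. The assumption that no two installers shared a friend is ours, made to keep the calculation simple, and the variable names are illustrative.

```python
# Figures as reported; the calculation itself is only illustrative.
installers = 270_000          # people who installed the app
friends_per_installer = 160   # the Observer's lower-bound estimate
reported_total = 50_000_000   # profiles reported harvested

# Reach if every installer exposed exactly 160 friends and no two
# installers had a friend in common (a simplifying assumption).
naive_reach = installers * friends_per_installer

# Average number of profiles each install would have to expose
# to arrive at the reported total.
profiles_per_install = reported_total / installers

print(f"{naive_reach:,}")              # 43,200,000
print(f"{profiles_per_install:.0f}")   # 185
```

At the 160-friend minimum, the app reaches about 43 million people, so the reported 50 million works out to an average of roughly 185 profiles per install.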

Users were told the data being collected was for academic purposes, and both Facebook's terms of service and British data protection laws prohibited Kogan from selling or sharing the data with third parties without users' consent. But the data was provided to Cambridge Analytica nonetheless.

Facebook says, 'Everyone involved gave their consent.' Yet millions of users did not consent, at least not explicitly, and likely never expected their data to be shared just because they were friends with someone who participated in a quiz. (Toby Melville/Reuters)

The firm "then used the test results and Facebook data to build an algorithm that could analyze individual Facebook profiles and determine personality traits linked to voting behaviour," according to the Observer's report. That algorithm could be used to target Facebook ads more effectively at the people the analysis determined were most likely to be influenced by the ads' message.

It may have been a violation of the agreement that Kogan made with Facebook, and of what users reasonably expected from a simple personality quiz app. But that was all that separated the behaviour of Kogan's app from otherwise legitimate apps. It collected the same type of data that hundreds of other developers, at the very least, also had access to at that time.

Was Facebook breached?

The Observer and the New York Times characterize the data that was harvested and provided to Cambridge Analytica as a breach. Facebook, however, disputes this characterization.

In Facebook's view, Kogan "did not break into any systems, bypass any technical controls, or use a flaw in our software to gather more data than allowed," argued the company's chief security officer Alex Stamos in a now-deleted series of posts on Twitter.

"Everyone involved gave their consent," echoed Paul Grewal, a Facebook lawyer and company executive, in a statement. "People knowingly provided their information."

Yet millions of users did not consent, at least not explicitly, and likely never expected their data to be shared simply because they were friends with someone who participated in a quiz.

It's also clear that Kogan had access to data that Facebook, had he been upfront about his intentions, would not have let him access. He effectively lied his way into data he shouldn't have had, under the guise of an academic study. Some might consider that a social engineering attack.

And notably, both the Observer and the New York Times reported that copies of the data still exist, outside of Facebook's control.

It may not be a breach in a strict technical sense — but it's almost certainly a breach of trust, at the very least.

The CEO of Cambridge Analytica, Alexander Nix, speaks during the Web Summit, Europe's biggest tech conference, in Lisbon, Portugal, Nov. 9, 2017. (Pedro Nunes/Reuters)

There's only so much a user can do

It's worth noting that not all of the data harvested was actually used to build the detailed personality profiles Cambridge Analytica sought. In fact, of the 50 million user profiles scraped, only about 30 million contained enough information to match with existing records obtained from data brokers or provided by political campaigns.

In other words, the data harvested for Cambridge Analytica was only as useful as the quality of personal information that, in the normal course of using Facebook, those users volunteered — the same sort of information that was used to target Facebook users with divisive political ads as part of a broad Russian influence campaign.

Given the fallout from both incidents, users may be less eager to be as open with Facebook going forward, knowing the very real risk that their data may be misused, something critics have warned of for years.

But of course, there is only so much that users can do. Users may decide to keep more information off Facebook than they have in the past, but there are other, less obvious ways the company can collect information.

Others may decide to delete Facebook altogether, or limit how often they log on — but again, the ubiquity of the service makes this impractical, not least for those for whom Facebook has become the primary mode of communication with friends or family.

It's a reality not lost on politicians, and some are already suggesting it may ultimately fall to governments to impose some form of regulation or greater oversight — rather than trust Facebook to watch over itself.

Facebook said so itself: "This was a scam — and a fraud," Grewal told the New York Times. But it was enabled by Facebook's very design.

ABOUT THE AUTHOR

Matthew Braga

Senior Technology Reporter

Matthew Braga was the senior technology reporter for CBC News, where he covered stories about how data is collected, used, and shared.