Your 'anonymous' health data aren't as safe as you think


Your identity in databases might not be as secure as researchers once thought it was, according to a paper published recently in Nature Communications, in which researchers shared code that can identify nearly 100% of Americans from practically any available dataset with as few as 15 attributes, Gina Kolata writes for the New York Times.

Is anonymized data really anonymous?

Throughout most of the world, anonymized data are not considered personal data and can be sold and shared without running afoul of privacy laws. However, although data are anonymized to obscure individual identities, they still contain plenty of "attributes" about a person or household, Kolata writes.

In the paper published in Nature Communications, the researchers revealed they had developed a computer program that could identify 99.98% of Americans from nearly any available dataset with as few as 15 attributes, including gender, marital status, and ZIP code.

Yves-Alexandre de Montjoye, a computer scientist at Imperial College London and lead author of the paper, said the study shows current methods of anonymizing data are insufficient. "We need to move beyond de-identification," he said. "Anonymity is not a property of a data set, but is a property of how you use it."
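
To see why a handful of attributes can single someone out, consider a toy illustration. The sketch below is not the researchers' published program, which is not reproduced here; it simply counts how many rows in a made-up table have a combination of attributes that no other row shares. The records, the attribute names, and the `unique_fraction` helper are all hypothetical.

```python
from collections import Counter

# Toy, invented records: no names, only "anonymous" attributes.
records = [
    {"gender": "F", "marital_status": "married", "zip": "02139"},
    {"gender": "F", "marital_status": "married", "zip": "10001"},
    {"gender": "F", "marital_status": "single", "zip": "02139"},
    {"gender": "M", "marital_status": "single", "zip": "02139"},
    {"gender": "M", "marital_status": "single", "zip": "10001"},
    {"gender": "M", "marital_status": "married", "zip": "10001"},
    {"gender": "M", "marital_status": "married", "zip": "10001"},
]


def unique_fraction(rows, attributes):
    """Return the fraction of rows whose combination of the given
    attributes appears exactly once in the table."""
    keys = [tuple(row[a] for a in attributes) for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(rows)


# Each added attribute can only split groups further, never merge them.
for attrs in (["gender"],
              ["gender", "marital_status"],
              ["gender", "marital_status", "zip"]):
    print(attrs, round(unique_fraction(records, attrs), 2))
```

On this invented table, gender alone singles out no one, while gender, marital status, and ZIP code together single out five of the seven rows. Real datasets are far larger, but the same effect is why adding attributes erodes anonymity.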

To share or not to share

Typically, when researchers discover a security flaw, they report the flaw to the vendor or the government, Kolata writes. But in this case, anonymized data are everywhere, and all of them are at risk, de Montjoye said.

That left the researchers with a choice: Say nothing, or publish the code so data vendors can secure future data.

They decided to publish it. "This is very hard," de Montjoye said. "You have to cross your fingers that you did it properly, because once it is out there, you are never going to get it back."

Yaniv Erlich, chief scientific officer at MyHeritage, agreed with the researchers' decision. "It's always a dilemma," he said. "Should we publish or not? The consensus so far is to disclose. That is how you advance the field: Publish the code, publish the finding."

How to solve the problem

The finding raises questions about how best to protect data that are supposed to be anonymized.

One way to limit the privacy risk of anonymized data is by controlling access to the data, Kolata writes. For example, someone who wants personal data such as medical records would have to access them in a secure room where the data cannot be copied and every action taken with the data is recorded.

According to Kamel Gadouche, CEO of C.A.S.D., a research center in France that utilizes these methods, researchers would be able to access the data remotely, but "there are very strict requirements for the room where the access point is installed."

However, this method isn't perfect, Kolata writes. If researchers want to confirm the results of a research paper for a scientific journal using those data, getting access to them would be a challenge.

Another potential solution is what's called "secure multiparty computation," Kolata writes.

"It's a cryptographic trick," Erlich said. "Suppose you want to compute the average salary for both [of] us. I don't want to tell you my salary and you don't want to tell me yours." Instead, each party provides encrypted information, and a computer works out the result without revealing either person's input.

"In theory, it works great," Erlich said. However, for scientific research, the method is somewhat limited. For example, if the end result seems incorrect, "you cannot debug it, because everything is so secure you can't see the raw data," Erlich said.

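Erlich's salary example can be sketched with additive secret sharing, one common building block of secure multiparty computation. This is a minimal single-process simulation under stated assumptions, not a production protocol and not a method described in the article: the `MODULUS` value and the `make_shares` and `secure_average` helpers are hypothetical names, and a real deployment would run each party on a separate machine over secure channels.

```python
import secrets

# Public modulus (an assumption of this sketch); it must be larger
# than any possible sum of the inputs.
MODULUS = 2**61 - 1


def make_shares(value, n_parties):
    """Split value into n_parties shares that sum to value mod MODULUS.
    Any subset of fewer than n_parties shares looks uniformly random."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_average(salaries):
    """Simulate the protocol: no step below uses a raw salary
    other than the owner's own call to make_shares."""
    n = len(salaries)
    # Each party splits its own salary and hands one share to every party.
    all_shares = [make_shares(salary, n) for salary in salaries]
    # Party j adds up the shares it received; this partial sum alone
    # reveals nothing about any individual salary.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # Only the partial sums are published and combined.
    total = sum(partial_sums) % MODULUS
    return total / n


print(secure_average([85_000, 95_000]))
```

The sketch also hints at the debugging problem Erlich describes: every intermediate value is a random-looking number, so a wrong result cannot be traced back to a bad input. Note too that with only two parties, each can work out the other's salary from the average itself; the technique hides the inputs, not what the output implies.
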
Ultimately, Erlich said data gathered on people will never be entirely private. "You cannot reduce risk to zero," he said (Kolata, New York Times, 7/23).

