Common deidentification methods don’t fully protect data privacy: University of Chicago

A recently published University of Chicago study describes a new kind of attack called “downcoding” and demonstrates the vulnerability of a deidentified data set, warning that these data transformations should not be considered sufficient to protect individuals’ privacy. In the paper, University of Chicago computer scientist Aloni Cohen deals the latest decisive blow against the most popular deidentification techniques. When data sets containing personal information are shared for research or used by companies, researchers try to disguise the data – removing the final one or two digits of a zip code, for example – while still preserving its utility for insight. But while deidentification is often intended to satisfy legal requirements for data privacy, the most commonly used methods stand on shaky technical ground.
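
To make the zip code example concrete, here is a minimal Python sketch of that kind of generalization. It is an illustration only, not code from Cohen's paper: the records, the field names and the helpers `generalize_zip`, `deidentify` and `smallest_group` are all hypothetical. It measures anonymity as the size of the smallest group of records that share the same visible values, in the style of the measure usually called k-anonymity, a term the article itself does not use.

```python
from collections import Counter

def generalize_zip(zip_code: str, digits_to_drop: int) -> str:
    """Replace the last `digits_to_drop` digits of a zip code with '*'."""
    if digits_to_drop <= 0:
        return zip_code
    keep = max(len(zip_code) - digits_to_drop, 0)
    return zip_code[:keep] + "*" * (len(zip_code) - keep)

def deidentify(records, digits_to_drop):
    """Return copies of the records with the zip code generalized."""
    return [{**r, "zip": generalize_zip(r["zip"], digits_to_drop)} for r in records]

def smallest_group(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

# Made-up records; none of this data comes from the study.
records = [
    {"zip": "60615", "age": 34, "course": "Statistics"},
    {"zip": "60616", "age": 34, "course": "Biology"},
    {"zip": "60637", "age": 34, "course": "History"},
    {"zip": "60639", "age": 34, "course": "Physics"},
]

for dropped in (0, 1, 2):
    k = smallest_group(deidentify(records, dropped), ["zip", "age"])
    print(f"{dropped} digit(s) dropped -> smallest group has {k} record(s)")
# 0 digit(s) dropped -> smallest group has 1 record(s)
# 1 digit(s) dropped -> smallest group has 2 record(s)
# 2 digit(s) dropped -> smallest group has 4 record(s)
```

Each dropped digit makes the groups larger, so individuals are harder to single out, but it also makes the data coarser. That is the trade-off between privacy and utility described above.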

“Even by the regulatory standards, there’s a problem here,” said UChicago computer scientist Aloni Cohen. “Policymakers care about real world risks instead of hypothetical risks. So people have argued that the risks security and privacy researchers pointed out were hypothetical or very contrived. The goal when you’re doing that sort of technique is to redact as little as you need to guarantee a target level of anonymity. But if you achieve that goal of redacting just as little as you need, then the fact that that’s the minimum might tell you something about what was redacted. If what you want to do is take data, sanitize it, and then forget about it – put it on the web or give it to some outside researchers and decide that all your privacy obligations are done – you can’t do that using these techniques. They should not free you of your obligations to think about and protect the privacy of that data.”

By describing the “downcoding” attack and demonstrating the vulnerability of a deidentified data set from an online education platform, Cohen sends a warning that these data transformations should not be considered sufficient to protect individuals’ privacy. Deidentification works by redacting quasi-identifiers – pieces of information that may not identify anyone on their own but can be combined with data from a second source to de-anonymize a data subject. Failing to account for all possible quasi-identifiers can lead to disclosures.
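
The role of quasi-identifiers can be illustrated with a short Python sketch of a generic linkage attack. This is the well-known baseline that deidentification tries to prevent, not Cohen's downcoding attack, and every name, field and record in it is made up for illustration. It assumes the data publisher treated only zip prefix and birth year as quasi-identifiers and left gender in the release untouched.

```python
def link(deidentified, auxiliary, quasi_identifiers):
    """Map each auxiliary person to a released record when exactly one record matches."""
    matches = {}
    for person in auxiliary:
        key = tuple(person[q] for q in quasi_identifiers)
        candidates = [
            r for r in deidentified
            if tuple(r[q] for q in quasi_identifiers) == key
        ]
        if len(candidates) == 1:  # a unique match re-identifies the person
            matches[person["name"]] = candidates[0]
    return matches

# Hypothetical released data: names removed, zip codes cut to three digits.
deidentified = [
    {"zip3": "606", "birth_year": 1988, "gender": "F", "grade": "A"},
    {"zip3": "606", "birth_year": 1988, "gender": "M", "grade": "C"},
    {"zip3": "606", "birth_year": 1990, "gender": "F", "grade": "B"},
    {"zip3": "606", "birth_year": 1990, "gender": "F", "grade": "D"},
]

# Hypothetical second source, such as a public profile listing.
auxiliary = [
    {"name": "Alice", "zip3": "606", "birth_year": 1988, "gender": "F"},
    {"name": "Dana", "zip3": "606", "birth_year": 1990, "gender": "F"},
]

# Using only the attributes the publisher planned for, nobody is unique.
print(link(deidentified, auxiliary, ["zip3", "birth_year"]))
# {}

# Adding the overlooked attribute singles out one record.
print(link(deidentified, auxiliary, ["zip3", "birth_year", "gender"]))
# {'Alice': {'zip3': '606', 'birth_year': 1988, 'gender': 'F', 'grade': 'A'}}
```

In this toy data, every record shares its zip prefix and birth year with one other record, so those two attributes alone reveal nothing. Once the attacker also uses gender, Alice's grade is disclosed, while Dana still cannot be told apart from the other matching record.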
