Many of us use “privacy” and “confidentiality” interchangeably, yet the two words – and “anonymity” – mean significantly different things. The differences matter when it comes to data ownership, rights, responsibilities, and protections.
I took a deep dive into confidentiality a couple of decades ago, during a four-year stint working at the U.S. Census Bureau. That was a great gig. Working on subcontract to IBM, I designed and led the development team for the 2000 Census tabulation system, which the bureau ultimately used to generate hundreds of billions of statistical tables, and I helped build the first iteration of the bureau’s American FactFinder Web portal.
Census data is confidential. It is covered by Title 13 of the United States Code, which restricts use of population census responses to statistical purposes and establishes a penalty for wrongful disclosure of individual records. So let’s say that confidential data is data required for a particular purpose, with its use limited to that purpose, albeit possibly subject to a variety of provisions.
Every residence must respond to the Census Bureau, and each resident must be reported, including individuals in the United States illegally. One Title 13 implication is that immigration authorities may not use census data for deportations. This limited-use provision helps ensure an accurate count!
Yet census data is not private. You are required by law to respond. Contrast this with medical records governed by the federal HIPAA Privacy Rule. (HIPAA is the Health Insurance Portability and Accountability Act of 1996.) The rule covers “individually identifiable health information…that identifies the individual or for which there is a reasonable basis to believe it can be used to identify the individual.” What makes this data private is the notion that you own it and must consent to any use or release. Organizations that hold the data are merely custodians.
“Privacy is defined as the right of an individual to keep his/her individual health information from being disclosed,” explains the University of Miami. So privacy implies confidentiality. The whole topic is complicated! Take a glance at the many use and disclosure provisions listed in the government’s Summary of the HIPAA Privacy Rule.
Now let’s move to anonymity, starting with de-identification, which involves “generally accepted statistical and scientific principles and methods for rendering information not individually identifiable.”
Data is anonymous when the individual described is not known and cannot be identified.
Sometimes anonymous data doesn’t stay anonymous. Through analysis, sleuthing, or combining datasets, one might be able to unmask an individual’s identity. There’s always risk involved.
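To make the dataset-combining risk concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the records, the names, and the choice of ZIP code, birth year, and sex as the linking fields are all assumptions, not a description of any real dataset or of any method the Census Bureau or HIPAA prescribes. It joins a “de-identified” table to a public list and reports a match only when the combination of linking fields is unique on both sides.

```python
from collections import Counter, defaultdict

# Hypothetical "de-identified" records: names removed, other attributes kept.
deidentified = [
    {"zip": "20746", "birth_year": 1961, "sex": "F", "diagnosis": "asthma"},
    {"zip": "20746", "birth_year": 1961, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "20233", "birth_year": 1975, "sex": "F", "diagnosis": "migraine"},
    {"zip": "20233", "birth_year": 1975, "sex": "F", "diagnosis": "fracture"},
]

# Hypothetical public list (think of a directory) with names attached.
public = [
    {"name": "Alice Example", "zip": "20746", "birth_year": 1961, "sex": "F"},
    {"name": "Bob Example", "zip": "20746", "birth_year": 1961, "sex": "M"},
    {"name": "Carol Example", "zip": "20233", "birth_year": 1975, "sex": "F"},
]

# Assumed linking fields; none identifies a person alone, but together they may.
LINK_FIELDS = ("zip", "birth_year", "sex")


def link_key(record):
    """Build the tuple of linking-field values for one record."""
    return tuple(record[field] for field in LINK_FIELDS)


def link(deidentified_rows, public_rows):
    """Map each name to a de-identified record when the match is unambiguous."""
    groups = defaultdict(list)
    for row in deidentified_rows:
        groups[link_key(row)].append(row)
    public_counts = Counter(link_key(person) for person in public_rows)

    matches = {}
    for person in public_rows:
        key = link_key(person)
        candidates = groups.get(key, [])
        # Link only when exactly one record and one person share the key.
        if len(candidates) == 1 and public_counts[key] == 1:
            matches[person["name"]] = candidates[0]
    return matches


reidentified = link(deidentified, public)
for name, row in reidentified.items():
    print(name, "->", row["diagnosis"])
```

In this toy data, two of the three named people are linked to a diagnosis, while the third is not, because two records share her ZIP code, birth year, and sex. That ambiguity is the kind of protection de-identification methods aim for, and the sketch shows how thin it can be when records are distinctive.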
Rights and Ethics
As is often observed, the U.S. Constitution does not explicitly guarantee a right to privacy, although the Fourth Amendment does establish “the right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures.” Outside your person and house, in public? Your movements may (generally) be legally observed and recorded, and your photo taken, albeit with restrictions on commercial uses.
Each of us – individuals, organizations, and businesses – should be guided by more than just legal provisions. Ethics come into play: moral principles regarding what’s right and what’s fair. A basic rule is that you shouldn’t use information about others in ways you wouldn’t want information about yourself to be used. Rules of this sort underlie work in Ethics in Natural Language Processing. We work toward self-imposed restrictions on the use of analytical technologies for indefensible purposes, and toward greater accountability and transparency. This is important work.
There’s an underlying sense of fairness here, although it can be trumped by an overriding need to know, for the public good. That need to know has its own protections, notably the Constitution’s First Amendment and whistleblower laws. That is, privacy and confidentiality are not absolutes. They are social and legal constructs.
Each of us must act responsibly. The starting point is to understand data principles, including privacy, confidentiality, and anonymity, as well as technology’s power both to protect and to subvert proper data rules.