Privacy is one of the central issues facing academic fields that use personal data to pursue their research.
“Privacy” is a term that unfortunately has a large set of alternative definitions. In the social sciences, the right of privacy empowers participants chosen for research data collection to refuse to reveal personal information. All of the features of informed consent, promulgated by institutional review boards, are aimed at protecting such privacy rights. In the nomenclature of those fields, a person can voluntarily give up this right to privacy to participate in a research inquiry and reveal to the researcher “private” information about themselves. In return for this voluntary act, the researcher pledges to keep “confidential” the information so proffered. Thus, “privacy” and “confidentiality” are two concepts that are used together. Under “confidentiality,” the researcher is obliged to use the information provided by the research participant only for the research uses described to the participant. The researcher keeps the data in a protected state, under controls that prevent any other use of the information.
“Privacy” in other domains takes on a more complicated set of meanings. Take for example the unobtrusive monitoring of internet usage patterns. In this domain, there is often an “opt-in” or “opt-out” opportunity for the user (a decision which may or may not be thoughtfully taken). But the collection of data often continues indefinitely, without reminders to the user of the collection, and without any notice of its use. It seems that “privacy” in these domains includes concerns about a) whether the consent to collect the personal data is sufficiently informed, b) whether the collection of data is sufficiently obtrusive to remind the user of its collection, and c) whether the uses of the data are made manifest.
In these domains, privacy concerns resemble surveillance concerns – that an unknown other is collecting personal information without alerting the person, with the uses of the data being completely out of the control of the person (and perhaps the collector). Surveillance evokes fears of uses of the data that may harm the person. Indeed, these domains motivate quite quickly the broadest sense of privacy, as “the right to be left alone.”
Of course, the issues of privacy are soon going to become much more complex. The internet of things promises that within a few years over 55 billion devices will be connected to one another indirectly through the internet. They will be emitting data continuously about their own behaviors, and some of them will be monitoring a person’s environment (e.g., is anyone in the house; do we need to replace the water filter; have we run out of peanut butter?). We will acquire such devices because they will reduce our day-to-day burdens. But the devices will also collectively “know” much more about individuals than is currently the case. The data they produce may or may not be totally revealed to the users. Will we care about this?
All these notions of privacy take the perspective of the human described by the data. Another perspective is that of the user of the data. This is an area that seems underdeveloped or, at the least, too-little-discussed. It needs attention because there can be uses of personal data that greatly benefit the common good. When common good outcomes are sought by analyzing person-level data, we need a well-constructed set of ethical principles – a data ethics, if you will. These would resemble those of many helping professions – medicine, law, etc. Such codes appear to be key foundations of the trust maintained by those professions.
Such codes of ethics do exist in various statistical subfields, but they are too infrequently highlighted in discussions of “big data.” The codes begin, like those in medicine, by pledging to do no individual harm to persons whose data records are part of a statistical analysis. (Indeed, statistical uses of data place no value on an individual data record because all of the products are based on aggregations of records.) They further state that fulfilling pledges of confidentiality of individual records is a foundation of the human measurement enterprise.
It is interesting to speculate about how the development of a data ethics field might help the world navigate between the promise of using new data resources for common good purposes, on one hand, and respect for individual rights of privacy, on the other.
Very important discussion. The prominence of electronic medical records comes with great concern about privacy and confidentiality. There have already been large breaches of hospital and patient records. Interesting issues will be coming, and coming soon.