Much of the perspective on the ethical treatment of persons in research flowed from the National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research (1974-78). The field of bioethics parallels these activities and has also influenced how many think of the obligations social scientists have toward persons involved in their research. There are several key principles of bioethics:
1. Autonomy, the freedom to make one’s own decisions
2. Beneficence, good personal consequences of research participation
3. Absence of maleficence, avoidance of harm
4. Human dignity, worthiness of respect
This is a post with some thoughts about applying notions of autonomy to various types of data as part of the Internet world.
Central to modern research practice is the right of participants to determine their own participation in activities that affect their lives. This thought underlies the notion of informed consent by the “human subject.” Further, in self-report studies (e.g., surveys), the participant is free to refuse to answer any question posed. This gives the respondent control over what information is actually collected by the researcher. Notions of autonomy have guided IRB regulations. Research approvals rest on the disclosure of information to participants in a way that lets them competently and freely weigh the costs and benefits of participation. Voluntary, but informed, participation is the goal.
It might be useful to turn this around: to change the perspective from one focused solely on the human subject, whose information is in question, to one that focuses on both the participant and the collector of data. It seems that under some circumstances, each actor in the transaction may have ethical obligations.
What are the ethical obligations of each of these actors in their own autonomy? I suspect the nature of the ethical framework might depend on the uses of the data provided. The most relevant dimension might be the answers to two questions: “Who benefits?” and “Who knows about the benefits?”
In many statistical surveys, the individual data provided by one participant merely form one among thousands of observations. The products of the data are statistics describing large numbers of people in the population (e.g., median income of age and gender groups; the unemployment rate; the rate of educational attainment; prevalence of health conditions).
For this first data use type, constructing statistical information that informs the full society, what are the obligations of the participant? This information is key for the informed citizenry to evaluate the state of the nation. The benefits are not personal but societal. Each of us has some obligation to the common good. In one sense, providing personal information for common good information is like voting. My individual vote has little benefit to me, but it fulfills my personal responsibility to the polity in a democracy. As a citizen I have the responsibility to join with my other citizens to select officials. In this class of statistical information the beneficiary is the full community.
A second use class moves one step further, to an impact on the individual participant. What is the participant’s responsibility when the data are used to build a model that predicts future behavior (e.g., the risk of mortality in actuarial estimates; the likelihood of clicking on a web page display advertisement)? In these cases, as in the first use class, my personal data are just one observation, which when combined with those of many others permits the estimation of statistical likelihoods. But the next step is an intervention. The estimates are used to make decisions affecting individuals (I receive or do not receive a contract for life insurance; I see an advertisement from a retailer I earlier visited).
With this type, the beneficiaries are certainly the commercial entities involved, to the extent the models are predictive. But the individual can also benefit, if the personal intervention is viewed as a good thing. Losing the opportunity for life insurance probably won’t be viewed positively (gaining the opportunity would be). Being alerted to new goods and services of interest to me, without initiating the search for them myself, might be something of value. In some cases, there are also common good benefits (e.g., a healthy national insurance framework; efficient distribution of products). Still, it seems clear that ethical obligations toward support of the common good don’t apply here as strongly as in the first type.
For this type, what obligation does the data collector have to the participant? Before obtaining a person’s data, the collector has some obligation to inform that person what will be done with it. This is required so that individuals can make, with some autonomy, a decision weighing the costs and benefits of providing their personal data.
The third notable type of data use is individual information used for person-based intervention, often by combining data sources. Some of these offer seemingly direct benefits (e.g., mashing my location data from mobile phone tracking with a display of nearby restaurants or traffic accidents). Here, personal benefits to the user are very salient. However, the crowd-sourcing of data can sometimes yield common good outcomes. Do ethical obligations to the common good prompt us to consider the value to society of such information dispersion in making the full society more livable? Who decides that a world with Yelp is better than a world without it?
Data ethics should be part of the framework for evaluating the behavior of both the individuals who might provide data and those who seek the data. It seems clear, however, that different uses will imply different guidance.