One of the central puzzles of the 21st century is how societies can balance the risk of harm from the use of data about individuals, on the one hand, against the benefits of using data to identify problems within the larger society, on the other. The same data that can be used, with bad intentions, to harm an individual can be a component of statistical aggregates that provide insights into societal issues. For example, treating an individual differently because they were diagnosed with cancer can harm that person's employment and life experiences. But using that same data record, combined with thousands of others, to estimate the percentage of the population with similar cancer diagnoses can provide evidence for research funding allocations, to the later benefit of the full society.
The statistical uses of data construct empirical aggregates that describe large populations. In doing this, there is no need to know the individual identities of persons whose data records are used in the aggregation. The best statistical uses are those that reveal information about large populations that can be used to benefit the common good. These include identifying inequalities in education, income, employment, safety, food insecurity, environmental threats, etc.
One issue that arises in such statistical aggregations is whether the full collection of records used to produce the aggregation is a good representation of the population of interest. In most countries of the world, central government organizations are charged with the statistical use of data to inform the nation-state about its well-being on many dimensions. They typically use sample surveys and censuses to collect the data that are then processed to create informative statistical aggregates.
A global phenomenon, especially in wealthy countries, is a growing lack of participation in these surveys, which threatens their accuracy. The difficulty pre-election polls have had in predicting outcomes is hypothesized to be one consequence of this trend. It is not uncommon for 95% of a sample to decline to participate when asked to respond, which requires complicated assumptions about potential differences between respondents and nonrespondents.
Are there any ethical principles that apply to these individual decisions to participate in a data collection? Many of the key concepts of bioethics (e.g., beneficence toward the patient and the patient's autonomy in making decisions) apply to this decision. For example, the burden of survey participation should not lead to direct harm to the respondent, and the sample persons should be fully informed about how the data they would provide will be used.
However, bioethical concepts focus largely on the body of the patient and on protection from harm relative to the potential benefits of medical interventions. In contrast, the benefits of statistical aggregates accrue to the larger society. The direct benefits to the individual respondent are rather minimal.
Recently, attention has been given to the notion of "solidarity," the feeling of support justified by shared attributes and goals within a group. Related concepts include interconnected fates, reciprocal interactions, and cooperative acts repeated over time. In such groups, actions that support others in the group, or the group as a whole, are common. The anticipation of reciprocated acts of support and kindness reinforces such interactions. In some sense, the sharing of goals is used to justify the sharing of burdens within the group.
Notions of solidarity have been invoked in addressing the question of whether patients should consent to having their medical data included in data banks for research purposes. There is generally only a small chance that the patient will benefit individually from the data bank, but it can help build a better future for health care. This is quite similar to the survey participation decision.
Can the notion of solidarity unlock solutions to the lack of participation in research data collections? Unfortunately, the fragmentation of modern societies into smaller groups is widespread. Such groups may have a deep sense of solidarity within them but little solidarity toward other groups. Indeed, the solidarity within a group may be negatively related to its level of solidarity with other groups in the society.
This is a key challenge when collective action seeks to benefit everyone in the society. So, back to the beginning: how does modern society realize the potential benefits of data for the full population? How can the benefits of statistical uses of data be weighed against privacy concerns if little solidarity exists within the population?