An informed citizenry is a necessary condition of a successful democracy. Information about the well-being of a democracy is the foundation of an informed citizenry. Most democracies around the world have given quasi-independent central government agencies the responsibility of producing such information. These agencies go by the name “Central Bureau of Statistics” or something similar. In the US, 13 different principal statistical agencies, reporting to different cabinet-level units, provide such information. Widely accepted codes of ethics and laws promote their right to assemble information regardless of whether it is favorable to the current political administration. Indeed, if such agencies’ output is viewed as politically influenced, the credibility of the information is threatened and trust in future actions severely depleted. Hence, statistical agencies must independently and professionally produce their information, with deep privacy protections, and distribute it freely and widely.
Over the years, statistical agencies and empirical social scientists have predominantly used statistical surveys and censuses to collect their data on people and businesses. These data are then aggregated to describe characteristics of large populations (e.g., unemployment rate, prevalence of health conditions). Such aggregations are called “statistical uses” of data to contrast with “administrative uses,” which access individual records to implement programs. However, participation in surveys has consistently declined over the years, threatening the quality of survey statistics. In reaction, scientific surveys spend more money on enhanced efforts to secure participation. Inflated costs lead to termination of surveys or smaller sample sizes. In short, the survey paradigm is under intense threat as a means to provide information to the public about the country’s current status.
At the same time that surveys are increasingly stressed, there are today more nonsurvey digital data from local and state governments, businesses, nonprofit organizations, and federal government program agencies. These data describe behaviors in the everyday lives of Americans – their purchasing behavior, transportation use, housing experiences, educational experiences, employment behavior. In short, administrative digital data describe many of the activities of the US public that are critical to understanding the well-being of the country. Blending survey data with administrative data to create improved statistical information is attractive.
However, these growing data resources cannot currently be used to create information for the common good. They are held by the organizations that use them to run their programs or businesses, for “administrative use” purposes. Their reuse for common good “statistical uses,” to inform the US public about the status of the country, is not currently allowed.
There are real privacy concerns that argue for the highest security of individual records. People can be harmed by the misuse of their individual data. However, government statistical agencies are governed by laws that impose large cash fines and imprisonment for such misuse and disclosure of individual data. Further, with modern computing methods, privacy protections can be greatly enhanced. Finally, statistical uses of data do not themselves require the identity of the persons described by the data.
The nation’s roads and bridges form an infrastructure for economic and social interactions. Similarly, the data within the country form an infrastructure for information extracted from them. Democracies need statistical information for their continued health. Viewing the combined data of the government, nonprofit, and private sectors as national resources in this light has merit. Colleagues and I on the Committee on National Statistics recently completed a report that sketches out a vision of a new national data infrastructure.
With the demise of surveys as a reliable tool to understand how the society is faring, serving the country’s common good need for information requires a new national mobilization of data, with modern privacy-protecting features, for common good purposes.
However, data from private sources isn’t “good enough for government work” because the data collectors and transmitters aren’t answerable to the people through their governments for the accuracy and non-partisanship (believe it or not) of their surveying and reporting methodologies. The results can easily be dismissed, as is most data from the PRC. Just as there are governments providing data that is not to be trusted, private sources might provide untrustworthy data for technical reasons (such as lack of data collection expertise, especially when it comes to wording survey questions and when it comes to having agreed-upon definitions of variables) as well as for nefarious reasons.
Interesting and important discussion. Thanks. We need new and creative thinking, which you and Georgetown hopefully will lead. Science and the public good.