In the 20th century, the quantitative social sciences blossomed. A key facilitator of that growth was the democratization of demographic and economic data through data archives. These data archives contained “anonymized” versions of original survey records of individuals, allowing a college sophomore to analyze the same data set as a full professor.
As the years went by, the archives permitted analyses of change over time in major phenomena. Indeed, they permitted quantitative history to evolve as a field. As we move into a new data world and surveys decline in frequency and importance, we need to think about whether data archives themselves will survive.
One business model of social science data archives seems to have two features: a) dues from a set of members that value access to the data, and b) contracts given to archives by data producers charged with dissemination of their data. The first feature arises from a set of users at institutions whose mission includes original research (e.g., universities or research institutes). By sharing the large costs of data curation and dissemination, they give their members access to vast data resources at relatively low cost. The second feature is a function of the fact that much of the 20th century data in archives was funded by agencies funded by taxpayer money. These agencies have legal obligations to democratize their collected data.
A second business model tends to serve a set of like organizations, each possessing data sets that describe its activities; for example, a set of enterprises in the same manufacturing sector. In order to assess their own performance, the organizations want to know how they compare to other organizations. Are they ahead of or behind their competition?
However, no organization wants its own data to be known by a competitor. In these circumstances, an honest broker organization (e.g., a trade organization) is sometimes charged with collecting all the data and providing each member a set of statistics describing all contributing organizations. Through the honest broker, the entire population of firms becomes more aware of the status of the sector, without any firm's competitive position being threatened.
The new data world is likely to be built on diverse sources of sensor, transaction, text, and movement data, providing society with instant monitoring of human behavior. These are vast real-time data series. Their value stems from offering timely descriptors of what’s happening minute to minute.
Yet as we increasingly enter this world, it is not at all clear whether any institution with access to such data has the mission to curate and archive such data, permitting future analyses to use them. Who will protect the streaming data of the present for users of the future?
If society merely uses the data to monitor the present, our understanding of the past might be increasingly threatened.
This is one of the strongest arguments for open data.
“The second feature is a function of the fact that much of the 20th century data in archives was funded by agencies funded by taxpayer money. These agencies have legal obligations to democratize their collected data.”
Legal, and perhaps moral, obligations. We need open libraries for data, where we can curate and people can access data. And it cannot be locked away online where the digital divide hides it away.
Another very thoughtful and provocative post, thank you. It’s true that the social sciences “blossomed” in the 20th century; it’s also true that the natural sciences exploded. That explosion was due to many things, but an important component was the construction and application of data archives. Interesting parallels can be drawn between what social scientists seem to be struggling with in the construction and use of their archives and what natural scientists figured out a half century ago. The struggles range from the technical to the philosophical. For example, just as social science data aimed at deducing human behavior can be viewed as proprietary by whoever collects it and wants to analyze it, so can a gene sequence, the chemistry of a small molecule, or a new concept in optical physics. Nonetheless, vast natural science archives abound; NCBI, UniProt, PDB, and dozens of smaller “boutique” archives (my favorite being PlasmoDB) are open and accessible to everyone from undergraduates to professors. One of the things natural scientists learned is that fields don’t (cannot) explode if their archives are not open.
Best
Paul