In the 20th century, the quantitative social sciences blossomed. A key facilitator of that growth was the democratization of demographic and economic data through data archives. These data archives contained “anonymized” versions of original survey records of individuals, allowing a college sophomore to analyze the same data set as a full professor.
As the years went by, the archives permitted analyses of change over time in major phenomena. Indeed, they permitted quantitative history to evolve as a field. As we move into a new data world and surveys decline in frequency and importance, we need to think about whether data archives themselves will survive.
One business model of social science data archives seems to have two features: a) dues from a set of members that value access to the data, and b) contracts given to archives by data producers charged with dissemination of their data. The first feature arises from a set of users at institutions whose mission includes original research (e.g., universities or research institutes). By sharing the large costs of data curation and dissemination, these institutions give their members access to vast data resources at relatively low cost. The second feature reflects the fact that much of the 20th-century data in archives was funded by taxpayer-supported agencies. These agencies have legal obligations to democratize their collected data.
A second business model tends to serve a set of like organizations, each possessing data sets that describe its activities; for example, a set of enterprises in the same manufacturing sector. In order to assess their own performance, the organizations want to know how they compare to other organizations. Are they ahead of or behind their competition?
However, no organization wants its own data to be known by a competitor. In these circumstances, an honest broker organization (e.g., a trade organization) is sometimes charged with collecting all the data and providing each member a set of statistics describing all contributing organizations. Through the honest broker, the entire population of firms becomes more aware of the status of the sector, without any firm's competitive position being threatened.
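The honest-broker arrangement can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function name `broker_report`, the single numeric measure per firm, the particular statistics returned, and the `min_contributors` guard are hypothetical and do not describe how any actual trade organization operates.

```python
from statistics import mean, median


def broker_report(submissions, member, min_contributors=3):
    """Build the report an honest broker might send to one member.

    submissions: dict mapping firm name -> one confidential numeric measure
                 (hypothetical; real brokers collect many measures).
    member: the firm receiving the report.
    min_contributors: assumed guard; with very few contributors, a firm
                      could back out a rival's value from the aggregates.
    """
    if member not in submissions:
        raise KeyError(f"{member} did not contribute data")
    if len(submissions) < min_contributors:
        raise ValueError("too few contributors to report safely")

    values = list(submissions.values())
    own = submissions[member]
    # Share of contributing firms whose value is strictly below the member's.
    below = sum(1 for v in values if v < own)

    # Only aggregates and the member's own value leave the broker;
    # no other firm's name or individual value appears in the report.
    return {
        "contributors": len(values),
        "sector_mean": mean(values),
        "sector_median": median(values),
        "own_value": own,
        "share_below_own": below / len(values),
    }


if __name__ == "__main__":
    data = {"A": 10.0, "B": 20.0, "C": 30.0, "D": 40.0}
    print(broker_report(data, "B"))
```

Each firm learns where it stands relative to the sector, while the raw submissions stay with the broker. The contributor threshold is only a crude safeguard; a real broker would likely need stronger disclosure controls.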
The new data world is likely to be built on diverse sources of sensor, transaction, text, and movement data, providing society with instant monitoring of human behavior. These are vast real-time data series. Their value stems from offering timely descriptors of what’s happening minute to minute.
Yet as we increasingly enter this world, it is not at all clear whether any institution with access to such data has the mission to curate and archive them, permitting future analyses to use them. Who will protect the streaming data of the present for users of the future?
If society merely uses the data to monitor the present, our understanding of the past might be increasingly threatened.