The metaphor of food production and consumption offers some interesting insights into efforts to use data for social good. For example, many reports suggest that over 40% of the food produced in the US is wasted; that is, not consumed by humans. Organizations like Food Rescue attempt to collect such food and distribute it to those in need. Some of this food comes from organizations that had planned for it to be consumed by people but failed to sell or serve it (e.g., restaurants, grocery stores).
One difference between food and data is that old data, for some uses, do not spoil. Indeed, “consuming” data for one purpose doesn’t destroy them for another purpose. If there is another use for the same data, they can be used over and over.
Many machine learning and artificial intelligence uses of data begin with a stock of data from which to extract a prediction of some phenomenon of interest. For example, given all data known, what is the probability that a specific type of person will click on a popup internet ad? The practical use of these analyses is to price ad displays so as to maximize the sales of the advertising entity. Every click or failure to click then provides a new observation, which (along with any other new observations on a case) can be added to the data resources in hopes of improving the prediction at the next moment. The key feature is predicting the next state of some process, ideally in real time. Computation speed and richness of data permit such modeling to drive automobiles with enough effectiveness that driverless cars are an active endeavor.
Much academic use of data is quite different. Analyses seek understanding (e.g., to understand the income dynamics of families, to measure the precursors to health conditions, to monitor productivity changes in the economy). The use of the data attempts to gain insights into the processes that produce some phenomena (e.g., does divorce lead to economic hardship for children in families, to what extent does physical exercise prevent chronic health conditions, does implementation of new computer technology increase the dollar output of a company per employee?). The questions often involve multiple outcomes simultaneously in an attempt to understand whether there are important mechanisms across different phenomena. For example, how does educational attainment affect health-related behaviors (e.g., dietary habits)? Do any effects flow through the fact that groups with more education tend to have incomes that permit access to better food options? Does education itself teach people the linkage between diet and health? Do the social environments of people with more education provide social support for enhanced physical activity and through that produce higher concern for healthy eating? Such questions are of interest to those seeking to identify the causal connections among attributes. Such causal understanding is important in designing interventions that attempt to improve the final outcome of interest (e.g., would it be more efficient and effective to introduce healthy-eating messaging in public spaces frequented by groups with less education or to launch a campaign for physical fitness?).
Some uses of data in companies are for prediction of the next observation. Hourly sales in a retail company can be used for staff deployment, just-in-time stock replacement, and a variety of other management decisions. This “nowcasting” or real-time decision guidance is in sharp contrast to theory-testing or the identification of causal mechanisms for phenomena of interest. Indeed, the value of “old” data for nowcasting is minimal. On the other hand, some of these data might be reused to provide the benefit of greater understanding of social processes.
Since the primary purpose of much digital data now being produced is real-time prediction, one wonders whether a data recycling movement might be usefully launched now. Just as repurposing unused food can serve users who were not the intended first users, so too data recycling might provide social benefits for secondary purposes. Since the data from many commercial transactions with the public arise solely from the actions of individual people, this data recycling notion can be viewed as a service back to those whose data records are collected by the companies. If the uses of the data were beneficial to the whole society, public trust in a company might increase as a function of its recycling efforts.
Data wastage can be reduced through data recycling for the benefit of all.
Data recycling might need to be incentivized through monetization, i.e., by creating a market for used data (with proper privacy protection safeguards instituted, of course).