The word “benchmark” has many meanings. In the context of assessing organizational performance, one often hears, “Let’s do some benchmarking,” by which is usually meant a comparison to multiple comparable organizations. In the world of government statistics, “benchmark” has a meaning closer to that of a “gold standard”: the estimate closest to the truth that is humanly possible. It is the preferred estimate whenever it is available. Differences between a given estimate and the benchmark estimate are sometimes used as a proxy for the inaccuracy or bias of that given estimate.
This is pertinent to the US Federal statistical system right now. The system is a collection of agencies that provide estimates of the unemployment rate, gross domestic product, educational attainment, demographic distributions, and a host of other statistics that give society a basic picture of itself.
These statistical products have gained their benchmark status because, over the decades, they have been consistently collected and assembled, their procedures are well documented, they produce information that comports with other observations, and the agencies that construct them are perceived as trustworthy, objective servants of the public.
With the emergence of high-dimensional data resources, many arising from the Internet, we now have the capability of creating statistics that measure some of the same societal phenomena as government statistics. For example, price inflation indexes can be constructed by scraping e-commerce websites. New job creation can be estimated from Web-based job-posting services. Analysis of Internet searches can yield statistics about public concerns regarding health, politics, and other topics.
At this point, those involved in these enterprises use the traditional government statistics to evaluate their new Web-based statistics. When the new statistics agree with the traditional ones, their inventors feel justified in their approach. The traditional statistics are still the benchmarks, the “gold standard.”
There is an ongoing, gradual decay of the benchmark systems because of declining public participation in data-gathering operations and tight budgets. We might be moving into a statistical world without benchmarks. Internet-based data will be blended with traditional data to create new statistics. Statistical models will be used to estimate quantities that had simpler, descriptive characters in the earlier data world.
How can we best navigate a world where the benchmark statistics disappear? No one knows, of course, but my bet is that, in the absence of a single benchmark estimate, we will prefer to have multiple estimates of the same phenomenon. Indeed, it might be useful to actively build multiple approaches to measuring some key societal trait. We would want to disseminate the resulting statistics simultaneously, to see whether they all point in the same direction or send very different messages. With a consistent finding among them, we would have greater confidence in the signal; with disparate findings, we would be cautious.
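As a purely hypothetical illustration of this consistency check, one could imagine flagging whether several independent estimates of the same quantity agree within some tolerance. The estimate values, their sources, and the tolerance below are all invented for the sketch:

```python
# Hypothetical sketch: compare several independent estimates of the same
# statistic (e.g., a monthly unemployment rate, in percentage points) and
# flag whether they agree. All figures and the tolerance are invented.

def estimates_agree(estimates, tolerance):
    """Return True if the spread across all estimates is within `tolerance`."""
    return max(estimates) - min(estimates) <= tolerance

survey_based = 4.1   # traditional household survey (invented value)
web_scraped = 4.3    # scraped job-postings index (invented value)
blended_model = 4.2  # model blending both sources (invented value)

rates = [survey_based, web_scraped, blended_model]

if estimates_agree(rates, tolerance=0.5):
    print("Estimates point in the same direction: greater confidence.")
else:
    print("Estimates diverge: interpret with caution.")
```

The design choice here mirrors the text: no single estimate is treated as the benchmark; confidence comes from agreement among independent measurements, and disagreement is itself informative.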
This is a more complicated world for the public, which is accustomed to single benchmark indicators. It is a more comfortable world for statisticians, however, who are commonly concerned about the over-simplification inherent in individual benchmark indicators. We will need clear communication and user-friendly visualizations to make this new world work well.