It’s difficult to open any medium of information these days, whether print or electronic, without encountering some commentary on what is true and what is not. This is not a post about that issue directly, but obliquely.
First, what are we talking about? The existence of widely disseminated, publicly available information that does not comport with other information. Some of it can be judged false with only minor checking against more reliable sources; other claims require deeper investigation to judge their veracity.
Second, some of the fastest and strongest counter-evidence against false information comes from those who directly observed the events being described. If a story claims that X occurred at location Y at time Z, but everyone present at location Y around time Z says otherwise, those present know for certain that the story is false. They directly observed whether X occurred or did not occur.
So, if this post is only “obliquely” about the post-fact world, what is it directly about?
One of the dominating forces in the proliferation of information these days is the rise of digital data sources, so-called big data, high-dimensional in extent, space, and time. The rate of increase in digital data measuring all aspects of society and the economy is unrelenting and massive. Once assembled and analyzed, these data produce information that describes our everyday lives (e.g., changes in consumer prices, the nature of job vacancies, the impact of education on income, attitudes toward political candidates, media consumption habits, traffic mobility).
Such information is gradually replacing traditional sources of information produced by surveys and censuses. Those sources are slower and more expensive than data harvested from social media and consumer transactions. However, the traditional sources were designed with a purpose in mind. The analyst of those data was typically part of the group that designed the measurements themselves. In that sense, the producer of the information was directly connected to the observation step. Good producers approached the data with healthy skepticism and performed well-designed evaluation steps before the analyses that produced statistical information.
Information produced from big data is often devoid of any insight into exactly how the data were created. The data are analyzed for purposes for which they were not intended. The production of “facts” from massive data sets is valid only if the analyst really understands the source of the observations: how they were generated, by whom, when, and for what intended reason. The “bigness” of data, absent a deep understanding of how they were produced, has little value.
We are living in an age when high-speed computing can generate statistical information in seconds from very, very large digital data sets, by analysts who know little about the data. The “facts” from these analyses require all the scrutiny that we should demand of news stories in popular media. In the extreme, the statistics from these efforts can be no more useful than knowing the mean value of 1,000,000 random digits. If we don’t know how the data were produced, by whom, when, and for what purpose, the statistics can be dangerously misleading about what’s going on in our world.
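The random-digits point can be made concrete with a small simulation (a sketch in Python, offered as an illustration, not part of the original argument): a million meaningless digits yield a mean that is computable to high precision, yet it tells us nothing beyond the trivially predictable expected value of a uniform digit, 4.5.

```python
import random

# "Big" data with no meaning: 1,000,000 uniform random digits (0-9).
random.seed(42)  # fixed seed so the sketch is reproducible
digits = [random.randint(0, 9) for _ in range(1_000_000)]

# The mean is precise and fast to compute, but it carries no information
# about the world -- it merely approximates the known expected value, 4.5.
mean = sum(digits) / len(digits)
print(f"mean of 1,000,000 random digits: {mean:.3f}")
```

The precision of the answer says nothing about its usefulness; only knowledge of how the digits were generated does.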
For discerning truth in popular media and discerning meaning in analyses from big data sources, a skeptical mind, searching for corroborating evidence and scrutinizing documentation of how the “facts” were generated, is critical. Those close to the observations can judge the truth better than those far away from the observations.