One of the key desirable attributes of a scientific research project is that its findings be credible. Much credibility is gained when the project’s results can be repeated by another, independent researcher.
In recent years, there have been a variety of incidents in which the results of one research project could not be repeated by another. Concerned that the phenomenon might be ubiquitous, Congress asked that a National Academy of Sciences panel address the current state of science on this issue.
A lengthy but readable panel report in response to the congressional request, released in May 2019, was recently supplemented by a symposium at the academies.
Across different fields, there are myriad definitions of “reproducibility” and “replicability.” The panel chose to define “reproducibility” as obtaining the same results as a completed study, using the same data and the same analytic routines (based on the same code used by the original study). In this sense, the phrase “computational reproducibility” is appropriate. The panel defined “replicability” as reaching the same conclusion as the original study in a later project that asks the same research question, whether or not it uses exactly the same methods.
Of course, there are extreme behaviors that justifiably threaten reproducibility and replicability. Breaches of scientific integrity (e.g., falsification of data) rightly produce failures of both reproducibility and replicability. Further, such breaches are deeply detrimental to the image of science and to public trust in research.
A larger and more complicated set of issues involves a) how to facilitate attempts to increase reproducibility and replicability, and b) how to determine whether they have been achieved when a second study is conducted.
For many domains of science, facilitating attempts at reproducibility and replicability requires an unusual level of detail in documenting a study’s methods, data processing, and analytic procedures. For reproducibility, data must be stored and documented in a fashion that others can easily access and use. The code used to process and analyze the data must likewise be stored and documented so that others can run it. What norms can be developed to make such documentation a normal part of research? What tools might be developed to reduce the marginal difficulty of such documentation? Who is to provide a permanent home for such material (there is no guarantee GitHub will exist 20 years from now)? Will journals be the source of such permanence? Will cooperatives like Open Science provide such a service? Will funders like the NIH and NSF provide such repositories?
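As one small illustration of the kind of documentation involved, here is a minimal sketch, in Python, of a self-describing analysis script that records its inputs, software environment, and random seed so that someone else could rerun it later. The file name, DOI string, and seed are hypothetical placeholders I have chosen for illustration, not anything prescribed by the panel report.

```python
# A minimal sketch of self-documenting analysis code.
# "survey_2018.csv" and the DOI string below are hypothetical placeholders.
import hashlib
import json
import platform
import sys
from pathlib import Path

import numpy as np
import pandas as pd

DATA_FILE = Path("survey_2018.csv")    # hypothetical archived data file
DATA_DOI = "doi:10.xxxx/placeholder"   # persistent identifier for the archived copy
SEED = 20190507                        # fixed seed so any resampling is repeatable

def provenance() -> dict:
    """Record enough detail for someone else to rerun the analysis later."""
    record = {
        "data_file": str(DATA_FILE),
        "data_doi": DATA_DOI,
        "seed": SEED,
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "numpy": np.__version__,
        "pandas": pd.__version__,
    }
    if DATA_FILE.exists():
        # Hash of the input data lets others verify they have the identical file.
        record["data_sha256"] = hashlib.sha256(DATA_FILE.read_bytes()).hexdigest()
    return record

if __name__ == "__main__":
    Path("provenance.json").write_text(json.dumps(provenance(), indent=2))
    # ...the documented analysis steps themselves would follow here...
```

A record like this is cheap to produce, but it still leaves open the questions above about where such material should live for the long term.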
How do you determine whether reproducibility and replicability have been attained once attempted? The statisticians in the symposium reminded everyone that any process based on variable inputs is subject to uncertainty in its outcomes. If a second researcher attempts to replicate a study using exactly the same methods on a different set of measurement units, different results could be obtained purely from sampling variability. Hence, judgments of replicability need to acknowledge both the uncertainty inherent in the first study and that in the replication. Differences within the tolerance of sampling error are to be expected, even if precisely the same methods were used and nothing else has changed. Thus, a declared failure to replicate needs to acknowledge the uncertainty involved in all studies, only some of which can be measured.
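To make the statisticians’ point concrete, here is a minimal sketch of two simulated “studies” drawn from the same population with the same true effect; the effect size and sample size are illustrative assumptions of mine, not values from the report. The two studies can yield noticeably different estimates, and even different significance verdicts, purely through sampling variability.

```python
# Minimal sketch: two studies using identical methods on different samples
# from the same population can disagree purely through sampling variability.
# The effect size (0.2 SD) and per-arm sample size (n = 50) are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2019)
true_effect, n = 0.2, 50  # hypothetical standardized effect and per-arm n

def run_study():
    control = rng.normal(0.0, 1.0, n)
    treated = rng.normal(true_effect, 1.0, n)
    estimate = treated.mean() - control.mean()
    p_value = stats.ttest_ind(treated, control).pvalue
    return estimate, p_value

for label in ("original study", "replication attempt"):
    est, p = run_study()
    print(f"{label}: estimated effect = {est:+.2f}, p = {p:.3f}")

# Typical output: the two estimates differ, and one may fall below p < 0.05
# while the other does not, even though nothing about the method changed.
```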
Finally, there was large-scale questioning of whether the reward systems of funding agencies and disciplines could better support attempts at reproduction and replication. Currently, young scholars are judged on the novelty of their products. Spending time attempting to replicate the findings of others is generally not as highly valued. How should professional associations support reward systems for attempts to replicate prior studies’ results?
Of course, all of these questions lead to a more general one – if the human and financial resources for research are finite, what is the optimal allocation of those resources to original research versus to reproduction and replication of prior research?
“Currently, young scholars are judged on the novelty of their products. Spending time attempting to replicate the findings of others is generally not as highly valued.”
While the young doctoral candidate is seeking a “licence to teach” (or, in more recent decades, recognition of the ability to innovate), why not have master’s and bachelor’s students study and master their craft by performing the “not as highly valued” replication work? Whether with the identical data archived by the professor or with additional data collected by the student, replication work could be conducted as part of any course teaching research methods relevant to the course subject. What do you think?
A very important topic. Some additional factors are perhaps relevant.

First, in today’s world, where it has become fashionable at the federal level to overtly attack science, where leadership broadcasts to the world that climate change is a hoax perpetrated by the Chinese, and where FDA, NOAA, and EPA scientific conclusions that take years of hard work to validate are routinely “overruled” when they conflict with political priorities, many scientists are, regrettably but justifiably, concerned about the motivations behind federally initiated studies of how well science is working. This affects participation broadly speaking, and perhaps the statistical validity of some points.

Second, and perhaps relatedly, most place the origins of the supposed crisis in scientific reproducibility in the results of a survey done by Nature magazine a few years back, the source of the widely quoted statistic that “70% of scientists were unable to reproduce another scientist’s data.” Some key caveats: this is not 70% of all scientists, it is 70% of the scientists who responded (anonymously) to a survey soliciting such comments, sponsored by a for-profit publisher rather than a scientific society. “70% of scientists” perhaps sounds better when at least one motivation is to sell more magazines. In reality, about 1,500 readers of this one magazine responded, whereas there are millions of scientists on the planet, many (most?) of whom do not make it a habit to respond to such polls. Another caveat is that, even within this biased sampling, all fields of science are not equal with regard to the central conclusion, yet this is typically referred to as a crisis in “science.” The greatest reproducibility problems seem to cluster in certain fields. I will not name them; suffice to say they are not chemistry, biology, physics, computer science, or mathematics. Is this because fewer folks in those fields read the magazine? Respond to anonymous polls by for-profit publishers? Etc.

Finally, what tends to be lost in the entire discussion can perhaps be described by listening to Sarah Chang play Paganini or Yo-Yo Ma play anything by Bach, followed by the question, “would another violinist reproduce either performance note for note?” Maybe a handful out there, but not many. Science is in part a creative process, and, hopefully without sounding too elitist, the quality of the work is often determined by the quality of the scientist. Most of us aspire to become the equivalent of a Chang or a Ma in our respective disciplines, and it takes a lifetime to get moderately close. In any attempt to make science better, the entire enterprise needs examination in as comprehensive a fashion as possible.
Best
Paul