One of the key desirable attributes of a scientific research project is that its findings be credible. Much of that credibility is attained if the results of the project can be repeated by another, independent researcher.
In recent years, there have been a variety of incidents in which the results of one research project could not be repeated by another. Congress, concerned that the phenomenon might be widespread, asked a National Academy of Sciences panel to address the current state of science on this issue.
A lengthy but readable panel report in response to the congressional request, released in May 2019, was recently supplemented by a symposium at the academies.
Across different fields, there are myriad definitions of “reproducibility” and “replicability.” The panel chose to define “reproducibility” as obtaining the same results as a completed study, using the same data and the same analytic routines (based on the same code used by the original study). In this sense, the phrase “computational reproducibility” is appropriate. The panel defined “replicability” as reaching the same conclusion as that of the original study from a later project asking the same research question, whether or not it used exactly the same methods.
Of course, there are extreme behaviors that justifiably threaten reproducibility and replicability. Breaches of scientific integrity (e.g., falsification of data) produce a lack of both reproducibility and replicability, as well they should. Further, such breaches are deeply detrimental to the image of science and to public trust in research.
A larger and more complicated set of issues involves how to a) facilitate attempts to increase reproducibility and replicability, and b) determine whether they have occurred when a second study is conducted.
For many domains of science, facilitating attempts at reproducibility and replicability requires unusually detailed documentation of a study's methods, data processing, and analyses. For reproducibility, data must be stored and documented so that others can easily access and use them. The code used to process and analyze the data must be stored and documented so that others can use it. What norms can be developed to make such documentation a normal part of research? What tools might be developed to reduce the marginal difficulty of such documentation? Who is to provide a permanent home for such material (there is no guarantee GitHub will exist 20 years from now)? Will journals be the source of such permanence? Will cooperatives like Open Science provide such a service? Will funders like the NIH and NSF provide such repositories?
How do you determine whether reproducibility and replicability have been attained once attempted? The statisticians in the symposium reminded everyone that any process based on variable inputs is subject to uncertainty of outcomes. If a second researcher attempts to replicate a study using exactly the same methods on a different set of measurement units, different results could be obtained purely from sampling variability. Hence, assessments of replicability need to acknowledge the uncertainty inherent both in the first study and in the replication. Differences within the tolerance of sampling error are to be expected, even if precisely the same methods were used and nothing else changed. Thus, a judgment of failure to replicate needs to acknowledge the uncertainty involved in all studies, only some of which can be measured.
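The statisticians' point can be made concrete with a small simulation (not from the panel report; the study design, effect size, and sample size here are invented for illustration): an "original" study and an exact replication draw different samples from the same population, so their point estimates differ purely through sampling variability, and comparing confidence intervals is fairer than comparing point estimates.

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is repeatable

def run_study(n=50, true_effect=0.3):
    """Simulate one study: n noisy measurements of a true effect of 0.3."""
    sample = [random.gauss(true_effect, 1.0) for _ in range(n)]
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / n ** 0.5  # standard error of the mean
    return mean, se

def ci(mean, se, z=1.96):
    """Approximate 95% confidence interval for the mean."""
    return (mean - z * se, mean + z * se)

# Identical methods, identical population, different samples:
original = run_study()
replication = run_study()

print(f"original estimate:    {original[0]:.3f} (SE {original[1]:.3f})")
print(f"replication estimate: {replication[0]:.3f} (SE {replication[1]:.3f})")

# The point estimates will generally not match. A more defensible check
# of replication is whether the two interval estimates are compatible:
lo1, hi1 = ci(*original)
lo2, hi2 = ci(*replication)
print("intervals overlap:", lo1 <= hi2 and lo2 <= hi1)
```

Running this repeatedly with different seeds shows estimates scattering around the true effect; declaring "failure to replicate" from a mismatch of point estimates alone would ignore exactly the uncertainty the symposium's statisticians emphasized.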
Finally, there was large-scale questioning of whether the reward systems of funding agencies and disciplines could improve their support for attempts at reproduction and replication. Currently, young scholars are judged on the novelty of their products. Spending time attempting to replicate the findings of others is generally not as highly valued. How should professional associations support reward systems for attempts to replicate prior studies' results?
Of course, all of these questions lead to a more general one: if the human and financial resources for research are finite, what is the optimal allocation of those resources between original research and the reproduction and replication of prior research?