Reproducibility: a Cinderella Problem

The reproducibility of research has been an increasingly important topic in the scholarly communication world for several years[1]. Despite the academic world’s commitment to peer-review as part of the communication ecosystem, reproducibility – which might be seen as a form of in-depth peer-review – has never been treated as seriously.

The reproducibility process – by which a piece or claim of research is validated, by being recreated by independent researchers following the methodology described in a paper – can be tedious. But there can be few of us who haven’t been frustrated by some missing detail, or partially described process[2]. For me, it’s often when I’m presented with curated data that doesn’t seem to quite match what I’d have expected to see from the raw data.

Problems relating to reproducibility are not going to be a universal experience: the different characteristics of different fields are as present here as in other aspects of research. A proof-based discipline, such as mathematics, requires a different approach from a probabilistic science, or from social sciences that may rely on observation and conversation.

Much of the research world has an inherent bias against integrating this more rigorous method of testing results. Journals are optimized to publish new or unique works; research output – as measured by published papers – is the key data by which researchers are measured; and funding – broadly speaking – favors new research. In short: reproducibility is a Cinderella problem, in need of some attention and investment if it’s to flourish.

Earlier this month, I had the pleasure of attending an NSF / IEEE organized workshop on reproducibility, “The Future of Research Curation and Research Reproducibility”. There were many intelligent and thought-provoking contributions; those that stick in my mind included presentations by Amy Friedlander (Deputy Division Director, National Science Foundation), Bernie Rous (ACM), Victoria Stodden (UIUC), my colleague Dan Valen of Figshare, Michael Forster (IEEE), Todd Toler (Wiley) and Jelena Kovacevic (Carnegie Mellon). I’m not going to attempt to summarize the event (we’ll post a link once it’s published), but I have had a number of reflections on reproducibility as a network – or system – problem that I wanted to share. You won’t be surprised that I also have some thoughts about how we can capture this data, and develop metrics in the space.

Reproducibility is complex – it means many things to many people

We lack a coherent concept of reproducibility: it’s as complex as anything else you might expect to find in the research world. I’m going to use a strawman example in this blog post: the simple availability of data. However, this is just an example – even data issues are multifaceted. Are we discussing raw data, or curated? Or the curation process? What are the ethical, privacy and licensing concerns about the various forms of data? How is the data stored, and protected? If a finding fails to be reproduced because of a referenced value, how does this affect the status of this particular paper?

A reproducibility ecosystem

1. A reproducibility statement

It should be possible for researchers to formally express the steps that they have undertaken to make an experiment or a paper reproducible. For example, “The complete data set is available at (address)”, “The curated data is available at (address), and the raw data is available on request”, “The data used in this experiment contains private information and is not available.” Note that there’s no sense of obligation in this process: it’s simply a device to support the structured communication of the authors’ intentions. The statement could be embedded in the methodology section of a paper.
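
To make the idea concrete, here is a minimal sketch of how the three example statements might be captured as structured data. Everything in it is an assumption made for illustration: the names `DataAvailability`, `DataStatement` and `render`, and the field names, are hypothetical and do not come from any existing standard.

```python
import json
from dataclasses import dataclass, asdict
from enum import Enum
from typing import Optional


class DataAvailability(Enum):
    # Hypothetical categories mirroring the three example statements.
    OPEN = "open"
    ON_REQUEST = "on_request"
    NOT_AVAILABLE = "not_available"


@dataclass
class DataStatement:
    kind: str                        # e.g. "raw" or "curated"
    availability: DataAvailability
    location: Optional[str] = None   # address of the data, if published
    reason: Optional[str] = None     # e.g. "contains private information"

    def to_dict(self):
        # Convert to plain types so the statement can be serialized.
        d = asdict(self)
        d["availability"] = self.availability.value
        return d


def render(statements):
    """Serialize a list of statements to JSON for embedding in metadata."""
    return json.dumps([s.to_dict() for s in statements], indent=2)


# "The curated data is available at (address), and the raw data is
# available on request." The address is left as a placeholder.
example = [
    DataStatement("curated", DataAvailability.OPEN, location="(address)"),
    DataStatement("raw", DataAvailability.ON_REQUEST),
]
print(render(example))
```

A form like this carries no obligation either: it records what the authors say they have done, in a way that a reviewer or a machine can check.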

2. Identifying reproducibility

One relatively simple form of reproducibility would be to test the above statement. Although this shouldn’t be the limit of reproducibility, even the simple process of making a statement about what has been done to support reproducibility enables a straightforward task – that of confirming the author’s statement. The benefit of this explicit stage is that it could be embedded in the existing peer-review and publishing process.

Bernie Rous of the ACM presented a process like this at the NSF / IEEE workshop[3]. In this case, the publisher supports a structured approach to confirming reproducibility, and displays the results by the use of badges. These badges appear to be related to elements in the Transparency and Openness Promotion (TOP) guidelines, which could be used as a general-purpose taxonomy to support reproducibility statements, with the actual elements being selected by journal editors for relevance and appropriateness.

3. Embedding reproducibility

Research output does not live in a single place: it’s common to have several versions of the full-text available in different venues. Titles, abstracts – and increasingly references – are being fed to many systems. The infrastructure to support embedded metadata is mature: DOIs are ubiquitous; ORCID iDs are increasingly appearing against research output; CrossRef has millions of documents with various open-access and other licenses (described in machine-readable data), funding information and text-mining licenses; and DataCite maintains open lists of linked data and articles. Whilst introducing a standard for describing reproducibility, potentially based on the TOP guidelines[4] and FORCE11 work on related principles[5], wouldn’t be trivial, the process of developing standards and sharing data is something the community understands and supports.

4. Securing reproducibility

Merely making the data available at the time of publishing is not the end of the data storage problem: the question of where and how the data is stored has to be addressed. FORCE11’s Data Citation Principles[6] describe some steps needed to promote data as a first-class research object. These include metadata, identifiers and other elements. FORCE11 is currently engaged in implementation projects that are supported by some of the world’s biggest organizations in the research environment.

Probably the most important issue is understanding how long data will be secured for, and what arrangements are made to guarantee this security. Repositories can be certified, by a stepped process.[7]

5. Funding reproducibility

Even if we adapt our current processes to thoroughly support reproducibility, we haven’t addressed the issue of who is to fund it. It has to be observed that many of the agencies involved in pushing for reproducibility are funding agencies, and to this end, I would call upon them, firstly, to invest in the structural changes needed, and secondly, to develop a nuanced view of the problem.

I identified earlier that different fields have different needs, and it is obviously true to say that different topics have different senses of urgency. That sense of urgency – the need to verify research findings – could very well be a driver for reproducibility. This could be determined at the time of publishing, or alternatively decided at periods afterwards. Making predictions about citation rates for individual papers is notoriously difficult: if, over the course of a year or two, it appears that a paper is being used as a foundation stone for future research, then that might highlight the need for verification. This would be all the more true if the findings were unique to that piece of research.

In both cases, it would be possible to pre-define rules that – once reached – could unlock related funding. A funding agency could include a proportion of money that would be held back to fund reproducibility should these thresholds of importance or use be reached. By limiting the degree to which findings need to be reproduced, and by focussing the need, it should be possible to increase the efficiency of research – by increasing the certainty of reproduced claims, and by reducing incorrect dependencies on research that couldn’t be reproduced.
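
As a rough illustration of what a pre-defined rule could look like, the sketch below checks a paper’s usage against thresholds set in advance. The names `PaperUsage` and `FundingRule`, the choice of citation counts as the signal, and the numbers in the example are all assumptions made for illustration; a real funder would choose its own signals and values.

```python
from dataclasses import dataclass


@dataclass
class PaperUsage:
    citations_in_window: int   # citations observed over the review period
    findings_unique: bool      # True if no other study reports the finding


@dataclass
class FundingRule:
    citation_threshold: int          # bar for papers with corroborated findings
    unique_citation_threshold: int   # lower bar when the findings are unique

    def unlocks_funding(self, usage: PaperUsage) -> bool:
        """Return True once the pre-defined threshold has been reached."""
        if usage.findings_unique:
            threshold = self.unique_citation_threshold
        else:
            threshold = self.citation_threshold
        return usage.citations_in_window >= threshold


# Illustrative values only: a lower bar applies to unique findings.
rule = FundingRule(citation_threshold=50, unique_citation_threshold=20)
heavily_used_unique = PaperUsage(citations_in_window=25, findings_unique=True)
heavily_used_common = PaperUsage(citations_in_window=25, findings_unique=False)
```

Here the same level of use would release the held-back money for the paper with unique findings but not for the other, which reflects the point that verification matters more when nothing else corroborates a result.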

6. Publishing reproducibility

At present, publishers do not often support the publishing of papers that reproduce others. If reproducibility is to be taken seriously, the outputs must become part of the scholarly record, with researchers able to claim this work as part of their output.

I have to be mindful that publishers will not want to damage their journal metrics, however! It is unlikely that papers that describe a reproduced experiment would be cited often, and widespread publication of such papers would both tie up journal staff and also ‘damage’ their metrics. I have two relevant ideas to share about how reproducibility output could be incorporated into the publishing context.

Firstly, journals could publish such material as an appendix, adjunct to the journal itself. This would be particularly important if the new output acted as a correction, or meaningful addition to the original paper.

Secondly, reproduced work that doesn’t meaningfully add to the original could be presented as an annotation to the original paper, in the same manner in which a service such as Publons allows for open annotation, review and linking to papers.

Both routes could use the same metadata standards as described earlier in this document: importantly, the role of authorship should be incorporated. A reproducibility statement that is made by an author and verified by peer review needs to be distinguished from a third-party annotation on an open platform. Nevertheless, this distinction can be incorporated in the metadata.

Needless to say, there is a cost distinction between the two paths. Journals, and their editing and content processes, have a direct cost associated with them. Services such as Publons are frequently free at the point of use.

By incorporating the correct metadata and authorship relations, authorship of reproduced research can be credited to the researchers, providing all-important currency to those researchers and institutions. This recognition both rewards the work and validates reproducibility as a primary research task: it may encourage early-stage researchers to go the extra distance and get rewards for their work in their field.

7. Measuring reproducibility

A standardized way of collecting the elements of reproducibility and communicating those facts means that we can count and measure reproducibility. Echoing my earlier observation that not all reproducibility is relevant to all research, this would allow funders, institutions and journals to measure the degree to which reproducibility is being adopted. Reproducibility is not a simple binary process: the greater the degree to which reproducibility has been undertaken (with success), the higher the likelihood that the findings can be treated as verified.
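
A minimal sketch of the kind of counting this would allow is shown below. It assumes, purely for illustration, that each paper is described by a dictionary with an `id`, an optional `statement` and a list of `attempts`, each marked with a `success` flag; none of these field names comes from an existing standard.

```python
from collections import Counter


def adoption_rate(papers):
    """Share of papers that carry any reproducibility statement."""
    if not papers:
        return 0.0
    with_statement = sum(1 for p in papers if p.get("statement"))
    return with_statement / len(papers)


def successful_reproductions(papers):
    """Count successful reproduction attempts for each paper identifier."""
    counts = Counter()
    for p in papers:
        counts[p["id"]] = sum(
            1 for attempt in p.get("attempts", []) if attempt["success"]
        )
    return counts


# Hypothetical records: one paper with a statement and three attempts,
# one paper with neither.
papers = [
    {
        "id": "paper-a",
        "statement": "open",
        "attempts": [{"success": True}, {"success": False}, {"success": True}],
    },
    {"id": "paper-b", "statement": None, "attempts": []},
]
rate = adoption_rate(papers)
counts = successful_reproductions(papers)
```

The first function speaks to adoption across a journal or funder portfolio; the second reflects the non-binary view, where a finding reproduced twice carries more weight than one reproduced once or not at all.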

Conclusion

This suggested ecosystem describes a way that efforts can be used to focus need, to discover reproducibility and to reward these efforts, and suggests ways in which the various members of the scholarly environment can adapt their citizenship roles to support the future success of reproducibility.

Reproducibility can look like one large problem, but the reality is that it is a number of issues, which can be seen as being distributed throughout the environment. We need to recognize and reward the work that has already been done – by funders, service providers, agencies such as the RDA and publishers – and to plan for a joined-up future that fully enables reproducibility throughout the scholarly ecosystem.

Thanks to Dan Valen and Simon Porter for suggestions and corrections.

[1] Vasilevsky NA, Minnier J, Haendel MA, Champieux RE. (2016) Reproducible and reusable research: Are journal data sharing policies meeting the mark? PeerJ Preprints 4:e2588v1 https://doi.org/10.7287/peerj.preprints.2588v1

[2] Vasilevsky NA, Brush MH, Paddock H, Ponting L, Tripathy SJ, LaRocca GM, Haendel MA. (2013) On the reproducibility of science: unique identification of research resources in the biomedical literature. PeerJ 1:e148 https://doi.org/10.7717/peerj.148

[3] ACM, (2016). Result and Artifact Review and Badging

[4] The Transparency and Openness Promotion Guidelines https://cos.io/top/

[5] Guiding Principles for Findable, Accessible, Interoperable and Re-usable Data Publishing https://www.force11.org/fairprinciples

[6] FORCE11 has a number of active data citation projects, based around the original declaration, including implementation pilots for repositories and publishers https://www.force11.org/datacitation

[7] ICPSR’s Trusted Data Repositories certification http://www.icpsr.umich.edu/icpsrweb/content/datamanagement/preservation/trust.html

Cross posted https://www.digital-science.com/blog/guest/reproducibility-cinderella-problem/