Reproducibility: a Cinderella Problem

The reproducibility of research has been an increasingly important topic in the scholarly communication world for several years[1]. Despite the academic world’s commitment to peer review as part of the communication ecosystem, reproducibility – which might be seen as a form of in-depth peer review – has never been treated with the same seriousness.

The reproducibility process – by which a piece of research, or a claim within it, is validated by independent researchers recreating it from the methodology described in the paper – can be tedious. But there can be few of us who haven’t been frustrated by some missing detail or partially described process[2]. For me, it’s often when I’m presented with curated data that doesn’t seem to quite match what I’d have expected to see from the raw data.

Problems relating to reproducibility are not going to be a universal experience; the different characteristics of different fields are as present here as in other aspects of research. A proof-based discipline, such as mathematics, requires a different approach from a probabilistic science, or from social sciences that perhaps involve observation and conversation.

Much of the research world has an inherent bias against integrating this more rigorous method of testing results. Journals are optimized to publish new or unique works; research output – as measured by published papers – is the key data by which researchers are assessed; and funding is – broadly speaking – directed towards new research. In short: reproducibility is a Cinderella problem, in need of some attention and investment if it’s to flourish.

Earlier this month, I had the pleasure of attending an NSF / IEEE organized workshop on reproducibility, “The Future of Research Curation and Research Reproducibility”. There were many intelligent and thought-provoking contributions; those that stick in my mind included presentations by Amy Friedlander (Deputy Division Director, National Science Foundation), Bernie Rous (ACM), Victoria Stodden (UIUC), my colleague Dan Valen of Figshare, Michael Forster (IEEE), Todd Toler (Wiley) and Jelena Kovacevic (Carnegie Mellon). I’m not going to attempt to summarize the event – we’ll post a link once it’s published – but I have had a number of reflections on reproducibility as a network, or system, problem that I wanted to share. You won’t be surprised that I also have some thoughts about how we can capture this data and develop metrics in the space.

Reproducibility is complex – it means many things to many people

We lack a coherent concept of reproducibility: it’s as complex as anything else you might expect to find in the research world. I’m going to use a strawman example in this blog post: the simple availability of data. However, this is just an example – even data issues are multifaceted. Are we discussing raw data, or curated? Or the curation process? What are the ethical, privacy and licensing concerns about the various forms of data? How is the data stored, and protected? If a finding fails to be reproduced because of a referenced value, how does this affect the status of this particular paper?

A reproducibility ecosystem

1. A reproducibility statement

It should be possible for researchers to formally express the steps that they have undertaken to make an experiment or a paper reproducible. For example, “The complete data set is available at (address)”, “The curated data is available at (address), and the raw data is available on request”, “The data used in this experiment contains private information and is not available.” Note that there’s no sense of obligation in this process: it’s simply a device to support the structured communication of the authors’ intentions. The statement could be embedded in the methodology section of a paper.
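
Purely as an illustration, such a statement could also be captured in a simple machine-readable form alongside the prose. The field names below are hypothetical – they are not drawn from any existing standard – and simply mirror the example sentences above.

```python
import json

# A hypothetical, minimal reproducibility statement.
# None of these field names belong to an existing standard;
# they simply mirror the example sentences in the text above.
statement = {
    "paper_doi": "10.1234/example.doi",          # placeholder identifier
    "data_availability": "curated",               # e.g. "complete", "curated", "on_request", "not_available"
    "curated_data_url": "https://example.org/dataset/123",
    "raw_data": "available_on_request",
    "restrictions": "Raw data contains private information.",
}

# Serialise for embedding in the article metadata or methodology section.
print(json.dumps(statement, indent=2))
```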

2. Identifying reproducibility

One relatively simple form of reproducibility would be to test the above statement. Although this shouldn’t be the limit of reproducibility, even the simple act of stating what has been done to support reproducibility enables a straightforward task – that of confirming the author’s statement. The benefit of this explicit stage is that it could be embedded in the existing peer-review and publishing process.
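
A first-pass, automated check of such a statement might be as simple as confirming that the stated data location actually resolves. This is a sketch only – it confirms that something exists at the address, and is no substitute for a reviewer reading the statement against the paper.

```python
import urllib.request

def data_url_resolves(url: str, timeout: int = 10) -> bool:
    """Return True if the stated data location responds successfully.

    This only confirms that something answers at the address; it says
    nothing about whether the data matches the paper.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except Exception:
        return False

# Hypothetical usage with the statement sketched above:
# data_url_resolves(statement["curated_data_url"])
```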

Bernie Rous of the ACM presented a process like this at the NSF / IEEE workshop[3]. In this case, the publisher supports a structured approach to confirming reproducibility and displays the results as badges. These badges appear to relate to elements of the TOP guidelines[4], and such a taxonomy could be used as a general-purpose vocabulary for reproducibility statements, with journal editors selecting the elements that are relevant and appropriate to their journals.

3. Embedding reproducibility

Research output does not live in a single place: it’s common to have several versions of the full text available in different venues, and titles, abstracts – and increasingly references – are being fed to many systems. The infrastructure to support embedded metadata is mature: DOIs are ubiquitous; ORCID iDs are increasingly appearing against research output; CrossRef holds machine-readable metadata for millions of documents, including open-access and other licenses, funding information and text-mining licenses; and DataCite maintains open lists of linked data and articles. Whilst introducing a standard for describing reproducibility – potentially based on the TOP guidelines[4] and FORCE11’s work on related principles[5] – wouldn’t be trivial, the process of developing standards and sharing data is something the community understands and supports.
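
To give a flavour of what “machine-readable” already means in practice, here is a small sketch that pulls the licence metadata CrossRef exposes for a DOI through its public REST API. The example DOI is the one cited in reference [2] below; not every record carries a licence field, and the exact shape of the metadata varies between publishers, so treat this as illustrative rather than definitive.

```python
import json
import urllib.request

def crossref_licenses(doi: str):
    """Fetch the licence metadata CrossRef exposes for a DOI, if any.

    Illustrative only: records differ, and many carry no 'license' field.
    """
    url = f"https://api.crossref.org/works/{doi}"
    with urllib.request.urlopen(url, timeout=10) as response:
        record = json.load(response)["message"]
    return record.get("license", [])

# Example using the DOI of reference [2] below:
for lic in crossref_licenses("10.7717/peerj.148"):
    print(lic.get("URL"), lic.get("content-version"))
```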

4. Securing reproducibility

Merely making the data available at the time of publishing is not the end of the data storage problem: the question of where and how the data is stored has to be addressed. FORCE11’s Data Citation Principles[6] describe some of the steps needed to promote data as a first-class research object, including metadata, identifiers and other elements. FORCE11 is currently engaged in implementation projects that are supported by some of the world’s biggest organizations in the research environment.

Probably the most important issue is understanding how long data will be secured for, and what arrangements are made to guarantee this security. Repositories can be certified through a stepped process[7].

5. Funding reproducibility

Even if we adapt our current processes to thoroughly support reproducibility, we haven’t addressed the issue of who is to fund it. It has to be observed that many of the agencies involved in pushing for reproducibility are funding agencies, and to this end, I would call upon them, firstly, to invest in the structural changes needed, and secondly, to develop a nuanced view of the problem.

I identified earlier that different fields have different needs, and it is obviously true that different topics have different senses of urgency. That sense of urgency – the need to verify research findings – could very well be a driver for reproducibility. The need could be determined at the time of publishing, or revisited at intervals afterwards. Making predictions about citation rates for individual papers is notoriously difficult: if, over the course of a year or two, it appears that a paper is being used as a foundation stone for future research, then that might highlight the need for verification. This would be all the more true if the findings were unique to that piece of research.

In both cases, it would be possible to pre-define thresholds that – once reached – could unlock related funding. A funding agency could include a proportion of money that would be held back to fund reproducibility should these thresholds of importance or use be reached. By limiting the degree to which findings need to be reproduced, and by focussing the need, it should be possible to increase the efficiency of research – by increasing the certainty of reproduced claims, and by reducing incorrect dependencies on research that couldn’t be reproduced.
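
A sketch of how such a threshold rule might look, purely as an illustration – the thresholds, numbers and parameter names are all invented for the example:

```python
# Hypothetical rule: release a held-back reproducibility grant only when a
# paper shows signs of becoming a "foundation stone" for later work.
# All thresholds here are invented for illustration.

def should_fund_reproduction(citations_per_year: float,
                             months_since_publication: int,
                             findings_are_unique: bool) -> bool:
    """Decide whether a held-back reproducibility grant should be released."""
    heavily_used = citations_per_year >= 20 and months_since_publication >= 12
    return heavily_used or (findings_are_unique and citations_per_year >= 5)

# e.g. a two-year-old paper picking up 25 citations a year:
print(should_fund_reproduction(25, 24, findings_are_unique=False))  # True
```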

6. Publishing reproducibility

At present, publishers do not often support the publishing of papers that reproduce existing work. If reproducibility is to be taken seriously, its outputs must become part of the scholarly record, with researchers able to claim this work as part of their output.

I have to be mindful that publishers will not want to damage their journal metrics, however! It is unlikely that papers describing a reproduced experiment would be cited often, and widespread publication of such papers would both tie up journal staff and ‘damage’ their metrics. I have two relevant ideas to share about how reproducibility output could be incorporated into the publishing context.

Firstly, journals could publish such material as an appendix, adjunct to the journal itself. This would be particularly important if the new output acted as a correction, or meaningful addition to the original paper.

Secondly, reproduced work that doesn’t meaningfully add to the original could be presented as an annotation to the original paper, in the same manner in which a service such as Publons allows for open annotation, review and linking to papers.

Both routes could use the same metadata standards described earlier in this document; importantly, the role of authorship should be incorporated. A reproducibility statement that is made by an author, and verified by peer review, needs to be distinguished from a third-party annotation on an open platform. Nevertheless, this distinction can be incorporated in the metadata.

Needless to say, there is a cost distinction between the two paths. Journals, and their editing and content processes, have a direct cost associated with them. Services such as Publons are frequently free at the point of use.

By incorporating the correct metadata and authorship relations, reproduced research can be credited to the researchers who carried it out, providing all-important currency to those researchers and their institutions. This recognition both rewards the work and validates reproducibility as a primary research task: it may encourage early-stage researchers to go the extra distance and be rewarded for their work in their field.

7. Measuring reproducibility

A standardized way of collecting the elements of reproducibility and communicating those facts means that we can count and measure reproducibility. Echoing my earlier observation that not all reproducibility is relevant for all research, this would allow funders, institutions and journals to measure the degree to which reproducibility is being adopted. Reproducibility is not a simple binary property: the greater the degree to which reproducibility has been undertaken (with success), the higher the likelihood that the findings can be treated as verified.
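
Once statements and verification outcomes are recorded in a standard way, measuring adoption becomes a counting exercise. A minimal sketch, with an invented record format:

```python
from collections import Counter

# Invented example records: each paper carries a reproducibility statement
# flag and, optionally, the outcome of an attempt to verify it.
papers = [
    {"statement": True,  "verified": True},
    {"statement": True,  "verified": False},
    {"statement": False, "verified": None},
    {"statement": True,  "verified": None},
]

tally = Counter()
for p in papers:
    tally["with_statement"] += p["statement"]
    tally["verified"] += bool(p["verified"])

total = len(papers)
print(f"Statements: {tally['with_statement'] / total:.0%} of papers")
print(f"Verified:   {tally['verified'] / total:.0%} of papers")
```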

Conclusion

This suggested ecosystem describes a way to focus effort where the need is greatest, to discover reproducibility work and to reward it: to suggest ways in which the various members of the scholarly environment can adapt their citizenship roles to support the future success of reproducibility.

Reproducibility can look like one large problem, but the reality is that it is a number of issues distributed throughout the environment. We need to recognize and reward the work that has already been done – by funders, service providers, agencies such as the RDA, and publishers – and to plan for a joined-up future that fully enables reproducibility throughout the scholarly ecosystem.

Thanks to Dan Valen and Simon Porter for suggestions and corrections.

[1] Vasilevsky NA, Minnier J, Haendel MA, Champieux RE. (2016) Reproducible and reusable research: Are journal data sharing policies meeting the mark? PeerJ Preprints 4:e2588v1 https://doi.org/10.7287/peerj.preprints.2588v1

[2] Vasilevsky NA, Brush MH, Paddock H, Ponting L, Tripathy SJ, LaRocca GM, Haendel MA. (2013) On the reproducibility of science: unique identification of research resources in the biomedical literature. PeerJ 1:e148 https://doi.org/10.7717/peerj.148

[3] ACM (2016). Result and Artifact Review and Badging

[4] The Transparency and Openness Promotion Guidelines https://cos.io/top/

[5] Guiding Principles for Findable, Accessible, Interoperable and Re-usable Data Publishing https://www.force11.org/fairprinciples

[6] FORCE11 has a number of active data citation projects, based around the original declaration, including implementation pilots for repositories and publishers https://www.force11.org/datacitation

[7] ICPSR’s Trusted Data Repositories certification http://www.icpsr.umich.edu/icpsrweb/content/datamanagement/preservation/trust.html

 

Cross-posted from https://www.digital-science.com/blog/guest/reproducibility-cinderella-problem/

Join Librarians, Researchers and Evaluation Professionals to Learn About Altmetrics – September 28-29

It’s been a great personal pleasure for me to have worked on the last two sell-out Altmetrics Conferences in London and Amsterdam. Thankfully, in my new role at Digital Science, my relationship with the conference continues!

At 2:AM last year, the organizers heard from many people desperate to host this year’s conference, and we’re delighted that 3:AM will be heading to South-East Europe in September. This year, the conference organizers will be making the short flight from London to Romania, where we’ll be welcoming speakers on a number of topics.

We’ve seen that the field of altmetrics is growing in stature amongst the academic community. Funders and institutions are becoming more aware of the need to understand the social impact of research. This thirst for knowledge has certainly been reflected in the growing number of submissions to the conference! This year, the committee has focussed on reflecting this increased status, as well as restructuring some of the sessions to make sure we hear from a diverse audience.

The European Commission (EC) has recently set up its Altmetrics Expert Group, and we’re delighted to announce that Dr Rene von Schomberg of the EC will be giving our keynote address on Thursday. In a related session, Professor Isabella Peters – one of the members of the Expert Group – will be leading a panel on Open Science.

This year’s conference has a strong international theme: we have speakers from all around Europe, as well as from Zimbabwe, Ukraine, Russia, Singapore, Japan and the USA. We have one panel looking at Altmetrics around the World, and another looking at specific challenges facing research evaluation in Eastern Europe.

There has been a recent initiative to start looking at metrics and altmetrics for research software – a long-neglected research output. Now that datasets have been recognized by many organizations as primary research outputs, what will follow? One of the key movers in this area, Daniel S Katz of the NCSA, will be talking about the importance of extending recognition to software.

A new feature for the conference this year is the lightning talks – we’ve got two sessions of five-minute talks, which should provide some lively engagement. Finally, we’re going to wrap up the event with a debate on the future of altmetrics.

You can find more about the schedule on the Altmetric Conference website. You’ll see that there are two related events in the same venue – the Altmetrics 16 academic workshop takes place on September 27th, and the traditional hackday event will take place on the Friday.

If you haven’t been to Romania before, you’re going to have a real treat. Bucharest has flights to all European hubs, and it’s a city of great architecture and friendly people – late September is going to be warm and pleasant – we look forward to seeing you!

You will be able to follow conversations around the conference on the #3amconf hashtag and by following the conference on Twitter @3AMconf.

Registration for 3:AM can be made here: http://altmetricsconference.com/register-for-3am-in-bucharest/

 

Cross-posted from https://www.digital-science.com/blog/events/join-librarians-researchers-evaluation-professionals-learn-altmetrics-september-28-29/, August 17, 2016

Metrics and The Social Contract: Using Numbers, Preserving Humanity

Ever since Eugene Garfield first began to analyse citation patterns in academic literature, bibliometrics and scientometrics have been highly pragmatic disciplines. By that, I mean that technological limitations have restricted measurements and analyses to what is possible, rather than what is ideal or theoretically desirable. In the post-digital era, however, technological limitations are increasingly falling away and the problem has changed. Increasingly, we’re not limited by what we can measure but are challenged with the question of what we should measure and how we should analyse it.

There are now many more potential ways to derive metrics than ever before. Cloud computing has made terabyte-scale calculations affordable and fast. Cloud research and open science will accelerate this trend.

As science and the process of science becomes more open, and funders increasingly show an interest in how their money is being spent, researchers are coming under ever increasing scrutiny. As individual researchers are subjected to greater accountability, they increasingly need quantitative and qualitative tools to help them demonstrate both academic and broader societal impact. In addition to new reporting burdens, as funding becomes ever more competitive, successful researchers must predict and plan the social, economic, cultural and industrial impact of the work that they do. This new aspect of academic career progression is a large part of what’s increasingly being called ‘reputation management’.

Whatever your point of view, metrics are becoming increasingly central to a researcher’s career, and we can expect to see an increasing level of interest in how they are calculated, what they mean, and the relevance they have. This growth in importance can only progress healthily if we see the development of a social contract between the various stakeholders in the research metrics environment.

  • Providers need to understand that the data, analysis and visualizations they provide have a value over and beyond a simple service.
  • Funders need to be responsible in the way that they use metrics, to resist the reduction of researchers’ careers to decimal points.
  • Researchers need to learn to use metrics to enhance the narratives that they develop to describe their ambitions and careers.

This raises the question of what role commercial organizations can play in the development of new metrics to meet these new researcher needs. How can we advance their adoption, understanding, and use?

Establishing the value of a metric

It seems as though there are infinite ways to calculate metrics, even from a single source. A glance at the list of H-index variants on Wikipedia shows over a dozen variations, each of them claiming some improvement on this widely adopted metric. The methods by which a metric acquires the value necessary for adoption vary: a commercial organisation may invest in webinars, white papers, and blogs like this one, while an academic organisation will invest in outreach efforts, conferences, research and publishing.
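
For readers who haven’t met it, the base H-index is simple to state: the largest h such that h of an author’s papers have each been cited at least h times. A minimal sketch of the calculation (the variants mentioned above each adjust this recipe in some way):

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

# e.g. five papers cited 10, 8, 5, 4 and 3 times give an h-index of 4
print(h_index([10, 8, 5, 4, 3]))  # 4
```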

In both cases, the value of a metric is not derived from the relevance of the data or cleverness of the calculation. Instead, the value accrues as a consequence of the intellectual capital and understanding that users invest in it.

Metrics have to be more than an elegant measure of a specific effect or combination of effects. A successful metric also needs to be highly relevant in a practical way, while also being perceived as academically valid and not a commercial exercise in self-promotion.

Whether academically or commercially driven, those of us who work in research metrics aspire to create tools that accrue value over the course of their lifespan. The overarching goal of scientometricians everywhere is to create novel ways of understanding the dynamics of the scholarly world.

The innovation roadmap

Until recently, scholarly metrics have been relatively simple and citation-based. As I mentioned earlier, this is primarily due to the traditional technical limitations of print publishing. It is only within the last five years that we have started to see the meaningful emergence of non-citation-based metrics and indicators of attention.

As we progress to a point when the ‘alt’ falls from ‘altmetrics’, and more complex, broader measures of impact are seen as increasingly legitimate, we will see that there are many more useful and interesting ways to measure the value of academic output in order to make meaningful policy decisions.

Citation and author-based metrics are well-embedded in the scholarly environment and are central to research evaluation frameworks around the world. Their incontestable value has accrued partially as a consequence of investment in research, product development and marketing – but mostly through their adoption by the research community. New data, technologies and techniques mean that the innovation roadmap for research metrics is much more complex than we have seen up to now.

One of the greatest challenges for researchers, bibliometricians and service providers will be to create a common framework in which the so-called alternative metrics can be used alongside legacy metrics.

The lack of correlation between alternative and legacy metrics, together with the growth of advanced mathematical and technological techniques, supports the belief that it is necessary to use multiple metrics to interpret any phenomenon. As we develop new techniques, and as open science makes more text available for mining, we can expect to see a move from metrics that require interpretation to calculate impact – in all its various forms – to semantics-based metrics that offer a clearer understanding of impact.

Open science will drive innovation

All parts of the innovation process require significant investment: not only in obvious areas, like technology, data creation and capture, but also intellectually – both to develop metrics and, more importantly, to develop and test use cases. By helping people understand and adopt the new metrics, we help update the social contract between the elements of the research community.

Policies that drive open science have had an enormous impact, and will continue to do so. Much of the work that the scientometric community are contemplating has been supported by innovations such as ORCID, CrossRef’s metadata API, and the various research data initiatives. Funders who continue to drive the research environment towards increasing openness are enabling this innovation.

Given the exciting possibilities that are being facilitated by these environmental changes, we predict that the rate of innovation will accelerate over the next five years.

However much technologists and academics innovate in this space, it is absolutely clear that the value will never be realized without the development of a social contract between metric innovators, research evaluators and academics.

The work of the stakeholder community in realizing the potential of these more sophisticated, broader measures of impact is as much about supporting and developing their use and acceptance as it is about mathematical and computing power.

Ultimately, we need to remember that metrics – whether quantitative or qualitative – are numbers about humans: human stories, human ambitions. For some people, the numbers will be enough. For some, their reputation will suffice. For others, numbers might only be useful as supporting evidence in the course of a narrative.

The academic world is a diverse one, and the role of metrics, and the social contract that develops around them, should reflect this diversity.

Cross-posted from https://www.digital-science.com/blog/perspectives/metrics-social-contract-using-numbers-preserving-humanity/, July 26, 2016

With thanks to Laura Wheeler and Phill Jones at Digital Science for their contributions

Last night I dreamt I went to Mendeley again…

June 16, 2016 – a rare first for me: the first day at a new employer. After twenty years, I have left Elsevier and joined Digital Science. I may write one day about why, and about what it’s like to work at Elsevier. If it were a play review, the producers might summarize it as “***(*) … highly stimulating … excellent … a unique experience …”

June 16, 2016 is another first: writing a blog post. I became aware that I now have a plan for the next twenty years of my professional life. (I say ‘professional’ because I have numerous other plans – including the MA in archaeology, taking some public post, writing some more plays…).

What’s it like to work at Elsevier … if it were a play review, the producers might summarize it as “***(*) … highly stimulating … excellent … a unique experience …”

Part of the professional plan is understanding that I’m responsible for my own reputation and profile, and understanding that I need to invest in it. Maintaining a public profile is part of this.

One of the opportunities that a major change in life provides is the opportunity for self-reflection; stripping off the Elsevier skin and standing naked in front of the … no, I’m not standing in front of a mirror … it should be deeper than that – I’m performing a mid-life autopsy, a media-morte. What animal have I become after twenty years at Elsevier?

It turns out that the Elsevier skin was … skin-deep, and easy to slough off. I had a few meetings with Kathy and Daniel and Christian and Mario at DS, I knowingly chose between “going big or going home”, and with Juliana Wood’s stimulating words ringing in my ears, shook hands and switched allegiance in a heartbeat.

“Go big or go home”

It may be the case that I’ll carry on dreaming I’m working there for years to come (I still dream about living in Catherine Street, twenty years after I moved away). I wasn’t at all surprised to dream of being in the Mendeley office last night; it was a happy place. I shall very much miss Rich Lyne, Leah Haskoylu, Ian Harvey and the developers in my technology team.

However, for all my affection for the place and the people, I cannot say that I ever became particularly corporate-minded, or hegemonized, and this probably had two particular consequences: my utter lack of promotion at Elsevier (at least as viewed through the lens of ‘job title’) and also how easy it was for me to feel at home with the good folk at Digital Science.

But if I have not become an embodiment of Elsevier, what have I become? Remembering back to my pre-Elsevier life, I remember how much I wanted to work in scholarly communications, how frustrating it was to be knocked back by Blackwell Scientific, Heinemann, Elsevier, OUP…

My heart was not in the promotion of stationery and office supplies

I was running a production department for a small, commercial publisher, and while I was happy enough, my heart was just not in the promotion of stationery and office supplies. My heart was in the communication of ideas, in science, in research, in academia (in its broadest sense). So I moved to Elsevier (or rather, to be specific, to Butterworth-Heinemann, then part of Reed Education and Professional Publishing, part of Reed Business, part of Reed Elsevier…). I had, although it has taken me twenty years to recognize it, a vocation for working in this field.

And it’s that vocation that shapes a large part of my life. It’s surely vocation that has meant I’ve been working nearly full-time for the last month, despite being officially on ‘garden leave’ and in ‘support mode’. I’ve been analysing data, reading and reviewing papers, working on two conferences, developing some plans for the governance of quality indicators, and attending a few meetings on NISO projects… I should have been writing a paper, but I had some gardening to do, and some prehistoric trackways to explore…

Assuming that this blog doesn’t wither on the vine, it’ll become part of my professional life: part of my vocation, a place where I can express that calling. Going big.