Metrics, feedback systems and evaluation. Text of a keynote given at a NISO virtual conference, February 2019

How did we get here? Is there a clear path forward?

The skills to understand how well we’re performing, to reflect on the implications and to act on the insights are essential to the human state. Our ability to abstract these thoughts, and express them – whether in language or numbers – has given us the ability to communicate all of these processes through time and space; to compare, to evaluate and judge performance. Research evaluation is one small part of the human condition, and one in which that expression of performance is increasingly communicated through the abstraction of numbers. But in the middle of all of these data, we mustn’t lose sight of the essential humanity of our endeavor, an essence that – perhaps – can’t be abstracted. While there is a path that is emerging, there are challenges to be faced in the future, balances to be found between abstracted data and human need.

    1. Humanity, from its earliest days, has always wanted to understand and control.

      The earliest artefacts that our ancestors have left us tell us about their preoccupations. At a basic level, they hunted, they ate, they drank, they cooked, they lived. But the biggest, the most substantial artefacts are those that they left to mark their lives – to celebrate their dead ancestors’ achievements – and to understand their relationships with the natural world.

      Wherever you stand on this planet, the sun dominates the seasons and the day. It tells you when to hunt, when to move, when to harvest. And up in the sky, the stars appear to rotate around us, providing information about the seasons, our location on the earth. It is not surprising that all the evidence from antiquity tells of humanity’s obsession with astral bodies. Either passively, measuring the sun’s passage through the year; or attempting to actively appease the gods to influence their behaviour and to guarantee their reappearance. All natural phenomena that affect humanity were subject to analysis through divinity. From the Bronze Age there was a tradition of casting money and other offerings into water – often at crossing places and junctions, and places where flooding might occur. It’s testimony to the resilience of tradition that such practices have persisted, despite most people never having heard of Belisima, or Achelous, or Vedenemo – three of the many gods who would have been appeased by such offerings.

      Whether good or bad, or merely decorous, habits persist. And these stretch from throwing a few cents into a wishing well, to using the H-index to compare researchers in different fields and generations. It takes real energy to change these habits – change supported by theory and investigation.


  • In the absence of direct control; or accepted theory, humanity uses hypothesis, metaphor and abstraction to develop understanding and control.

    Our ancient ancestors would not have had the working hypotheses of gas, or rock, or gravity, or space – which we routinely invoke to explain the rise and fall of tidal waters. But as generations lived by the sea, hunted by it (and in it), depended on it, they would have known about these rhythms and the effect that the moon had on their environment. The ancient Greeks knew about this relationship, but talked about it in terms of the moon attracting the water (in an animistic manner, in other words, capriciously and knowingly). Galileo tried to offer a mechanical explanation (he didn’t like forces he didn’t understand), whereas Kepler was happy to make that conceptual jump into the unknown. It wasn’t until Newton that we had a proper, functioning theory that allowed for real scientific work on the lunar effect on the tides. Plate tectonics is another excellent example. In my dad’s school geography textbook (from the 50s), the explanation for the evolving crust is entirely wrong. And it couldn’t have been right: the mechanisms had not been discovered – or at least they hadn’t been accepted as a consensus. As an interesting sidenote, it is suggested that the political turmoil of the mid-20th century held back this consensus by several decades.

    All of this is not to dismiss humanity’s attempts to explain cause and effect, or to measure it, or to influence it. It is to observe that in the absence of known theory, we do our best with the tools that are available to us. We work towards a consensus – hopefully using scientific methods – and (hopefully) we achieve a broadly accepted theoretical framework.

    As an illustration about where we are, and how we got there, I want to consider two areas, both of which I feel have some relationship with how research metrics are being developed and used. The first comes from ideas of human management, or human resource management. The second comes from engineering: the science of feedback.

  • The modern roots of management theory
    Life used to be so simple. In order to motivate people, you’d show them an example of what to do, and then punish them if they didn’t raise their game. Take the example of General Drusus, whose life is immortalized in the Drusus Stone, in Germany. Drusus was only 30 at his death, and this enormous monument supposedly celebrated his achievements at bringing peace to that part of the Roman Empire.

    In modern terms, we might present chemists with the example of (say) the Nobel Laureate Greg Winter, whose discoveries enabled modern cancer treatments using monoclonal antibodies, and founded an industry worth hundreds of millions of dollars. And, having shown them the example, threaten to defund their labs if they didn’t behave appropriately.

    This may sound far-fetched, but there are analogies to be found in the present time – and in scholarly research. While I won’t mention countries – I have valued friends and colleagues – one country I am familiar with examined the importance of international collaborations in what they felt were comparable countries. Seeing some trends, they obliged researchers to increase the number of international collaborations (no matter how or why) under threat of defunding. Although the collaborations increased, it doesn’t look as if this … experiment … was particularly successful.

    However, in terms of how we manage our fellow humans in commerce and industry, and how we support them to develop their performance, we have – generally – come a long way since Roman times. Albeit mostly in the last 100 years.

    It took until the beginning of the twentieth century for industry to start examining personnel management seriously. It didn’t emerge from any moral or ethical drive; rather, it was pragmatically born of the economic and population crises that followed the world wars. It was driven by the need to rebuild countries; and to accommodate emerging labour organizations, democracy, and social ambition.

    The first formal attempts at understanding the reflexive, thoughtful human in the workplace – as compared to the “unit of production” approach of Taylorism – were explored in the 1950s. These were based on scientific hypotheses inspired by new ideas of inheritance and inherent, unchangeable qualities, and by the behavioural sciences and psychological theories of the time. The 1960s and 70s saw the introduction of more individualistic, goal-oriented approaches. For the first time, the subject – the employee, the human – became able to reflect on their own performance.

    And over the last two decades we have seen the growth of 360 appraisal. Evaluation and feedback of the individual embedded in a complex network of interdependencies.

    Personally, I remember the transition well. The start of my career was marked by line managers telling me how well I’d done. For a few years after that, I was asked how well I felt I was doing. And now, for the last few years I’m asked – how well does the company accommodate you, what can they do better. (I’m sure that’s not just at Digital Science – although it’s a great place to work!)

    A Brief History of Performance Management

    In the last 100 years, then, we have come a long way. We’ve come from a combination of “be like this” and “do as I say”, via “you are what you are, and that’s a fixed quantity”, to a much more sophisticated, reflexive concept: “How do you fit in a system?”, “How do we accommodate you inside this complex network?”. In short, we have abandoned a STEM-like faux “scientific” approach in favour of a more discourse-focussed, human-centered, experiential process.

  • The development of feedback science
    The opposite trend can be observed in the fields of engineering, computer science and mathematics. The notion of a system that receives feedback on its own performance – and responds accordingly and dynamically – is at the heart of any performance management system.

    The field was founded in Ancient Greek technology; it was developed by Arabic cultures and finally flourished in the industrial revolution. Time was always the driver – for those first 1500 years, humanity was obsessed with accurate time-keeping. In the 1700s, feedback mechanisms became essential to governing the speed of steam engines and mills, allowing them to go from experimental novelty to essential factory equipment.

    Engineers began to use the mathematics of feedback science to predict and develop mechanisms as part of the system, rather than deploying them in an ad hoc manner to control unruly machines. We see the genesis of hypothesis-driven research in this field, rather than trial-and-error experimentation. In the 1950s, Soviet scientists made huge theoretical breakthroughs to support their space programme, and maths and computer science have combined to give us all miniaturized devices that have more positional accuracy than was conceived of only a few years ago.

    We can see, then, two very different approaches to feedback, correction, evaluation. An approach to managing humans, that becomes more humane over the decades (and as a more dogmatic scientific approach fails to produce rewards); and an approach best suited to systems (even systems that involve humans), that takes a rigorous, theory-based approach to control.

    So how do these apply to the “business” or “industry” of research?




  • The growth of research as an investment business.
    I think that we have to be willing to view one of the contexts of research evaluation as part of the feedback loop of “research as a business”.

    In business, people expect a return on their investment. This might be expressed as hard cash, or notions of increased wealth, or as narratives generated by the business. Over the centuries, we have been flexible.

    As a relatively early funded researcher, Charles Babbage appears to have devoted more of his time to asking for money and explaining where it had gone than to actually working on his machines. John Harrison – who invented the first clock sufficiently accurate to compute longitude at sea – was supported financially by the British Government, who stood to gain massively from the increased navigational efficiency of their fleet. As a side note, it’s worth observing that the Government refused to accept that he had performed adequately to merit winning the equivalent of over $3M they had established as a prize, and that Harrison had to resort to any number of tactics to maintain a financial life-line. Researcher and funder fall out over results. The sun never sets on that one.

    Today, research is a well-funded industry. Digital Science’s Dimensions application has indexed 1.4 trillion dollars of research funding, and a vast set of outputs coming from that funding: 100 million publications, nearly 40 million patents, half a million clinical trials, and a similar number of policy documents. You could be crude – take one number, divide it by another – and come to some conclusions about productivity, but that “analysis” would be unlikely to be helpful in any context. People have probably done worse in the pursuit of understanding research.
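To make the crudeness concrete, here is the kind of one-line division the paragraph warns against, using the rounded figures above (a deliberately naive sketch, not a real analysis):

```python
funding_dollars = 1.4e12  # ~1.4 trillion dollars of indexed funding
publications = 100e6      # ~100 million indexed publications

# The crude, context-free "productivity" figure:
dollars_per_paper = funding_dollars / publications
print(dollars_per_paper)  # 14000.0
```

Fourteen thousand dollars per paper is arithmetically correct and analytically almost meaningless: the two totals cover different time spans, fields and funding models.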

  • The emergence of metrics-focussed / centered research evaluation.
    According to researchers Kate Williams and Jonathan Grant, one of the most decisive steps towards a metrics-centred view of research evaluation almost happened in Australia, in 2005. This decision was explicitly based on a political commitment to strengthen links between industry and universities. The proposed Research Quality Framework focussed on the broader impact of research, as well as its quality. This plan was eventually abandoned, largely due to political change. Nevertheless, the plan was hugely influential on the UK’s proposal to replace its Research Assessment Exercise with a system based on quantitative metrics. One particular obstacle that came up (according to Williams and Grant) was the explicit “steering of researchers and universities”. However, the UK finally adopted its new framework – the Research Excellence Framework, or REF – although the impact portion was a much reduced percentage – initially set at 20%, rising to 25% in 2017.
  • The growth of metrics; the response to metrics
    Any action requires an equal and opposite reaction.
    The movement towards greater reliance on metrics to provide the feedback and evaluation components in the research cycle has inspired appropriate responses. Whether DORA, the Leiden Manifesto, or the Responsible Metrics movement, we see clear positions forming on what may be seen as appropriate and inappropriate; responsible or irresponsible. Clearly there are a couple of candidates that often get identified as big issues. Use of the Journal Impact Factor as a way of understanding a researcher’s performance is, absolutely, numerically illiterate. The H-index is clearly biased towards certain fields, later-stage researchers, fields with higher rates of self-citation, people who don’t take career breaks, and – therefore – men.
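Part of that bias is visible in the definition itself. A minimal sketch of the h-index calculation (the citation counts are invented for illustration) shows how it rewards a long list of moderately cited papers over a short list of highly cited ones:

```python
def h_index(citations):
    """The largest h such that the author has h papers
    with at least h citations each."""
    h = 0
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical researchers at different career stages:
early_career = [40, 30, 3, 1]                    # one standout paper
late_career = [12, 11, 10, 9, 9, 8, 8, 7, 6, 5]  # long, steady output

print(h_index(early_career))  # 3
print(h_index(late_career))   # 7
```

The longer career wins comfortably, regardless of the impact of any single contribution – one reason the index disadvantages early-career researchers and anyone who takes a career break.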

    For me, there is an interesting dichotomy. Take a hypothetical metric built from a number of citations and a number of papers. The simplest approach is to divide the former by the latter – probably the most common calculation, and certainly one understood by the vast majority of people. And yet it’s highly misguided. That simple arithmetic works well if you have an approximate balance between highly and lowly cited documents – a case that simply never happens in citation data, where we always have a large number of low-performing documents and a small number of high-performing ones. Using such a simple but misleading piece of maths leads us to conclude that the vast majority of documents are “below average”. Which is supremely unhelpful.
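A toy illustration of the problem, with invented citation counts shaped like a typical skewed distribution:

```python
# Invented, but typically skewed, citation counts for ten papers:
citations = [0, 0, 0, 1, 1, 2, 2, 3, 5, 120]

mean = sum(citations) / len(citations)         # 13.4
below = sum(1 for c in citations if c < mean)  # papers under the mean

print(f"mean = {mean}; below average: {below} of {len(citations)}")
```

One outlier drags the mean up to 13.4, and nine of the ten papers end up “below average”.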

    The Leiden Manifesto elegantly observes that:
    “Simplicity is a virtue in an indicator because it enhances transparency. But simplistic metrics can distort the record (see principle 7). Evaluators must strive for balance — simple indicators true to the complexity of the research process.”

    My experience is that while nearly everyone is happy with “divide one number by another”, as soon as we introduce some better mathematical practice – for example, calculating the exponential value of the arithmetic mean of the natural logs of the citations, to reduce the effect of the small number of highly cited articles – people’s eyes glaze over. Even if this does result in an average value that is much “fairer” and “more responsible” than the arithmetic mean.
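Sketched in code, with invented citation counts (and, since zero counts have no logarithm, a +1 shift that is my assumption here, not a universal standard):

```python
import math

citations = [0, 0, 0, 1, 1, 2, 2, 3, 5, 120]  # invented, skewed counts

arithmetic = sum(citations) / len(citations)

# Exponential of the mean of the natural logs (a geometric-style mean).
# log(0) is undefined, so shift every count by +1 and shift back after –
# one common workaround, assumed here rather than prescribed.
geometric = math.exp(sum(math.log(c + 1) for c in citations) / len(citations)) - 1

print(f"arithmetic mean: {arithmetic:.1f}")  # 13.4
print(f"geometric mean:  {geometric:.1f}")   # 2.2
```

The log-based figure sits close to the typical paper, rather than being dragged upwards by the single outlier.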

    Finding this balance – between accessibility and fairness – is all the more critical when it comes to considering the changing population of people who are using metrics. Every week, on various email lists, we see people posting messages akin to “Hi, I’m reasonably new to metrics, but my head of library services just made me responsible for preparing a report … and how do I start?”

    This was brought into sharp relief at a recent meeting in London, when the organizers were considering the number of institutions in the UK – 154 degree-awarding bodies – versus the number of “known experts” in the field. There’s a real disparity – and you should bear in mind that the UK is probably the leader in metrics-focussed institutional analysis. Initiatives such as the LIS-Bibliometrics events, under the watchful eye of Dr Lizzie Gadd, and the Metrics Toolkit are essential components in supporting the education and engagement of the new body of research assessment professionals. We can’t assume that the users of our metrics are expert in the detail and background of all calculations and data.


  1. Research is a human endeavour.
    However, I want to move away from a debate on the detail: I know there are many experts who are going to follow me, who are very well positioned to discuss more nuanced areas of metrics. Let’s focus on a bigger question: what are we trying to achieve with research metrics and evaluation? Are there two different things going on here?

    We are engaged on a human endeavor. For example, researching Alzheimer’s. What strategies are useful; are we trying to cure, prevent, slow down, ameliorate? For widespread populations, within families or for an individual? What funding works, what drugs help? What areas should be de-invested – perhaps just for the present time? Are there any effective governmental policies that can help shift the curve? For me, a key part of the work in the field of metrics and evaluation is trying to understand the extremely complex relationships and interdependencies within topics – the human task that we have set ourselves. The fields in which we work.

    And separate to that, we have other questions: how well is a lab or a funder or a researcher or a journal performing? Questions about human performance and productivity, albeit of humans operating in a complex system. And, because they are mostly human artefacts, they are capable of reflection and change. They have ambitions and desires and motivations. The type of approach needed to support humans and their desire to fulfil their ambitions is quite different from that needed to understand the shape, direction and dynamics of a field of knowledge.

If I am to make a prediction…
At the beginning of the presentation, I described two other areas where feedback and evaluation have been a crucial factor in the development of human performance and system efficiency. The increasingly human-centric analysis practised by corporations in the pursuit of increased excellence; and the great theoretical, mathematical and computational breakthroughs that have revolutionized all forms of technology.

At the current time, it feels that we are standing at a fork. On the one hand, we could use big data, network theory, advanced visualizations, AI and so on to really dig into research topics, to throw up new ideas and insights into the performance of this particular area of human society. A similar revolution to that underway in online retail, or automotive guidance.

And on the other hand, we have the increasing impact of research metrics on individual humans, and the need to be acceptable to the broadest possible slice of our community.

These two things are not the same as each other. They require different data, they offer different conclusions.

Even now, driving my fairly ordinary car, if I was presented with the data being processed by the car’s computer as it keeps the wheels spinning at an optimum rate, I would be unable to think about anything else. Dividing one number by another, or just “counting some stuff” is perfectly fine on one level. But it’s entirely inadequate to understand the nuances or trends within research.

When I think about the analytics that are possible with a modern approach to research metrics, I often think of the work of Professor Chaomei Chen at Drexel. Chaomei has been working for several years on deep analysis of the full text of research articles. His goal is to map the progress of a topic as it goes from being uncertain (“it is suggested that virus A is implicated in condition B”) to certainty (“B is caused by A”). The technological approach is heavily based on a number of theoretical approaches, which Chaomei can present using highly informative visualizations.

While these visualizations can support qualitative statements about the role that individuals, laboratories or journals play, that is not their purpose. They are designed to inform on the trends, status and progression of topic-based work.

When it comes to looking at individual humans within research, I think there is another revolution that will come about.

For years, we have been accustomed to thinking that metrics are a thing that happen to researchers; or (if you work in a research office) a thing that you do to yourself. The world is changing, and the new generation of researchers will be much more aware of their own standing, their own profiles, their own strengths and their own ambitions. This is, after all, the selfie generation – and if the current massive trend towards sharing, collaboration and open access was inspired by the Napster generation – a high school graduate when Napster was launched is now in her late 30s – we are going to see a far more self-aware and self-reflective population of researchers in 20 years than we’ve been accustomed to.

The recent push towards “profiles” and the use of “baskets” (or “buckets”) of metrics is absolutely compatible with this generation, and is a start. We should be prepared for more of the same: and that includes investing in some of the concepts that we see in Human Resources (or “Talent Management” as we now see it called). For example, 360 reviews. Why shouldn’t a researcher be asking hard questions of a funder’s support? Or of a journal’s likelihood of promoting the research in the media? Or of the prospects of a promotion in a lab?

In conclusion: I am extremely optimistic about the state of metrics. It seems that the conversations and movements are in the right direction – but both sides would benefit from more conversations about the purpose – and limitations – of the data-driven approach.

The next challenge of open science – text of a presentation given at Latmetrics, Niteroi, Brazil (2018)

In much of the world, the past ten years have seen a revolution in how we communicate science and research.

In Europe, each year brings a new policy, designed to open research and to promote open science. Many of these policies focus on open access publishing, and open data. We have made great efforts to increase public access to the results of research.

This move towards open science is probably the biggest change in the research system since the seventeenth century, when technological innovation enabled the first mass-circulation newspapers, and scholars started to engage with some parts of the wider population, to discuss philosophy and natural phenomena in the coffee shops that were opening up throughout Europe.

This first renaissance brought about a permanent change in the relationship between academic pursuit, knowledge and enlightenment. Over the centuries that followed, the need for knowledge – and the ability to use it – grew, and a class of intellectuals emerged from the bourgeoisie.

As science became more professional and more complex, it required more money. Social upheaval in the nineteenth and twentieth centuries, wars and revolutions, led research to become industrialized. Industrialization led to massive growth, and significant investment took place. Not only to build systems to conduct research, but also to communicate the results of research. And in many parts of the world, this investment was private, and required a business model that produced a return on that investment.

The investors in research required insights into the success of their funding: and the birth of bibliometrics allowed the first insights into the science of impact.


The size of the research business is breath-taking.

The Dimensions database of research grants details 1.3 trillion dollars of historical and current funding. This funding produces between 3 and 4 million research articles every year.


Much of the money that funds research comes from government. It is public money, raised by taxation, distributed by government. In many countries, it is government that is pushing for open access publishing, attempting to transform the closed academic world.

But transforming the research environment from closed to open is much more than just taking away the barriers to accessing journals.

The political world has become increasingly “populist”. Much of this has become possible because of the growth of social media. Political messages have become simpler, less thoughtful, and much less mediated by newspaper and television journalists. In the USA and the UK – and in many countries around the world – our political leaders have become resentful and suspicious of experts and intellectuals. They have shared this with their populations, and they have benefitted from it in elections.

The challenge for academics in the time of populism and openness is to embrace the new media, to be part of the new relationships that are emerging. To ensure that the results of billions of dollars of research funding are not only accessible – if you know where to look, and what words to use, and how to read research – but comprehensible, with ambitions that relate to the needs of society.

We have to build trust, we have to learn to listen, and to speak, in ways that enable the widest possible understanding of our work.

Openness is, as we say in Britain, a “double-edged sword”. It can cut in either direction. We can use it to play to our strengths, to build relationships. But it can also be used to tell a story of elitism and irrelevant research. For those of us involved in research – whether in funding, in practice, in publishing or in analysis – we need to be careful about how we communicate our impact and our research ideals.

Cleverly calculated numbers and complex analyses may be understood by us experts, but they fail to engage the public. We need to become skilled at using those numbers to tell simple narratives that clearly relate the need and the purpose of academic research.

The ways in which research finds its impact on broader society need to be expanded and understood. Whether clinical trials, industrial patents, political policy; whether the results of our research change the classroom or the factory, the office or the government, we have a responsibility to reflect on our impact and what we can do to build trust between the widest society and those of us with the good fortune to work in research and teaching. We can use this evidence to create our histories, to explain our relationship with the pursuit of knowledge and enlightenment, public health and a better society. Those of us who work in altmetrics work to discover and understand evidence of communication and impact, and to support the common understanding of the meaning and interpretation of these new, complex sources of evidence.

Our explanations of success have to go beyond the exchange of thousands of lines of data, and try to go deeper into the meaning and relevance of data. To explain science in human terms.

Much is spoken of citizen science, and the possibilities that open science and more open societies can offer, but we have only started the process of developing a deeper engagement with the wider population. This next stage – our future work – is the most complex, the most challenging.

In computer software, we had no need for a revolution in openness. From its earliest days, programmers were collaborating and sharing code. It was a community endeavour. Applications such as Apache and Linux – the biggest, and most complex open systems in the world – were an evolution, and not a revolution.

But it wasn’t these efforts, the efforts of the computer scientists, that finally changed the way that the world engaged with technology. Instead, it was the work of people who focussed on usability, of user-need, of interface design – whose efforts addressed the accessibility of technology, rather than the technology itself. Nowadays, we increasingly understand that we have to spend a very large fraction of our budgets on supporting users to understand the possibilities of technology, and adapting to their needs and skills.

Those of us in academic life should learn the lessons of technology. Learning to communicate our values, our results; learning to understand the needs of society and our planet are not simple things to add to a project at the last minute. Rather, they are complex and expensive, and need to be integrated into the research process.

Funders are beginning to experiment with new ideas of increasing accessibility. Last month, I heard of the innovation of “citizen juries” – in this case medical patients, and families of patients – being asked to evaluate potential projects for need and immediacy. These programmes introduce a new layer of introspection to the funding process, and demand that the community spend more time on the context of their research endeavours.


Open access publication is the first step, not the last, in meeting the challenges of a more democratic, open research community. Open science is the most exciting change in the relationship between the production of knowledge and the impact of research for four hundred years, and our work has only just started.

Reproducibility: a Cinderella Problem

The reproducibility of research has been an increasingly important topic in the scholarly communication world for several years[1]. Despite the academic world’s commitment to peer-review as part of the communication ecosystem, reproducibility – which might be seen as a form of in-depth peer-review – has never been treated as seriously.

The reproducibility process – by which a piece or claim of research is validated, by being recreated by independent researchers following the methodology described in a paper – can be tedious. But there can be few of us who haven’t been frustrated by some missing detail, or partially described process[2]. For me, it’s often when I’m presented with curated data that doesn’t seem to quite match what I’d have expected to see from the raw data.

Problems relating to reproducibility are not going to be a universal experience; the different characteristics of different fields are as present here as in other aspects of research. A proof-based discipline, such as mathematics, requires a different approach from a probabilistic science, or from social sciences involving perhaps observations and conversation.

Much of the research world has an inherent bias against integrating this more rigorous method of testing results. Journals are optimized to publish new or unique works; research output – as measured by published papers – is the key data by which researchers are measured; and funders are – broadly speaking – in favour of new research. In short: reproducibility is a Cinderella problem, in need of some attention and investment if it’s to flourish.

Earlier this month, I had the pleasure of attending an NSF / IEEE organized workshop on reproducibility, “The Future of Research Curation and Research Reproducibility”. There were many intelligent and thought-provoking contributions; those that stick in my mind included presentations by Amy Friedlander (Deputy Division Director, National Science Foundation), Bernie Rous (ACM), Victoria Stodden (UIUC), my colleague Dan Valen of Figshare, Michael Forster (IEEE), Todd Toler (Wiley) and Jelena Kovacevic (Carnegie Mellon). I’m not going to attempt to summarize the event (we’ll post a link once the report is published), but I have had a number of reflections on reproducibility as a network – or system – problem that I wanted to share. You won’t be surprised that I also have some thoughts about how we can capture this data, and develop metrics in the space.

Reproducibility is complex – it means many things to many people

We lack a coherent concept of reproducibility: it’s as complex as anything else you might expect to find in the research world. I’m going to use a strawman example in this blog post: the simple availability of data. However, this is just an example – even data issues are multifaceted. Are we discussing raw data, or curated? Or the curation process? What are the ethical, privacy and licensing concerns about the various forms of data? How is the data stored, and protected? If a finding fails to be reproduced because of a referenced value, how does this affect the status of this particular paper?

A reproducibility ecosystem

1. A reproducibility statement

It should be possible for researchers to formally express the steps that they have undertaken to make an experiment or a paper reproducible. For example: “The complete data set is available at (address)”, “The curated data is available at (address), and the raw data is available on request”, or “The data used in this experiment contains private information and is not available.” Note that there’s no sense of obligation in this process: it’s simply a structural device to support clear communication of the authors’ intentions. The statement could be embedded in the methodology section of a paper.
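As a sketch of how such a statement might look in machine-readable form – the field names here are illustrative assumptions on my part, not an existing standard:

```python
# A minimal, hypothetical machine-readable reproducibility statement.
# The field names are illustrative assumptions, not drawn from any standard.

def make_statement(data_availability, location=None, raw_on_request=False):
    """Build a simple reproducibility statement for embedding in article metadata."""
    return {
        "data_availability": data_availability,  # e.g. "complete", "curated", "none"
        "location": location,                    # repository address, if any
        "raw_data_on_request": raw_on_request,
    }

# The three example statements from the text, expressed as data:
complete = make_statement("complete", location="(address)")
curated = make_statement("curated", location="(address)", raw_on_request=True)
private = make_statement("none")  # data contains private information
```

Because the statement is structured rather than free text, downstream systems could index it, display it, or test it – which is exactly the opportunity the next section describes.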

2. Identifying reproducibility

One relatively simple form of reproducibility would be to test the above statement. Although this shouldn’t be the limit of reproducibility testing, even the simple process of making a statement about what has been done to support reproducibility enables a straightforward task: confirming the author’s statement. The benefit of this explicit stage is that it could be embedded in the existing peer-review and publishing process.

Bernie Rous of the ACM presented a process like this at the NSF/IEEE workshop[3]. In this case, the publisher supports a structured approach to confirming reproducibility, and displays the results as badges. These badges appear to be related to elements in the TOP guidelines[4], which could be used as a general-purpose taxonomy to support reproducibility statements, with the actual elements selected by journal editors for relevance and appropriateness.

3. Embedding reproducibility

Research output does not live in a single place: it’s common to have several versions of the full text available in different venues. Titles, abstracts – and increasingly references – are being fed to many systems. The infrastructure to support embedded metadata is mature: DOIs are ubiquitous; ORCID iDs are increasingly appearing against research output; CrossRef holds machine-readable metadata for millions of documents, including open-access and other licenses, funding information and text-mining licenses; DataCite maintains open lists of linked data and articles. Whilst introducing a standard for describing reproducibility, potentially based on the TOP guidelines[4] and FORCE11’s work on related principles[5], wouldn’t be trivial, the process of developing standards and sharing data is something the community understands and supports.

4. Securing reproducibility

Merely making the data available at the time of publishing is not the end of the data storage problem: the question of where and how the data is stored has to be addressed. FORCE11’s Data Citation Principles[6] describe the steps needed to promote data as a first-class research object, including metadata, identifiers and other elements. FORCE11 is currently engaged in implementation projects supported by some of the world’s biggest organizations in the research environment.

Probably the most important issue is understanding how long data will be secured, and what arrangements are in place to guarantee that security. Repositories can be certified by a stepped process.[7]

5. Funding reproducibility

Even if we adapt our current processes to thoroughly support reproducibility, we haven’t addressed the issue of who is to fund it. It has to be observed that many of the agencies involved in pushing for reproducibility are funding agencies, and to this end, I would call upon them, firstly, to invest in the structural changes needed, and secondly, to develop a nuanced view of the problem.

I identified earlier that different fields have different needs, and it is also true that different topics have different senses of urgency. That sense of urgency – the need to verify research findings – could well be a driver for reproducibility. This could be determined at the time of publishing, or at intervals afterwards. Making predictions about citation rates for individual papers is notoriously difficult: if, over the course of a year or two, it appears that a paper is being used as a foundation stone for future research, then that might highlight the need for verification. This would be all the more true if the findings were unique to that piece of research.

In both cases, it would be possible to pre-define rules that – once triggered – could unlock related funding. A funding agency could hold back a proportion of money to fund reproducibility should these thresholds of importance or use be reached. By limiting the degree to which findings need to be reproduced, and by focusing the effort where it is most needed, it should be possible to increase the efficiency of research – by increasing the certainty of reproduced claims, and by reducing incorrect dependencies on research that couldn’t be reproduced.
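A sketch of how such a pre-defined rule might work – the citation threshold, and the lower bar for papers with unique findings, are invented purely for illustration:

```python
# Hypothetical sketch: decide whether a paper's accumulated use has crossed
# a pre-defined threshold that would unlock held-back reproducibility funding.
# The threshold values here are illustrative assumptions, not a real policy.

def should_fund_reproduction(citations_per_year, citation_threshold=25,
                             unique_findings=False):
    """Return True when citations suggest the paper has become a foundation
    stone for later work, and so is worth the cost of verification."""
    total = sum(citations_per_year)
    if unique_findings:
        # Findings not replicated anywhere else justify a lower bar.
        citation_threshold //= 2
    return total >= citation_threshold

# A paper cited 4, 12 and 15 times over three years crosses the default bar;
# a lightly cited paper with unique findings is judged against a halved one.
should_fund_reproduction([4, 12, 15])
should_fund_reproduction([4, 6], unique_findings=True)
```

The point is not the specific numbers but that the rule is explicit and testable, so the held-back funds can be released mechanically rather than by ad-hoc judgement.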

6. Publishing reproducibility

At present, publishers do not often support the publication of papers that reproduce the work of others. If reproducibility is to be taken seriously, the outputs must become part of the scholarly record, with researchers able to claim this work as part of their output.

I have to be mindful that publishers will not want to damage their journal metrics, however! It is unlikely that papers that describe a reproduced experiment would be cited often, and widespread publication of such papers would both tie up journal staff and also ‘damage’ their metrics. I have two relevant ideas to share about how reproducibility output could be incorporated into the publishing context.

Firstly, journals could publish such material as an appendix, adjunct to the journal itself. This would be particularly important if the new output acted as a correction, or meaningful addition to the original paper.

Secondly, reproduced work that doesn’t meaningfully add to the original could be presented as an annotation to the original paper, in the same manner in which a service such as Publons allows open annotation, review and linking to papers.

Both routes could use the same metadata standards as described earlier in this document; importantly, the role of authorship should be incorporated. A reproducibility statement that is made by an author and verified by peer review needs to be distinguished from a third-party annotation on an open platform. Nevertheless, this distinction can be incorporated in the metadata.

Needless to say, there is a cost distinction between the two paths. Journals, and their editing and content processes, have a direct cost associated with them. Services such as Publons are frequently free at the point of use.

By incorporating the correct metadata and authorship relations, authorship of reproduced research can be credited to the researchers, providing all-important currency to those researchers and their institutions. This recognition both rewards the work and validates reproducibility as a primary research task: it may encourage early-stage researchers to go the extra distance and be rewarded for work in their field.

7. Measuring reproducibility

A standardized way of collecting the elements of reproducibility, and communicating those facts, means that we can count and measure reproducibility. Echoing my earlier observation that not all reproducibility is relevant for all research, this would allow funders, institutions and journals to measure the degree to which reproducibility is being adopted. Reproducibility is not a simple binary property: the greater the degree to which reproducibility has been undertaken (with success), the higher the likelihood that the findings can be treated as verified.
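One way to make that non-binary idea concrete is to score a paper by the fraction of its claimed reproducibility elements that were independently confirmed – a sketch under my own assumptions, with hypothetical element names rather than any published taxonomy:

```python
# Illustrative sketch: a non-binary reproducibility score, computed as the
# fraction of claimed reproducibility elements that reviewers confirmed.
# Element names are hypothetical, not taken from any published taxonomy.

def reproducibility_score(claimed, confirmed):
    """Fraction of an author's claimed elements that were independently confirmed."""
    claimed_set = set(claimed)
    if not claimed_set:
        return 0.0
    return len(claimed_set & set(confirmed)) / len(claimed_set)

claimed = ["data_available", "code_available", "methods_complete"]
confirmed = ["data_available", "methods_complete"]
score = reproducibility_score(claimed, confirmed)  # two of three elements confirmed
```

A score like this could be aggregated across a journal or a funder’s portfolio to track adoption over time, without forcing any single paper into a yes/no verdict.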


This suggested ecosystem describes a way to focus effort where it is needed, to discover reproducibility work, and to reward it: it suggests ways in which the various members of the scholarly environment can adapt their citizenship roles to support the future success of reproducibility.

Reproducibility can look like one large problem, but the reality is that it is a number of issues, distributed throughout the environment. We need to recognize and reward the work that has already been done – by funders, service providers, agencies such as the RDA, and publishers – and to plan for a joined-up future that fully enables reproducibility throughout the scholarly ecosystem.

Thanks to Dan Valen and Simon Porter for suggestions and corrections.

[1] Vasilevsky NA, Minnier J, Haendel MA, Champieux RE. (2016) Reproducible and reusable research: Are journal data sharing policies meeting the mark? PeerJ Preprints 4:e2588v1

[2] Vasilevsky NA, Brush MH, Paddock H, Ponting L, Tripathy SJ, LaRocca GM, Haendel MA. (2013) On the reproducibility of science: unique identification of research resources in the biomedical literature. PeerJ 1:e148

[3] ACM, (2016). Result and Artifact Review and Badging

[4] The Transparency and Openness Promotion Guidelines

[5] Guiding Principles for Findable, Accessible, Interoperable and Re-usable Data Publishing

[6] FORCE11 has a number of active data citation projects, based around the original declaration, including implementation pilots for repositories and publishers

[7] ICPSR’s Trusted Data Repositories certification


Cross posted

Join Librarians, Researchers and Evaluation Professionals to Learn About Altmetrics – September 28-29

It’s been a great personal pleasure for me to have worked on the last two sell-out Altmetrics Conferences in London and Amsterdam. Thankfully, in my new role at Digital Science, my relationship with the conference continues!

At 2:AM last year, the organizers heard from many people desperate to host this year’s conference, and we’re delighted that 3:AM will be heading to South-East Europe in September. This year, the conference organizers will be making the short flight from London to Romania, where we’ll be welcoming speakers on a number of topics.

We’ve seen that the field of altmetrics is growing in its stature amongst the academic community. Funders and institutions are becoming more aware of the need to understand the social impact of research. This thirst for knowledge has certainly been reflected in the growing number of submissions to the conference! This year, the committee has focussed on reflecting this increased status, as well as restructuring some of the sessions to make sure we hear from a diverse audience.

The European Commission (EC) has recently set up its Altmetrics Expert Group, and we’re delighted to announce that Dr Rene von Schomberg of the EC will be giving our keynote address on Thursday. Professor Isabella Peters – one of the members of the Expert Group – will be leading a related panel on Open Science.

This year’s conference has a strong international theme: we have speakers from all around Europe, Zimbabwe, Ukraine, Russia, Singapore, Japan and the USA. We have one panel that is looking at Altmetrics around the World, and another looking at specific challenges facing research evaluation in Eastern Europe.

There has been a recent initiative to start looking at metrics and altmetrics for research software – a long-neglected research output. Now that datasets have been recognized by many organizations as primary research outputs, what will follow? One of the key movers in this area, Daniel S Katz of the NCSA, will be talking about the importance of extending recognition of software.

A new feature for the conference this year is the lightning talks – we’ve got two sessions of five-minute talks, which should provide some lively engagement. Finally, we’re going to wrap up the event with a debate on the future of altmetrics.

You can find more about the schedule on the Altmetric Conference website. You’ll see that there are two related events in the same venue – the Altmetrics 16 academic workshop takes place on September 27th, and the traditional hackday event will take place on the Friday.

If you haven’t been to Romania before, you’re going to have a real treat. Bucharest has flights to all European hubs, and it’s a city of great architecture and friendly people – late September is going to be warm and pleasant – we look forward to seeing you!

You will be able to follow conversations around the conference on the #3amconf hashtag and by following the conference on Twitter @3AMconf.

Registration for 3:AM can be made here:


Cross posted from August 17, 2016

Metrics and The Social Contract: Using Numbers, Preserving Humanity

Ever since Eugene Garfield first began to analyse citation patterns in academic literature, bibliometrics and scientometrics have been highly pragmatic disciplines. By that, I mean that technological limitations have restricted measurements and analyses to what is possible, rather than what is ideal or theoretically desirable. In the post-digital era, however, technological limitations are increasingly falling away and the problem has changed. Increasingly, we’re not limited by what we can measure but are challenged with the question of what we should measure and how we should analyse it.

There are now many more potential ways to derive metrics than ever before. Cloud computing has made terabyte scale calculations affordable and fast. Cloud research and open science will accelerate this trend.

As science and the process of science becomes more open, and funders increasingly show an interest in how their money is being spent, researchers are coming under ever increasing scrutiny. As individual researchers are subjected to greater accountability, they increasingly need quantitative and qualitative tools to help them demonstrate both academic and broader societal impact. In addition to new reporting burdens, as funding becomes ever more competitive, successful researchers must predict and plan the social, economic, cultural and industrial impact of the work that they do. This new aspect of academic career progression is a large part of what’s increasingly being called ‘reputation management’.

Whatever your point of view, metrics are becoming increasingly central to a researcher’s career, and we can expect to see an increasing level of interest in how they are calculated, what they mean, and the relevance they have. This growth in importance can only be sustained if we see the development of a social contract between the various stakeholders in the research metrics environment.

  • Providers need to understand that the data, analysis and visualizations they provide have a value over and beyond a simple service.
  • Funders need to be responsible in the way that they use metrics, to resist the reduction of researchers’ careers to decimal points.
  • Researchers need to learn to use metrics to enhance the narratives that they develop to describe their ambitions and careers.

This raises the question of what role commercial organizations can play in the development of new metrics to meet these new researcher needs. How can we advance their adoption, understanding, and use?

Establishing the value of a metric

It seems as though there are infinite ways to calculate metrics, even from a single source. A glance at the list of H-index variants on Wikipedia shows over a dozen variations, each of them claiming some improvement on this widely adopted metric. The methods by which a metric acquires the value necessary for adoption vary: a commercial organisation may invest in webinars, white papers, and blogs like this one. An academic organisation will invest in outreach efforts, conferences, research and publishing.
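The base H-index itself is straightforward to compute – the largest h such that h of an author’s papers have at least h citations each – which is partly why variants are so easy to propose. A minimal sketch:

```python
# The h-index: the largest h such that the author has h papers
# each cited at least h times.

def h_index(citations):
    """Compute the h-index from a list of per-paper citation counts."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # this paper still has at least `rank` citations
        else:
            break
    return h

h_index([10, 8, 5, 4, 3])  # four papers with at least four citations each
```

Every variant tweaks some part of this simple recipe – the ranking, the threshold, or the weighting – which is why the calculation alone tells you so little about which version deserves adoption.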

In both cases, the value of a metric is not derived from the relevance of the data or cleverness of the calculation. Instead, the value accrues as a consequence of the intellectual capital and understanding that users invest in it.

Metrics have to be more than an elegant measure of a specific effect or combination of effects. A successful metric also needs to be highly relevant in a practical way, while also being perceived as academically valid and not a commercial exercise in self-promotion.

Whether academically or commercially driven, those of us who work in research metrics aspire to create tools that accrue value over the course of their lifespan. The overarching goal of scientometricians everywhere is to create novel ways of understanding the dynamics of the scholarly world.

The innovation roadmap

Until recently, scholarly metrics have been relatively simple and citation-based. As I mentioned earlier, this is primarily due to the traditional technical limitations of print publishing. It is only within the last five years that we have started to see the meaningful emergence of non-citation-based metrics and indicators of attention.

As we progress to a point where the ‘alt’ falls from ‘altmetrics’ and more complex, broader measures of impact are seen as increasingly legitimate, we will see that there are many more useful and interesting ways to measure the value of academic output in order to make meaningful policy decisions.

Citation and author-based metrics are well-embedded in the scholarly environment and are central to research evaluation frameworks around the world. Their incontestable value has accrued partially as a consequence of investment in research, product development and marketing – but mostly through their adoption by the research community. New data, technologies and techniques mean that the innovation roadmap for research metrics is much more complex than we have seen up to now.

One of the greatest challenges for researchers, bibliometricians and service providers will be to create a common framework in which the so-called alternative metrics can be used alongside legacy metrics.

The lack of correlation between different metrics, and the growth of advanced mathematical and technological techniques, support the belief that it is necessary to use multiple metrics to interpret any phenomenon. As we develop new techniques, and as open science makes more text available for mining, we can expect to see a move from metrics that require interpretation to calculate impact – in all its various forms – to semantic-based metrics that offer a clearer understanding of impact.

Open science will drive innovation

All parts of the innovation process require significant investment: not only in obvious areas, like technology, data creation and capture, but also intellectually – both to develop metrics and, more importantly, to develop and test use cases. By helping people understand and adopt the new metrics, we help update the social contract between the elements of the research community.

Policies that drive open science have had an enormous impact, and will continue to do so. Much of the work that the scientometric community are contemplating has been supported by innovations such as ORCID, CrossRef’s metadata API, and the various research data initiatives. Funders who continue to drive the research environment towards increasing openness are enabling this innovation.

Given the exciting possibilities that are being facilitated by these environmental changes, we predict that the rate of innovation will accelerate over the next five years.

However much technologists and academics innovate in this space, it is absolutely clear that the value will never be realized without the development of a social contract between metric innovators, research evaluators and academics.

The work of the stakeholder community in realizing the potential of these more sophisticated, broader measures of impact is as much about supporting and developing their use and acceptance as it is about mathematical and computing power.

Ultimately, we need to remember that metrics – whether quantitative or qualitative – are numbers about humans: human stories, human ambitions. For some people, the numbers will be enough. For some, their reputation will suffice. For others, numbers might only be useful as supporting evidence in the course of a narrative.

The academic world is a diverse world, and the role of metrics, and the social contract that develops should reflect this diversity.

Cross-posted from July 26, 2016

With thanks to Laura Wheeler and Phill Jones at Digital Science for their contributions

Last night I dreamt I went to Mendeley again…

June 16, 2016 – a rare first, for me. The first day at a new employer. After twenty years, I have left Elsevier and joined Digital Science. I may write about why, and what it’s like to work at Elsevier, one day. If it were a play review, the producers might summarize it as “***(*) … highly stimulating … excellent … a unique experience …”

June 16, 2016 is another first: writing a blog post. I became aware that I now have a plan for the next twenty years of my professional life. (I say ‘professional’ because I have numerous other plans – including the MA in archaeology, taking some public post, writing some more plays…).

What’s it like to work at Elsevier … if it were a play review, the producers might summarize it as “***(*) … highly stimulating … excellent … a unique experience …”

Part of the professional plan is understanding that I’m responsible for my own reputation and profile, and understanding that I need to invest in it. Maintaining a public profile is part of this.

One of the opportunities that a major change in life provides is the opportunity for self-reflection; stripping off the Elsevier skin and standing naked in front of the … no, I’m not standing in front of a mirror … it should be deeper than that – I’m performing a mid-life autopsy, a media-morte. What animal have I become after twenty years at Elsevier?

It turns out that the Elsevier skin was … skin-deep, and easy to slough off. I had a few meetings with Kathy and Daniel and Christian and Mario at DS, I knowingly chose between “going big or going home”, and with Juliana Wood’s stimulating words ringing in my ears, shook hands and switched allegiance in a heartbeat.

“Go big or go home”

It may be the case that I’ll carry on dreaming I’m working there for years to come (I still dream about living in Catherine Street, twenty years after I moved away). I wasn’t at all surprised to dream of being in the Mendeley office last night, it was a happy place. I shall very much miss Rich Lyne, Leah Haskoylu, Ian Harvey and the developers in my technology team.

However, for all my affection for the place and the people, I cannot say that I ever became particularly corporate-minded, or hegemonized, and this probably had two particular consequences: my utter lack of promotion at Elsevier (at least as viewed through the lens of ‘job title’) and also how easy it was for me to feel at home with the good folk at Digital Science.

But if I have not become an embodiment of Elsevier, what have I become? Remembering back to my pre-Elsevier life, I remember how much I wanted to work in scholarly communications, how frustrating it was to be knocked back by Blackwell Scientific, Heinemann, Elsevier, OUP…

My heart was not in the promotion of stationery and office supplies

I was running a production department for a small, commercial publisher, and while I was happy enough, my heart was just not in the promotion of stationery and office supplies. My heart was in the communication of ideas, in science, in research, in academia (in its broadest sense). So I moved to Elsevier (or rather, to be specific, to Butterworth-Heinemann, then part of Reed Education and Professional Publishing, part of Reed Business, part of Reed Elsevier…). I had, although it has taken me twenty years to recognize it, a vocation for working in this field.

And it’s that vocation that shapes a large part of my life. It’s surely vocation that has meant I’ve been working nearly full-time for the last month, despite being officially on ‘garden leave’ and in ‘support mode’. I’ve been analysing data, reading and reviewing papers, working on two conferences, developing some plans for the governance of quality indicators, and attending a few meetings on NISO projects… I should have been writing a paper, but I had some gardening to do, and some prehistoric trackways to explore…

Assuming that this blog doesn’t wither on the vine, it’ll become part of my professional life: part of my vocation, a place where I can express that calling. Going big.