A conversation with the funders

Graham Pryor | 30 April 2013

The organisers of DCC’s special event, Funding Research Data Management, which took place on 25th April at Aston University, can surely claim that the essence of the debate (about what is acceptable, feasible and achievable in the use of research grant funding for the provision of data infrastructure and services) has at long last been laid bare under the light of expert scrutiny. Propitiously, in what could have proved a confrontational exercise, this eagerly anticipated conversation with the funders developed into a harmonious sharing of experience, observation and explanation that, as STFC’s Juan Bicarregui observed, should lead to greater openness and collaboration between the funders and the research community.

The scene was set by Edinburgh’s Jeff Haywood, who opened the event by giving an institutional perspective, describing the university as the ‘banker of last resort’ for preserving research data. The effort required to meet this challenge is, however, considerable, given that there are so many different kinds of research data, each having its own particular curation needs. In the light of this complexity, blanket statements from funders about retention pale into irrelevance when, in the case of the majority of data, there is little long-term value and one’s real test is in deciding how much can be disposed of and when. Funding Edinburgh’s ‘repository of last resort’ currently depends on a forward budget of £2 million, split 50:50 between expenditure on staff and ‘stuff’, much of the former being reskilled library, IT and research office personnel. Clearly Edinburgh has accepted its obligations, but not simply as a means of meeting the various funder requirements. For Jeff, this is all about seeking a wholly win/win situation, for the postgraduate, the individual researcher, the research team and consequently for the university. It is, quite simply, what one does to be among the best.

That motivation was reflected in the presentation from Oxford Brookes, whose entrance into research data management was pursued resolutely in spite of a failure to win funding from Jisc’s Managing Research Data programme. Supported by advice from the DCC, Oxford Brookes has established (and maintains) a data management steering group, has produced a data policy and undertaken data audits. Still with no promise of sustainable funding or dedicated resource, there remains a real determination to see how what they have achieved already can be developed to provide better support and a place at the data management table with other leaders in the field.

In a joint presentation from Southampton, which covered the Uniquip equipment sharing service and the DataPool research data support service, collaboration, sharing and relevance were shown to be key. Notwithstanding its support from the research councils, which has been robust, Uniquip’s success is dependent on many factors and actors including Jisc support for technology and standards, the push by product developers to find demonstrators and the desire of most universities to secure efficiencies. This dependency on proven value to the community was echoed in the message from DataPool where, having convinced senior management of the principle that institutional investment in data management should be a good thing, the cash will not actually flow until evidence of improved value for money can be given. Hence their current focus is on answering the University Provost’s demand for concrete examples rather than generic models.

Providing evidence was a major theme in the presentation on Bristol’s pilot data.bris research data service. Once again, the development at Bristol had been on the back of existing resources (mainly library and IT), spurred on by Jisc project funding and now given institutional support until 2015. Stephen Gray’s narrative about how that support was won bore witness to the difficulties of winning institutional investment, with stories of senior management scepticism about the importance of research data management, misunderstandings about the difference between curation and storage, and a belief that if we wait long enough the problem might just go away. Evidence from such authoritative sources as the Royal Society and the more highly regarded academic journals, together with a careful explanation of what data management actually entails, began to turn the argument in favour of the data.bris pilot, but it was not until the risks from non-investment were explained that minds were made up. Once senior management understood the potential loss of competitiveness (‘if Bristol don’t manage their data the funders will put their money somewhere else’), the case was made.

It shouldn’t have been that difficult. As Ben Ryan (EPSRC) commented in the afternoon session, when the funders’ panel addressed a considerable list of questions submitted by the community, where research is undertaken using public funds, institutions have long been obliged and expected to apply high standards of research governance. This includes having good quality processes in place, designed to safeguard and optimise such investment. In that context, research data management is but one aspect of an institution’s research governance and should not be regarded as an optional addition or something peripheral to it.

The panel comprised representatives from BBSRC, EPSRC, MRC, STFC and the Wellcome Trust. Opening it with a joint statement, Juan Bicarregui described the event as timely, since the research councils are currently working together to produce new guidance. Referring to the RCUK Common Principles on Data Policy, he quoted two as particularly relevant for the day’s discussion:

  • Principle 7 – the costs of data management are indeed payable, since ‘It is appropriate to use public funds to support the management and sharing of publicly-funded research data’. The aim of the councils is to maximise research benefit. That means using money efficiently.
  • Principle 2 – ‘Data with acknowledged long-term value should be preserved and remain accessible and usable for future research.’ Consequently, data management plans should exist, both at an institutional level and at a project level, so that data with acknowledged long-term value can be made accessible and reusable. This implies that some value judgement has to be taken, as not all data should be kept. These plans should make clear to the potential funder what institutional infrastructure is being provided and what project activities are being charged against a grant, as the funders do not expect to pay for something twice (e.g. in terms of resourcing post-project national data services and institutional services).

He explained that, as far as the research councils are concerned, direct costs can be used for the preparation of data for access and curation, including the assignment of metadata. The notable caveat, however, is that this applies only to what happens within the lifetime of the funded project. Beyond that window, as we are aware, some research councils provide data centres, whilst some disciplines have their own repositories. His advice was that where these facilities exist, don’t spend money replicating services in-house. But where an institution is going to be providing an institutional repository, costs should be met through full economic costing (FEC).

David Carr, on behalf of the Wellcome Trust, confirmed that most of the aforesaid also applies to them and that the Trust is open to harmonising with the RCUK funders. To clarify, Wellcome will only pay the directly incurred costs of a particular research grant, although in some instances that can be flexible and in particular cases they would be prepared to fund necessary infrastructure to allow a particular piece of research to happen. 

We returned in a way to the theme of evidence when Peter Burlinson from the BBSRC considered the question ‘which elements of research data management are allowable costs that may be included in grant proposals?’. He made the point that the onus is very much upon the individual researcher, who needs to be very clear what it is the funders are being asked to pay for through the grant. This was reiterated by the MRC’s Peter Dukes, who went so far as to say that the costs for each element of anticipated data management activity need to be listed separately. There is no rule of thumb to be used to measure the proportion of a grant that may acceptably be spent on research data management. It all depends on the kind of work to be undertaken.

But when it comes to the issue of long term storage of data, the funders’ view is less elastic, for they regard storage that is not provided by a national or discipline data centre as part of the general overhead of an HEI, something for which an institution needs to set aside funds as part of its overall service to its researchers. That said, Peter Dukes was somewhat less rigid in his interpretation of storage requirements, bearing in mind no doubt some of the longitudinal studies in which the MRC has been heavily involved. As far as Peter is concerned, the MRC would take the view that it’s horses for courses: if there’s a long term need beyond the end of a project, where the data has high value, needs to be shared and requires active curation, it may be unfair to place all of the burden on the institution.

This view was probably the most representative of the flavour of the day. There are clearly boundaries beyond which one could not reasonably expect a research funder to go, aspects of research data management that are outside the scope and tenure of a funded project. Equally, responsible research-led institutions have a duty to manage the data produced by their researchers, where this is necessary and desirable. Between them, there is room for compromise and reciprocity, but researchers must be both open and accurate in explaining their requirements before the funders can take a judgement. For example, when asked to define what the funders mean by data of acknowledged long term value, Juan Bicarregui said there is no set definition; this has to be argued on a case-by-case basis, which is why the funders ask researchers to outline their own decisions about which data is going to be kept and which is not.

Moving from differences between cases to differences that exist between funders, it was acknowledged that this can make the process of bid writing somewhat opaque. BBSRC, for example, will assess the data management case separately from the science case, so a bid won’t be rejected on the basis of a poor data management case, whereas the MRC boards have been advised they may reject a bid on the basis of the data management case. But even then, a large bid with a high proportion of the money allocated to the research team’s own analysis, with some less well defined data management activity included, would be highly likely to succeed; conversely, in the case of a small grant application it is probable the bid would be returned with suggestions for improvement. Unfortunately, as was stated with general agreement from the floor, these differences and inconsistencies can make life very difficult for institutions and will lead inevitably to rumours about what can and can’t be included in a bid.

In conclusion, the panel was in agreement that the further guidance being worked out at an RCUK level had been given greater impetus and urgency by the day’s discussion, which will certainly inform their deliberations. It is planned that this new guidance will be tested with individual discipline communities and amongst the research community at large. Whilst not all of the thirty prepared questions had been addressed, we were left with a feeling that the air had been cleared and a greater sense of open dialogue had been initiated, with many of the myths and misapprehensions on both sides seeming to have been swept away.

It was a good beginning.

A written response to all thirty questions will be provided by the panel in due course and mounted on the DCC website.

(Update: the RCUK's written response is available via their own website.)