Exploring the State of Open Data survey

Laurence Horton | 08 September 2022

Digital Science’s State of Open Data (SoOD) annual survey has been running since 2015. It gives those of us active in research support the only openly available international insight into data sharing attitudes and behaviour over time.

Digital Science publish an annual report on the survey’s findings. We drew on that report as part of our desk research for the European Commission’s (DG RTD) European research data landscape study, using it as an approximate baseline against which to measure similar questions on findable, accessible, interoperable, and reusable (FAIR) data and other data management practices.

The SoOD reports aim to be concise and so do not include visualisations for all the survey data collected. I’ve therefore written R Markdown files that download and (lightly) clean the 2021 SoOD data and create basic visualisations of responses to the survey questions. There’s also an HTML file of those visualisations.

This is only a preliminary look at the survey data for changes and interesting patterns: I did not run descriptive statistics, let alone inferential statistics. But as the data are openly licensed, you can.
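As a starting point for that kind of descriptive work, here is a minimal sketch in Python (not the R Markdown code mentioned above) of tabulating the percentage share of each response per survey year. The function name `response_shares`, the response labels, and the rows are all invented for illustration; they are not taken from the SoOD data files, whose actual column names and answer options you would need to check first.

```python
from collections import Counter, defaultdict


def response_shares(rows):
    """Return {year: {response: percentage}} from (year, response) pairs.

    Blank or missing responses are dropped before percentages are computed,
    so each year's shares are based on answered questions only.
    """
    counts = defaultdict(Counter)
    for year, response in rows:
        if response is None or not str(response).strip():
            continue  # skip unanswered questions
        counts[year][str(response).strip()] += 1

    shares = {}
    for year, counter in counts.items():
        total = sum(counter.values())
        shares[year] = {
            resp: round(100 * n / total, 1) for resp, n in counter.items()
        }
    return shares


# Made-up toy rows for illustration only: NOT real survey responses,
# and the labels are placeholders rather than the survey's answer options.
toy_rows = [
    (2018, "Never heard of FAIR"),
    (2018, "Never heard of FAIR"),
    (2018, "Familiar"),
    (2018, ""),  # an unanswered question, ignored in the shares
    (2021, "Never heard of FAIR"),
    (2021, "Familiar"),
    (2021, "Familiar"),
    (2021, "Familiar"),
]

for year, by_response in sorted(response_shares(toy_rows).items()):
    print(year, by_response)
```

Dropping blank answers before calculating shares is a design choice, not a rule: if non-response is itself of interest, you might prefer to keep it as its own category.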

SoOD is framed as a survey of open data, which, of course, are sharable data, although not all sharable data are open. That framing is frustrating when you see a question worded “How familiar are you with the FAIR data principles in relation to open data?”, as it potentially reinforces the mistaken belief that FAIR applies only to open data.


Q: How familiar are you with the FAIR data principles in relation to open data?

But still, it is pleasing to see the decreasing share of those participants who have never heard of FAIR over the previous four years.

The natural follow-up is to ask if they do anything to make data FAIR:


Q: To what extent do you think you make your data open in compliance with FAIR?

It’s nice to see "Not at all" trending down, along with the "Neutral" responses, while "Somewhat" and "Very much" pick up percentage shares compared to 2018.

This post is intended to illustrate some of what you can do with open data from the survey, and the examples here are included as a taster. You might find it useful for presentations and training sessions, or as a comparison with other data sharing survey data that might be available.

My personal view is that SoOD is a useful and relevant resource. However, we must bear in mind that the survey isn’t a representative sample of the research population. It is skewed towards researchers in branches of natural science who are university based and often early in their careers. While previous survey waves attracted respondents primarily from Europe and North America, in recent years respondents have become increasingly global. Participant bias exists in that every participant, by definition, is the kind of person who wants to complete the survey, and that also brings potential for social desirability bias, with some respondents choosing to answer questions in a manner that will be viewed favourably. The data are also self-reported, with the imperfect memory and difficulty in expressing complex answers that entails. Nonetheless, SoOD provides us with a valuable body of open data to help us track the gradual culture shift towards data sharing and Open Science and to highlight challenges and barriers that persist.

If you want to read more about the European research data landscape project, you can read a news article and press release about its kick-off. The final report, data, and infographics will also be available.

Horton, L. (2022). Basic data visualisations for Figshare State of Open Data 2021 survey (Version 1) [Data set]. Zenodo.

Research, N., & Goodey, G. (2021). State of Open Data Survey 2021 additional resources (Version 1) [Data set]. figshare.

Science, Digital; Simons, N., Goodey, G., Hardeman, M., Clare, C., Gonzales, S., et al. (2021). The State of Open Data 2021 [Report]. Digital Science. figshare.