Because good research needs good data

The State of Open Data 2022: FAIR awareness, practice, and perceptions by discipline

Thanks to Digital Science for including a set of questions in their State of Open Data 2022 survey on FAIR data practice and perceptions. I’ve been looking at them.

Laurence Horton | 25 November 2022

It’s interesting to see just how respondents vary by research discipline in what they claim to practice, and what they think others are practicing when it comes to the FAIR principles and research data.

First though, a couple of points to keep in mind. The usual disclaimers around surveys apply: respondents are self-selecting, there may be social desirability bias at play, and how respondents comprehend different terms and questions may vary, without us being able to pick up on different understandings. Also, SoOD doesn't always repeat questions or responses in the same way from year to year, so in those cases I have adjusted wordings where there were minor differences to enable comparison. Finally, the emphasis on Open Data over data sharing in the questions remains infuriating. Let’s say it again: open data are sharable; not all sharable data are open.

Then there are the respondents themselves. The responses aren’t generalisable, in that they are mostly from early-career (according to date of first publication), university-based, and science-focused individuals. Thirty-eight percent are in Asia, a third in Europe, with 13 percent in North America, and eight percent in Africa. Digital Science say they have made around 6,000 usable responses available in the dataset, so it’s likely that the data have been cleaned and checked for phenomena like straight-lining or duplicate responses.

State of Open Data starts by asking:

“How familiar are you with the following data principles in relation to Open Data? FAIR data principles (i.e. Findable, Accessible, Interoperable and Reusable)”.

00001a.png

Over the years this question has been asked, the downward trend in respondents who have “never heard of the FAIR data principles” is pleasing. The share has halved over five years. Nor would it be reckless to bet that in 2023 the share “familiar with” will overtake “previously heard … but not familiar” as the largest share of responses. Great news. There’s always a “but” though.

Reasonable scepticism can be applied to what exactly this question is measuring, and whether it is constant across disciplines. Disciplines with established and active FAIR initiatives and communities could have higher thresholds for “familiar” and “heard of” FAIR than ones that haven’t had similar levels of activity.

State of Open Data 2022 does ask for a respondent’s primary area of research interest, so we can try to peek into disciplinary differences for the 2022 survey.

000016.png

The picture is, mostly, stable across groups. “I am familiar…” has the largest standard deviation across disciplines, at 6.61 percent, driven mostly by the large share of astronomers claiming familiarity, which is great. Well done, astronomers. But we are getting into small numbers of respondents for this research interest, which is likely to explain its large variation. Earth and Environmental Science (8.9 percent above the overall share) and Social Sciences (5.6 percent) have the next highest above-average shares.

“I have never heard of the FAIR data principles before now” has an overall share of 27 percent, and the Astronomers and Earth scientists again vary the most from the overall share. Medicine and Arts and Humanities are slightly less informed about FAIR, with Others, Physics, and Chemistry displaying the lowest levels of awareness of FAIR principles, but all differ from the overall share by less than one standard deviation.

The most stable response category is “I have previously heard…”, with a standard deviation of 2.88 percent. A share of 33.7 percent means the Social Scientists come out best in this, given their above-average share claiming familiarity; Arts and Humanities do less well, given their higher share of having never heard of the data principles. Materials Science has the highest share of those who have heard of, but are not familiar with, FAIR.
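The comparison above (share per discipline, spread across disciplines, and which disciplines sit more than one standard deviation from the overall share) can be sketched roughly as follows. This is a minimal sketch, not the method actually used for the charts: the counts are made up purely for illustration, the discipline names are placeholders, and the use of the population standard deviation (`pstdev`) is an assumption, since the post does not say which form was calculated.

```python
from statistics import pstdev

# Hypothetical counts, NOT taken from the State of Open Data dataset.
# One small group (Discipline D) is included on purpose.
COUNTS = {
    "Discipline A": {"familiar": 50, "heard": 30, "never": 20},
    "Discipline B": {"familiar": 30, "heard": 40, "never": 30},
    "Discipline C": {"familiar": 40, "heard": 30, "never": 30},
    "Discipline D": {"familiar": 8, "heard": 1, "never": 1},
}


def category_shares(counts, category):
    """Percentage share of one response category within each discipline."""
    return {d: 100 * c[category] / sum(c.values()) for d, c in counts.items()}


def overall_share(counts, category):
    """Percentage share of one response category across all respondents."""
    total = sum(sum(c.values()) for c in counts.values())
    return 100 * sum(c[category] for c in counts.values()) / total


def outliers(counts, category):
    """Disciplines whose share is more than one SD from the overall share."""
    shares = category_shares(counts, category)
    overall = overall_share(counts, category)
    spread = pstdev(list(shares.values()))  # assumption: population SD
    return {
        d: round(s - overall, 1)
        for d, s in shares.items()
        if abs(s - overall) > spread
    }


if __name__ == "__main__":
    print(category_shares(COUNTS, "familiar"))
    print(outliers(COUNTS, "familiar"))
```

With these illustrative counts, the only discipline flagged is the one with ten respondents, which is the same small-numbers effect that makes the astronomers stand out.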

Familiarity is but one aspect, however. What about doing FAIR research?

State of Open Data asks:

“To what extent do you think you make your data open in compliance with: [Note: by open we mean free to access, reuse, repurpose and redistribute]: FAIR data principles (i.e. Findable, Accessible, Interoperable and Reusable)”.

Again, by framing this as “Open”, Digital Science are doing no favours by implying FAIR is an open data concept, and not a data sharing one. This is also bound to present a misleading picture of FAIR practice, as researchers with data that they are unable to share openly could not answer “Entirely”. Indeed, we cannot unpick why people may answer “Somewhat” (for example, because they share data that are not free to access), or how they understand FAIR and these categories, and relate them to their own practice. It could also be a factor in why nearly half of respondents (44.9 percent) claim to “Somewhat” practice the FAIR principles. It’s also irritating that the response categories have changed for 2022, reducing from five to four and changing the wording significantly. So, I will stick with 2022 only.

000016 (1).png

The first thing to note is that the number of respondents is smaller, 4,895 compared with 6,104 for the familiarity question, and with an additional response category, the numbers do get small. Indeed, there’s not one Astronomer in the survey who fails to practice FAIR with their data. Overall, for “Entirely”, only three disciplines are more than one standard deviation from the average, and again they have small numbers. “Somewhat” has less variation, and, again, the outliers are small-number categories. “Not at all” is the most stable category, but also the smallest, with only 276 total respondents across all disciplines, merely 5.6 percent. Just under a quarter of respondents are “Not aware” of FAIR principles, and by discipline this has the greatest variation. Aside from Other, Chemistry and Arts and Humanities show the greatest lack of awareness (“I am not aware of these principles” responses at 31.8 and 33.3 percent respectively). Business/Investment (18.2 percent), Earth and Environmental Science (17.3 percent) and Astronomy and Planetary Science (15 percent) have notable levels below the overall share, although the last of these has only six respondents.
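The arithmetic behind two of the figures above can be checked with a few lines. This is only a sketch using the numbers quoted in the text. The implied base for Astronomy and Planetary Science is an estimate: it assumes the 15 percent is a share of that discipline's respondents to this question, and because 15 percent is a rounded figure, the true base could be a respondent or two either side.

```python
def share(count, total):
    """Percentage share of `count` within `total`."""
    return 100 * count / total


# "Not at all": 276 of the 4,895 respondents to the practice question.
not_at_all = share(276, 4895)
print(round(not_at_all, 1))

# Six respondents making up roughly 15 percent implies a base of about 40
# (assumption: the 15 percent is within-discipline and is rounded).
implied_base = round(6 / 0.15)
print(implied_base)

# With a base that small, a single respondent changing their answer
# moves the discipline's share by this many percentage points.
one_respondent = share(1, implied_base)
print(one_respondent)
```

A single answer shifting a share by a couple of points is why the outliers in the small-number disciplines should be read cautiously.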

000012.png

That’s the extent to which respondents claim familiarity with the FAIR principles, and the extent to which they claim to practice FAIR (albeit in an “open” framing). The final FAIR question State of Open Data 2022 asks looks beyond individual awareness and behaviour and asks about perception of colleagues’ behaviour:

“To what extent do you think researchers in your field more generally make their data open in compliance with: [Note: by open we mean free to access, reuse, repurpose and redistribute]: FAIR data principles (i.e. Findable, Accessible, Interoperable and Reusable)”

Is there a difference between self-assessed practice and perceptions? Yes and no. Roughly equal shares of respondents were not aware of the FAIR principles in both questions, and roughly equal shares answered “Somewhat” for their own open FAIR practice and for that of researchers in their field. But there are significant differences between practice and perception of researchers in their field when it comes to “Entirely” and “Not at all” open FAIR practices. Overall, a much larger share of respondents viewed themselves as practicing “Entirely” FAIR research than perceived others in their area of research as doing so, with the inverse true for “Not at all”: they viewed others as not practicing FAIR more than they claimed themselves not to comply with FAIR principles.

000014.png

If we take the percentage shares across research interest groups for these two questions and illustrate the difference between shares for each discipline across the two questions, we can see where the biggest disconnect between practice and perceptions occurs.

000022 (1).png

A positive difference indicates that the share identifying themselves with a category is larger than the share identifying what researchers in their field do regarding FAIR. So, again, almost all disciplines have a larger share identifying with “Entirely” FAIR behaviour than believing researchers in their field behave in an “Entirely” FAIR way. In nearly every other category, the share claiming a category for their own behaviour is smaller than the corresponding share for perceiving what researchers in their discipline are doing.
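The difference described above is simply the self-reported share minus the perceived share for each response category. A minimal sketch for one discipline follows; the percentages are invented for illustration and are not values from the survey, and the short category labels are shorthand for the full response wordings.

```python
CATEGORIES = ("Entirely", "Somewhat", "Not at all", "Not aware")


def share_gap(self_shares, field_shares):
    """Self-reported share minus perceived share for researchers in the field.

    A positive value means more respondents place themselves in a category
    than place researchers in their field there.
    """
    return {c: round(self_shares[c] - field_shares[c], 1) for c in CATEGORIES}


# Hypothetical shares (percent) for a single discipline, for illustration only.
self_shares = {"Entirely": 30.0, "Somewhat": 45.0, "Not at all": 5.0, "Not aware": 20.0}
field_shares = {"Entirely": 15.0, "Somewhat": 50.0, "Not at all": 14.0, "Not aware": 21.0}

gap = share_gap(self_shares, field_shares)
for category, difference in gap.items():
    print(f"{category}: {difference:+.1f}")
```

Because the shares for each question sum to 100 percent within a discipline, the differences sum to zero, so a positive gap for “Entirely” has to be offset by negative gaps in the other categories.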

So what are the takeaways from this? Keeping the limitations of the survey in mind, what we see in these data is that FAIR is getting through as a message, and it’s getting through to all disciplines. Within those disciplines, elements of the FAIR principles are being practiced, but there is an interesting disconnect between individual practice and perceptions of what contemporaries are doing with FAIR data.

Data:

Research, N., & Goodey, G. (2022). State of Open Data Survey 2022 additional resources (Version 1). figshare. https://doi.org/10.6084/m9.figshare.21295422.v1