
Consultancy - New Report on the Role of Data in AI

Thordis Sveinsdottir | 11 January 2021

The Role of Data in AI was a short consultancy project led by the DCC and commissioned by the Data Governance Working Group of the newly founded Global Partnership for AI (GPAI). The final report, The Role of Data in AI, was published on 7 December 2020 and is openly available on Zenodo.

The report is based on findings from a literature review and on the results of three expert workshops held with the Data Governance WG in October and November 2020. Extensive scoping and review of literature from academic, grey and government sources was carried out to illustrate the role of data in AI and to highlight the current key challenges with regard to data governance, access and availability. The review also sought to identify best practices and work currently under way internationally to overcome these challenges.

The Report in Summary

The report presents an analysis of the challenges and provides recommendations and examples of best practices to assist the Data Governance WG, as part of GPAI, in its ongoing mission to support good data governance for AI projects and systems. The report is divided into the following main sections:

Section 2: Outlines key steps in the use of data for AI development, from data collection/creation to preservation/deletion.

Section 3: Describes the main types of data that are used for AI development and how the availability of data types influences AI development. We also look at how the specific requirements of AI can play a role in the demand for certain types of data.

Section 4: Describes the important characteristics of data that influence the process or outcome of AI development. This section explores the concept of data quality and illustrates its importance for the development of relevant and unbiased AI technologies.

Section 5: Examines the impact of unequal access to datasets and use of different types of data for the creation of AI. Benefits as well as potential harms on socio-ethical, economic, environmental and legal levels are identified and discussed.

Section 6: Discusses the impact of the law on access to and availability of data in the creation, development and employment of AI, indicating the complexity, challenges and risks of global privacy and IP legal regimes.

Section 7: Carries on the discussion of characteristics with a focus on describing data quality and data challenges in three case studies of AI development: Development of Human Language Technologies for Under-Resourced Languages, Development of AI for Pandemic Response, and Use of AI in the Criminal Justice System. This section further illustrates, and provides recommendations on, how good data management can help to mitigate these challenges.

Section 8: Draws on work in the previous sections to present a set of recommendations to the Data Governance Working Group on how to further data governance for AI data to drive standards around data quality, discoverability, availability and accessibility.

The DCC contact person for this report is Thordis Sveinsdottir. Please contact me with any questions or comments on this report, or with any consultancy enquiries.