PREMIS Data Dictionary

By Sarah Higgins, Aberystwyth University 

Published: February 2007

1. Metadata Standards and Digital Curation

Metadata is the backbone of digital curation. Without it a digital resource may be irretrievable, unidentifiable or unusable. Metadata is descriptive or contextual information which refers to or is associated with another object or resource. This usually takes the form of a structured set of elements which describe the information resource and assist users in identifying, locating and retrieving it, while facilitating content and access management. Metadata standards formalise the element structure to ensure that the aims of a user community can be fulfilled. More information concerning the nature of a metadata standard and how to implement one can be found in DCC Standards Watch 1: What are Metadata Standards? and DCC Standards Watch 2: Using Metadata Standards.

2. PREMIS Data Dictionary

The Preservation Metadata: Implementation Strategies (PREMIS) international working group was set up by OCLC and RLG in 2003 to define a core set of preservation metadata elements which could be applied broadly across the preservation community, and to examine a number of practical application issues. In 2005 the group published its final report, which included version 1 of the PREMIS Data Dictionary, a metadata set for long-term digital preservation, and accompanying XML schemas, which allow PREMIS-compliant metadata to be expressed consistently in XML.

The PREMIS Data Dictionary's scope is restricted to the following digital preservation activities: maintaining viability, renderability, understandability, authenticity and identity. It assumes preservation metadata will be auto-generated as much as possible and that other suitable descriptive, technical and packaging metadata standards will be used in conjunction with PREMIS.

The PREMIS Data Dictionary is rapidly gaining community acceptance, and its maintenance is coordinated by the Library of Congress through a Managing Agency, an Editorial Committee and an Implementers' Group. It won the 2005 Digital Preservation Award from the Digital Preservation Coalition and the 2006 Preservation Publication Award from the Society of American Archivists. The PREMIS Schema has been endorsed by the Metadata Encoding and Transmission Standard (METS) editorial board for use with METS.

3. Functionality

The PREMIS Data Dictionary's data model builds on the Open Archival Information System (OAIS) Reference Model (ISO 14721), and defines relationships between five types of entity involved in digital preservation activities:

  • Intellectual Entity — a coherent set of digital content which makes up a single unit, e.g. the digitised pages of a book, or the complete set of files which make up a web page. Intellectual Entities can contain other Intellectual Entities. An Intellectual Entity can have one or more Digital Representations, i.e. the same content in different file formats, structures or functionalities, e.g. digital images in both TIFF and JPEG formats. Although defined in the data model, Intellectual Entity is regarded as out of scope for metadata specifications.
  • Objects — discrete units of digital information. Object entities are described in three sub-types:
    • Bitstream — the bit set embedded in a file
    • File — a named and ordered sequence of bytes known by an operating system
    • Representation — the file set needed to render a complete Intellectual Entity
  • Events — metadata which provides an audit trail concerning actions by an agent on an object which is included in the preservation repository. Events can include modification of a digital object by creating a new version, creating new relationships or changes in custodianship. Events prior to ingest or after deaccessioning (the process by which an archive, museum, or library permanently removes accessioned materials from its holdings) can be recorded.
  • Agents — persons, organisations or software associated with the preservation events during a digital object's lifecycle.
  • Rights — rights and permissions statements pertaining to both digital objects and their agents.
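The relationships between the four in-scope entities can be sketched in code. The following is a minimal illustration only, not the PREMIS schema: the class names, field names and identifier values are all assumptions made for this example, and the Data Dictionary itself defines the actual semantic units.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DigitalObject:
    identifier: str
    category: str  # "bitstream", "file" or "representation"


@dataclass
class Agent:
    identifier: str
    agent_type: str  # "person", "organisation" or "software"


@dataclass
class Event:
    identifier: str
    event_type: str   # hypothetical values, e.g. "ingest", "migration"
    date_time: str    # ISO 8601 string, all in UTC so that text order is time order
    object_ids: List[str] = field(default_factory=list)
    agent_ids: List[str] = field(default_factory=list)


@dataclass
class RightsStatement:
    identifier: str
    object_id: str
    agent_id: str
    permitted_act: str


def audit_trail(events: List[Event], object_id: str) -> List[Event]:
    """Return the events that touch one object, oldest first."""
    related = [e for e in events if object_id in e.object_ids]
    return sorted(related, key=lambda e: e.date_time)


tiff = DigitalObject("obj-1", "file")
repo_software = Agent("agent-1", "software")
events = [
    Event("ev-2", "migration", "2007-02-01T10:00:00Z", ["obj-1"], ["agent-1"]),
    Event("ev-1", "ingest", "2007-01-15T09:00:00Z", ["obj-1"], ["agent-1"]),
    Event("ev-3", "ingest", "2007-01-20T09:00:00Z", ["obj-2"], ["agent-1"]),
]
trail = audit_trail(events, tiff.identifier)
```

Linking entities by identifier rather than by nesting mirrors the idea that Events, Agents and Rights can each be recorded separately and still be tied back to the Objects they concern.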

The PREMIS Data Dictionary defines semantic units and semantic components to describe properties of the latter four entities. Eight of the semantic units defined are mandatory, along with a number of their semantic components. These are regarded as the minimum information required for the digital preservation of a digital object. A number of the other semantic components defined become mandatory if the semantic unit in which they are contained is used in the application.
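The two kinds of obligation described above (units that are always mandatory, and components that become mandatory only once their containing unit is used) can be checked mechanically. The sketch below assumes a simple dictionary representation of a record. The rule table is illustrative: its names are modelled on Data Dictionary semantic units, but the obligations shown should be treated as assumptions and verified against the Data Dictionary before use.

```python
from typing import Dict, List

# Assumed for illustration: units that every record must contain.
MANDATORY_UNITS = ["objectIdentifier"]

# Assumed for illustration: components required whenever the unit is used.
REQUIRED_COMPONENTS = {
    "objectIdentifier": ["objectIdentifierType", "objectIdentifierValue"],
    "fixity": ["messageDigestAlgorithm", "messageDigest"],
}


def missing_items(record: Dict[str, Dict[str, str]]) -> List[str]:
    """List mandatory units and conditionally mandatory components that are absent."""
    problems = []
    for unit in MANDATORY_UNITS:
        if unit not in record:
            problems.append(unit)
    for unit, components in record.items():
        for name in REQUIRED_COMPONENTS.get(unit, []):
            if not components.get(name):
                problems.append(f"{unit}/{name}")
    return problems


complete = {
    "objectIdentifier": {
        "objectIdentifierType": "local",
        "objectIdentifierValue": "obj-1",
    }
}
partial_fixity = dict(complete, fixity={"messageDigestAlgorithm": "SHA-1"})
```

Here `complete` passes even though it has no `fixity` unit, while `partial_fixity` fails because the unit is present but one of its components is not.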

Implementers are expected to use other applicable metadata standards in conjunction with the PREMIS Data Dictionary to describe: Intellectual Entities, the characteristics of Agents, technical metadata for file formats, rights relating to access and/or distribution, details of media and hardware, the business rules of a repository and information concerning the creation of the PREMIS record. Very few values for semantic units are defined by the Data Dictionary, but the use of controlled vocabularies is recommended and the use of ISO 8601:2004 for formatting dates is mandated.
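As an illustration of the date requirement, the sketch below produces an ISO 8601 date-time string using Python's standard library. The helper name `iso8601_utc` and the choice to normalise to UTC are assumptions made for this example, and the Data Dictionary should be consulted for the exact profile of ISO 8601 it expects.

```python
from datetime import datetime, timezone


def iso8601_utc(moment: datetime) -> str:
    """Format a timezone-aware datetime as an ISO 8601 string in UTC."""
    if moment.tzinfo is None:
        # Naive datetimes are ambiguous, so refuse them rather than guess.
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).isoformat(timespec="seconds")


stamp = iso8601_utc(datetime(2007, 2, 1, 9, 30, tzinfo=timezone.utc))
# stamp is "2007-02-01T09:30:00+00:00"
```

Normalising every timestamp to a single offset keeps recorded event times directly comparable across systems.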

The PREMIS XML schema is made up of individual schemas for the four entities which are in scope: Objects, Events, Agents and Rights. This allows them to be used independently of one another. A container schema is available if an implementation requires the PREMIS metadata to be kept together. At least one object must be described if the container schema is used.
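The container rule can be illustrated with a short sketch that assembles a PREMIS-style XML document and refuses to build one without an object. The element names are modelled on Data Dictionary semantic units, but namespaces are omitted and the output has not been validated against the PREMIS schemas, so treat it as an illustration of the structure rather than schema-valid output. The function name and input dictionaries are assumptions.

```python
import xml.etree.ElementTree as ET


def build_container(objects, events=()):
    """Wrap object and event descriptions in a single container element."""
    if not objects:
        # Mirrors the rule that a container must describe at least one object.
        raise ValueError("the container needs at least one object")
    root = ET.Element("premis")
    for obj in objects:
        node = ET.SubElement(root, "object")
        ident = ET.SubElement(node, "objectIdentifier")
        ET.SubElement(ident, "objectIdentifierType").text = obj["type"]
        ET.SubElement(ident, "objectIdentifierValue").text = obj["value"]
    for ev in events:
        node = ET.SubElement(root, "event")
        ET.SubElement(node, "eventType").text = ev["type"]
        ET.SubElement(node, "eventDateTime").text = ev["date_time"]
    return root


container = build_container(
    [{"type": "local", "value": "obj-1"}],
    [{"type": "ingest", "date_time": "2007-01-15T09:00:00+00:00"}],
)
xml_text = ET.tostring(container, encoding="unicode")
```

An implementation that keeps the entities in separate documents would serialise each `object` or `event` element on its own instead of attaching it to a shared root.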

The PREMIS Implementers' Group (PIG) maintains a wiki for sharing documents, known as the Pig Pen, together with an implementation registry and a discussion list. Implementers are encouraged to share their experiences and feed these back into the ongoing revision process.

4. Selected Implementations

  • DDA (Digital Data Archive Project), National Archives of Scotland — Ingest of digital objects from the Scottish Executive and the Scottish Courts — scheduled to go live in Autumn 2007
  • Florida Digital Archive, Florida Center for Library Automation — a preservation repository for public universities and libraries in Florida which implements most of the PREMIS data elements
  • Cairo (Complex Archive Ingest for Repository Objects), consortium led by Oxford University Library Services — implementation of the PREMIS schema in conjunction with METS

5. Additional Resources

6. Related DCC Resources
