General Factors to Consider in Evaluating Digital Repositories

The increase in the production of and reliance upon electronic resources has intensified the need for “long-lived” digital data.1 In response to this demand, a variety of repositories for digital content have emerged in recent years. Since these ventures tend to look to the library sector for support, libraries need criteria for assessing their reliability. The criteria below were derived from the draft Audit Checklist for the Certification of Trustworthy Digital Repositories, developed by the Research Libraries Group and the National Archives and Records Administration, and from CRL’s own work on distributed print archives of journals and government documents.

  • Organization—The mission and solidity of the organization that supports the repository will affect the repository’s prospects for continuity. Repositories vary, from those created for the express purpose of preserving content for academia to those embedded within scientific, publishing, and aggregator organizations. It is important to know the extent to which preservation is integral to the parent organization’s mission, and how important the repository functions are to that organization’s revenue stream.
  • Governance and Accountability—The governance of the organization that supports the repository determines the communities whose interests will drive the activities of that repository. How accountable is the organization to the user community, and in what ways is that accountability assured? Conversely, how accountable is the organization to the producers or publishing community?
  • Content—What content is maintained by the repository and what are its critical characteristics? The extent and scope of the journal titles, databases, and other materials archived should be listed or easily discoverable, and should be verifiable. What mechanisms are in place to ensure the continued deposit of the listed content and prevent its withdrawal by the publisher?
  • Ingestion—Trustworthy repositories will disclose specific data on the form and functionality of the content ingested. Most archives reformat or “normalize” content in order to limit the cost of managing and migrating complex formats. Normalization may make the archived content look or behave differently than it does when delivered directly to users by producers or publishers. Clarity about the nature and degree of normalization can provide a sense of the scale of investment, if any, that the library or the repository will have to make to provide an acceptable level of functionality in the future.
  • Technical Systems and Data Security—The most obvious indicator of the reliability of a repository is the stability and robustness of its technical infrastructure. Factors here include whether the repository system conforms to the Open Archival Information System (OAIS) Reference Model and to the system security requirements and standards developed in government and other domains, and whether the policies and methods for backup, redundancy, authentication, and distribution of functions and services are clear and conform to accepted best practices. Also important is the scalability of the system. Is the repository likely to be able to accommodate new and complex forms of content and functionality?
  • Cost Structure and Distribution—The costs of a repository can be structured and distributed in several ways, with differing implications for future costs to the library. The repository may assess the library or users a combination of initial capital fees and ongoing maintenance fees, or simply a subscription fee. Some costs might also be borne by the publisher of the archived content. While there are limits to how precisely a repository can project future fees, libraries should be clear about the cost drivers (such as the amount and complexity of content, the frequency of migration, and royalties to content publishers) and about how the costs will be redistributed if those drivers change.
  • Rights—Repositories should disclose documentation of the rights they hold to deliver the content in the event of failure by the producer or publisher, the duration of the grant of those rights, and whether those rights are transferable. Such documentation should be clear about what constitutes failure. Failure is often defined as the point at which a publisher no longer offers the content, but drastic subscription price increases, a decision to make the content available only as part of a larger, prohibitively priced bundle, and similar events can also put content out of reach of libraries.
  • Results and Outputs—Longevity and performance are important indicators of the reliability of a repository. Although digital preservation is still an emerging field, organizations and systems with proven histories of effectively fulfilling preservation functions are likely to continue to support persistence.

We will refine and expand these criteria as the Auditing and Certification of Digital Archives project progresses. For more information, visit the CRL project Web site.