The Center for Research Libraries (CRL) conducted a preservation audit of HathiTrust (www.HathiTrust.org [2]) between November 2009 and December 2010, and on the basis of that audit certified HathiTrust as a trustworthy digital repository. The CRL Certification Advisory Panel concluded that the practices and services described in HathiTrust public communications and published documentation were, at the time of the audit, generally sound and appropriate to the content being archived and to the general needs of the CRL community. Moreover, the Panel expected that HathiTrust would continue to be able to deliver content that is understandable and usable by its community.
CRL certification applies to the repository’s ability to preserve and manage digital files of books digitized by the University of Michigan, Google, and the Internet Archive, as well as the digital files generated for books digitized by other providers that conform to comparable standards.
This certification is based upon a review of extensive documentation gathered by CRL and members of its Certification Advisory Panel; data and documentation provided by HathiTrust between November 2009 and December 2010; and a site visit held in May 2010. CRL’s analysis was guided by the criteria included in the Trustworthy Repositories Audit and Certification Checklist (TRAC), and other metrics developed by CRL through its various digital repository assessment activities.
CRL conducted its audit with reference to generally accepted best practices in the management of digital systems, and with reference to the interests of its community of research libraries and the practices and needs of scholarly researchers in the humanities, sciences, and social sciences in the United States and Canada. The purpose of the audit was to obtain reasonable assurance that HathiTrust provides, and is likely to continue to provide, services adequate to those needs, without material flaws or defects and as described in HathiTrust’s public disclosures. The CRL audit provides a reasonable basis for these findings.
CRL assigned HathiTrust the following levels of certification (the numeric rating is based on a scale of 1 through 5, with 5 being the highest level, and 1 being the minimum certifiable level): [1]
| Category | HathiTrust Score |
|---|---|
| Organizational Infrastructure | 2 |
| Digital Object Management | 3 |
| Technologies, Technical Infrastructure, Security | 4 |
The ratings reflect the existence of robust systems and sound processes in most areas, in particular in category C (Technologies, Technical Infrastructure, Security), and still-emerging systems and processes in category A (Organizational Infrastructure) and, to a lesser extent, in category B (Digital Object Management). In the course of the audit, the Certification Advisory Panel identified several issues that CRL urges HathiTrust to address in order to satisfy more fully the concerns of CRL libraries. Those issues are described in Section B of the report, Detailed Audit Findings, and pertain to specific criteria in the TRAC checklist. HathiTrust agreed to address these issues and, as a condition of continued certification, to make certain disclosures to CRL periodically. Those requirements for periodic disclosure are outlined in Section C of the report.
The general metrics used by CRL in assessments are based on the Trustworthy Repositories Audit and Certification (TRAC) checklist and on other metrics developed by CRL through its analyses of digital repositories. TRAC was developed between 2003 and 2005 by a joint task force created by the Research Libraries Group (RLG) and the National Archives and Records Administration (NARA), to provide criteria for identifying digital repositories capable of reliably storing, migrating, and providing access to digital collections. TRAC represents current best practice and thinking about the organizational and technical infrastructure a repository requires to be considered trustworthy and worthy of certification.
It should be noted that CRL certification of HathiTrust applies specifically to the repository’s ability to preserve and manage digital files of books digitized by the University of Michigan, Google, and the Internet Archive, as well as the digital files generated from books digitized by other providers that conform to comparable standards. CRL did not assess HathiTrust procedures and processes for acquiring and managing more complex digital objects such as audio, video, archived websites, or other types of content.
CRL assessed HathiTrust on each of the three categories of criteria specified in TRAC and has assigned a level of certification for each. The numeric rating (below) is based on a scale of 1 through 5, with 5 being the highest level, and 1 being the minimum certifiable level:
| TRAC Category | HathiTrust Score |
|---|---|
| Organizational Infrastructure | 2 |
| Digital Object Management | 3 |
| Technologies, Technical Infrastructure, Security | 4 |
On the basis of the audit, CRL identified areas in which HathiTrust will need to improve processes or provide greater disclosure of information about those processes. These areas correspond to specific TRAC criteria or to features of the repository that members of the Certification Advisory Panel believe are important to the CRL community.
The specific areas identified for improvement are:
1. Definition of Rights and Ownership of HathiTrust Enterprise Assets (TRAC criteria A3.3, A3.7, and A4.3)
“Repository commits to transparency and accountability in all actions supporting the operation and management of the repository, especially those that affect the preservation of digital content over time.” (TRAC criterion A3.7)
HathiTrust is not a separate legal entity, and so cannot legally own the capital equipment, content, metadata, and other assets acquired or generated by the partnership. HathiTrust therefore must clearly define and establish where ownership and control reside for the repository’s major assets and for other property essential to the continued access to and preservation of repository content.
The ownership of the individual objects within the repository is clearly specified in agreements with the depositors. However, it would be appropriate to clarify the ownership of the aggregate HathiTrust database; the rights and ownership of new content (such as derivative files, metadata, etc.); and rights to the non-content assets of the operation, including software and system tools to be developed in the future.
2. Succession or Disposition Plan for HathiTrust Assets (TRAC A1.2)
TRAC A1.2 requires “an appropriate, formal succession plan, contingency plans, and/or escrow arrangements in place in case the repository ceases to operate or the governing or funding institution substantially changes its scope.”
Most 501(c)(3) organizations provide in their bylaws for the disposition of property in the event of the organization’s dissolution. It would be appropriate for HathiTrust to explicitly address the transferability or disposition of its assets in the event that repository operations are discontinued. Under present circumstances, the University of Michigan, as parent organization of the repository, would seem to be the owner of those assets and the arbiter of such decisions, absent agreements to the contrary.
3. Clarify and Strengthen the Quality Assurance and Print Archiving Components of the HathiTrust Program (TRAC A3.8, B1.1, B1.7, B1.8, and B2.4)
TRAC criterion A3.8 requires that the repository “commits to defining, collecting, tracking, and providing, on demand, its information integrity measurements.” Criterion B1.1 requires that the “Repository identifies properties it will preserve for digital objects,” and criteria B1.7, B1.8, and B2.4 address the recording of information about rejected Submission Information Packages (SIPs).
One explicit goal described in the HathiTrust mission statement is to “coordinate shared storage strategies among libraries, thus reducing long-term capital and operating costs of libraries associated with the storage and care of print collections.” The repository should put in place, and clarify, its plan for achieving that goal, since the cost reduction described is a relevant metric of the value of HathiTrust and its services. The new HathiTrust pricing model, to be introduced in 2013, will tie costs directly to the overlap between the repository corpus and the print holdings of the participating libraries. This will increase pressure on participating libraries to divest themselves of print volumes available through the repository.
The quality assurance measures for HathiTrust digital content do not yet support this goal. Inspection criteria and standards are in place for materials ingested from the Google Books project, but it is not clear what happens when an object fails such inspection. It is also unclear to what level of quality review materials digitized by partner institutions, or made available through entities such as the Internet Archive, are subjected. This will be material to libraries’ decisions on whether to retain or dispose of corresponding print copies.
Currently, and despite significant efforts to identify and correct systemic problems in digitization, HathiTrust attests only to the integrity of the transferred file, and not to the completeness of the original digitization effort. This may affect institutions’ workflows for print archiving and divestiture.
The TRAC document notes that “. . . attaining trusted status is not a one-time accomplishment—achieved and forgotten. To retain trusted status, a repository will need to undertake a regular cycle of audit and/or certification.” To that end, CRL expects that, in addition to acting to remedy the issues identified above, HathiTrust will also make certain disclosures on a regular basis. CRL and HathiTrust have agreed that ongoing certification is contingent upon HathiTrust making the following disclosures every two years:
Certification is also contingent upon HathiTrust’s agreement to a periodic, systematic sampling and inspection of the repository’s archived content by CRL, or by a third party designated by CRL, using either a manual or automated process as determined by mutual agreement between CRL and HathiTrust.
Links
[1] https://www.crl.edu/sites/default/files/reports/CRL%20HathiTrust%202011.pdf
[2] http://www.HathiTrust.org
[3] https://www.crl.edu/facets/archiving-and-preservation
[4] https://www.crl.edu/reports
[5] https://www.crl.edu/sites/default/files/d6/attachments/pages/trac_0.pdf
[6] http://public.ccsds.org/publications/archive/650x0m2.pdf