HathiTrust Audit Report 2011

Report Details

HathiTrust: Detailed Audit Findings

Findings Related to TRAC

CRL assessed HathiTrust on each of the three categories of criteria specified in TRAC and has assigned a level of certification for each. The numeric rating (below) is based on a scale of 1 through 5, with 5 being the highest level, and 1 being the minimum certifiable level:

 

TRAC Category                                      HathiTrust Score
Organizational Infrastructure                      2
Digital Object Management                          3
Technologies, Technical Infrastructure, Security   4

 

On the basis of the audit, CRL identified areas in which HathiTrust will need to improve processes or provide greater disclosure of information about those processes.  These areas correspond to specific TRAC criteria or to features of the repository that members of the Certification Advisory Panel believe are important to the CRL community. 

 

The specific areas identified for improvement are: 

1. Definition of Rights and Ownership of HathiTrust Enterprise Assets (TRAC criteria A3.3, A3.7, and A4.3)

“Repository commits to transparency and accountability in all actions supporting the operation and management of the repository, especially those that affect the preservation of digital content over time.” (TRAC criterion A3.7)

HathiTrust is not a separate legal entity, and so cannot legally own the capital equipment, content, metadata, and other assets acquired or generated by the partnership. HathiTrust must therefore clearly define and establish where ownership and control reside for the repository’s major assets and for other property essential to the continued access to and preservation of repository content.

The ownership of the individual objects within the repository is clearly specified in agreements with the depositors. However, it would be appropriate to clarify the ownership of the aggregate HathiTrust database; the rights to and ownership of new content (such as derivative files and metadata); and the rights to the non-content assets of the operation, including software and system tools to be developed in the future.

2. Succession or Disposition Plan for HathiTrust Assets (TRAC A1.2)

TRAC A1.2 requires “an appropriate, formal succession plan, contingency plans, and/or escrow arrangements in place in case the repository ceases to operate or the governing or funding institution substantially changes its scope.”

Most 501(c)(3) organizations provide in their bylaws for the disposition of property in the event of the organization’s dissolution. It would be appropriate for HathiTrust to explicitly address the transferability or disposition of its assets in the event that repository operations are discontinued. Under present circumstances the University of Michigan, as parent organization of the repository, would seem to be the owner of those assets and the arbiter of such decisions, absent agreements to the contrary.

3. Clarify and Strengthen the Quality Assurance and Print Archiving Components of the HathiTrust Program (TRAC A3.8 and B1.1, B1.7, B1.8, B2.4)

TRAC criterion A3.8 requires that the repository “commits to defining, collecting, tracking, and providing, on demand, its information integrity measurements.” Criterion B1.1 requires that the “Repository identifies properties it will preserve for digital objects,” and B1.7, B1.8, and B2.4 address the recording of information about rejected Submission Information Packages (SIPs).

One explicit goal described in the HathiTrust mission statement is to “coordinate shared storage strategies among libraries, thus reducing long-term capital and operating costs of libraries associated with the storage and care of print collections.” The repository should put in place and clarify its plan for achieving that goal, as the cost reduction described is a relevant metric of the value of HathiTrust and its services. The new HathiTrust pricing model, to be introduced in 2013, will be tied directly to the overlap between the repository corpus and the print holdings of the participating libraries. This will increase pressure on participating libraries to divest themselves of print volumes available through the repository.

The quality assurance measures for HathiTrust digital content do not yet support this goal. Inspection criteria and standards are in place for materials ingested from the Google Books project, but it is not clear what happens when an object fails such inspection. It is also unclear what level of quality review is applied to materials digitized by partner institutions or to those made available through entities such as the Internet Archive. This will be material to libraries’ decisions on whether to retain or dispose of corresponding print copies.

Currently, despite significant efforts to identify and correct systemic problems in digitization, HathiTrust attests only to the integrity of the transferred file, and not to the completeness of the original digitization effort. This may affect institutions’ workflows for print archiving and divestiture.
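
The distinction between file integrity and digitization completeness can be illustrated with a minimal sketch. It is not drawn from HathiTrust's actual systems; the page data, the page count, and the function names are hypothetical, and SHA-256 is assumed here only as a representative checksum. The sketch shows that a checksum comparison can confirm a file arrived unaltered while saying nothing about whether every page of the print volume was scanned in the first place.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def transfer_is_intact(sent: bytes, received: bytes) -> bool:
    """Fixity check: the received bytes match the bytes that were sent."""
    return sha256_of(sent) == sha256_of(received)


# Hypothetical digitized volume: the print original has 4 pages,
# but page 3 was never scanned.
pages_in_print = 4
scanned_pages = {1: b"page one", 2: b"page two", 4: b"page four"}

# The deposited package is whatever was scanned, gap included.
package = b"".join(scanned_pages[n] for n in sorted(scanned_pages))
received = package  # assume the transfer introduced no corruption

intact = transfer_is_intact(package, received)
missing = [n for n in range(1, pages_in_print + 1) if n not in scanned_pages]

print(intact)   # True: the file passed the integrity check
print(missing)  # [3]: the checksum cannot reveal the unscanned page
```

Detecting the missing page requires comparing the scan against the print original (or reliable page-level metadata), which is a separate quality review step from the checksum comparison.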