Portico Audit Report 2010
Portico: Detailed Audit Findings
In the course of its audit, CRL identified areas in which Portico should improve. These include specific line items within TRAC and also areas that have been identified as important to the CRL membership. The latter areas of interest include scope of the archive, functionality and services, and future costs and risks.
Findings Related to TRAC
There are three primary areas to be assessed within TRAC. CRL has assessed Portico in each of these areas and assigned a level of certification. The numeric rating is based on a scale of 1 through 5, with 5 being the highest level, and 1 being the minimum certifiable level:
Organizational Infrastructure
Digital Object Management
Technologies, Technical Infrastructure, Security
Within TRAC, there are 84 individual criteria. CRL has concerns about Portico’s status on 12 of the 84 criteria. Below we describe each of those criteria and CRL’s concerns.
Criteria - A1.2 Repository has an appropriate, formal succession plan, contingency plans, and/or escrow arrangements in place in case the repository ceases to operate or the governing or funding institution substantially changes its scope.
At present there is no designated Portico successor organization. Portico should identify such an organization. Then Portico should put in place and disclose a plan for the disposition of Portico archival content, technology, and other assets in the event of discontinuation of the program by the parent organization. This is particularly important because the ongoing business viability of Portico as a service is not yet assured, judging from financial information disclosed to date.
Criteria - A2.2 Repository has the appropriate number of staff to support all functions and services.
At this time, Portico’s documentation is not sufficient to allow CRL to verify compliance with this metric. Portico should implement a process to keep job descriptions up to date and better document how the roles and responsibilities of people and positions change over time.
In addition, the archive's procedures are designed almost entirely around ingest, and end there. They should be modified to include specific responsibilities for ongoing testing and maintenance of archived content. The archive should, for example, add responsibilities for testing and responding to problems with already ingested content to the Portico Roles and Responsibilities document, the Portico Automated Workflow, the E-Journal Workflow 1.9 documentation, and related policies and procedures.
Criteria - A3.2 Repository has procedures and policies in place, and mechanisms for their review, update, and development as the repository grows and as technology and community practice evolve.
Portico's policy infrastructure has improved considerably since the test audit in 2006, but some of these policies still suffer from internal contradictions and inconsistencies, specifically in the area of roles and responsibilities and job descriptions.
As the rate of growth and the size of the archive increase to as much as a terabyte per day, the archive may have to revise its current policies to accommodate the increased time needed to refresh and migrate content.
Criteria - A3.6 Repository has a documented history of the changes to its operations, procedures, software, and hardware that, where appropriate, is linked to relevant preservation strategies and describes potential effects on preserving digital content.
Portico has much technical information about the systems and environment encoded in the metadata of the individual content items preserved in the Archive. However, there is no separate, complete documentation tracking the changes to the hardware and software of the ingest and archive systems over time. This documentation is an important tool in assessing any repository and should be created.
Criteria - A4.1 Repository has short- and long-term business-planning processes in place to sustain the repository over time.
While it is apparent that Portico has business-planning processes in place, CRL could not assess those processes because documentation of them was not made available.
Criteria - B1.6 Repository provides producer/depositor with appropriate responses at predefined points during the ingest processes.
Portico should put in place procedures and mechanisms to routinely notify and provide other appropriate responses to licensors/publishers as materials are ingested into the archive and develop workflow documentation for this process.
Criteria - B2.10 Repository has a documented process for testing understandability of the information content and bringing the information content up to the agreed level of understandability.
Portico needs to continue to identify what its community believes is necessary for “understandability” or usability of the preserved content. Portico should develop a process to support ongoing research into the needs of its community and determine what the Portico stakeholders think is an understandable e-journal, e-journal article, e-book, etc. As those needs evolve, Portico should develop test scenarios to evaluate how well the archive meets those needs.
This will be particularly important as Portico archives genres of content other than e-journals. Portico is actively exploring the requirements and potential funding and business models to support archiving of genres such as e-books, digitized newspapers, and other types of databases. (At present Portico does not plan to target the preservation of entire genres of video or other audiovisual formats, though files in these formats are preserved within the archive.) Because these genres may be less compatible with existing Portico workflows and technologies, and may affect workflow and data management techniques, meeting “understandability” requirements for them could affect Portico’s costs and pricing. This is an area to which the CRL audit team will pay particular attention over the coming years.
Criteria - B2.12 Repository provides an independent mechanism for audit of the integrity of the repository collection/content.
The Portico audit interface contains a subset of the content of the entire Archive, and at this time it is not possible to independently “look” into the Portico archive and determine whether a requested digital object is complete. Portico is working on a new audit interface and system that may address these concerns.
Criteria - C1.10 Repository has a process to react to the availability of new software security updates based on a risk-benefit assessment.
Portico needs to establish and provide to CRL a risk register that identifies what software and hardware patches for the Portico systems are available and what the risk assessment and plan are for each.
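As an illustration only, the kind of risk register described above could be kept as a simple structured record per patch. The sketch below is not Portico's system; the field names, the risk scale, and the sample entries (component names and patch identifiers) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PatchEntry:
    """One row of a hypothetical patch risk register."""
    component: str              # affected software or hardware component
    patch_id: str               # vendor or internal identifier (illustrative)
    risk_if_applied: str        # assumed scale: "low", "medium", "high"
    risk_if_deferred: str       # same assumed scale
    plan: Optional[str] = None  # agreed action; None means not yet decided


def entries_without_plan(register: List[PatchEntry]) -> List[PatchEntry]:
    """Return the entries that still lack a documented plan."""
    return [entry for entry in register if not entry.plan]


# Invented sample entries, for illustration only.
register = [
    PatchEntry("ingest-server-os", "PATCH-001", "low", "high",
               "apply in next maintenance window"),
    PatchEntry("archive-database", "PATCH-002", "medium", "medium"),
]

for entry in entries_without_plan(register):
    print(f"No plan recorded for {entry.patch_id} ({entry.component})")
```

A register in this form makes it straightforward to report which available patches have been assessed but not yet assigned a plan.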
Criteria - C2.2 Repository has software technologies appropriate to the services it provides to its designated community and has procedures in place to receive and monitor notifications, and evaluate when software technology changes are needed.
Portico’s ability to disseminate content to the users in the event of a major “trigger event” (for example, where all content from a large publisher with a large user base must be made available) is limited. This relates to Portico’s status as a dark archive. Aside from the “audit” interface provided to enable subscribers to verify the presence of content in the archive, there is a delivery interface that is rudimentary at present. Portico states that it would rely upon the existing JSTOR infrastructure for support of the Portico Web site, which would deliver Portico content after such a major “trigger event.” However, it is not clear how quickly this delivery infrastructure could scale to meet user needs in the event of a major trigger event.
Criteria - C3.1 Repository maintains a systematic analysis of such factors as data, systems, personnel, physical plant, and security needs.
The security of Portico systems was not tested as part of this audit, although Portico has already conducted a penetration test. While we have no reason to believe that the Portico systems are at risk, the Certification Advisory Panel believes that Portico should undergo a security audit.
Criteria - C3.3 Repository staff have delineated roles, responsibilities, and authorizations related to implementing changes within the system.
Portico needs to maintain more accurate and up-to-date documentation of the roles and responsibilities of key repository personnel, particularly the roles and responsibilities of those involved in technology watch activities.
Findings Related to Additional CRL Concerns
If Portico is to continue to be recognized as one of the CRL community’s permanent archives of scholarly content, then Portico should address the following concerns of the CRL Audit Advisory Panel.
Scope of the archive
If Portico is to provide CRL libraries a comprehensive, long-term preservation archive of e-journals, then it is still short of archiving a “critical mass” of journal content. In 2006 Portico had 13 publishers, representing 3,557 electronic journal titles. As of October 2009, Portico had 83 committed publishers representing 10,461 titles (although as of the same date content from only 7,682 e-journal titles was actually preserved within Portico). Even if all electronic issues of those titles were included in Portico, the committed titles would still represent only about 50% of the ~20,900 journal titles in CrossRef. CRL will work with Portico to determine the percentage of the journal titles in CrossRef that would constitute a critical mass of journal content.
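The coverage figures above can be checked with a few lines of arithmetic. The sketch below uses only the title counts quoted in this section; the variable and function names are illustrative, and the percentage for preserved titles is a derived figure that the audit itself does not state.

```python
# Title counts quoted in this section (as of October 2009).
committed_titles = 10_461  # titles from committed publishers
preserved_titles = 7_682   # titles with content actually preserved
crossref_titles = 20_900   # approximate number of journal titles in CrossRef


def coverage(titles: int, universe: int) -> float:
    """Return titles as a percentage of the reference universe."""
    return 100.0 * titles / universe


committed_pct = coverage(committed_titles, crossref_titles)
preserved_pct = coverage(preserved_titles, crossref_titles)

print(f"Committed titles: {committed_pct:.1f}% of CrossRef")
print(f"Preserved titles: {preserved_pct:.1f}% of CrossRef")
```

On these counts, committed titles come to roughly 50% of the CrossRef total, while titles with content actually preserved come to roughly 37%.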
It should be noted here that there is no way to independently verify and monitor the presence and integrity of content in a repository like Portico comprehensively or on a meaningful scale. Such verification and monitoring is a challenge inherent in “dark” archives, which are unable to be accessed for such purposes. Portico provides an audit archive interface, designed to enable users to “view” the content archived. However, the interface accesses not the actual archived information, but rather information that is a replica of the archive. Portico’s process for generating the replica information appears to be sound, based on a demonstration of that process performed during the site visit. Yet given the amount of content in the repository, such demonstrations are not a practical means of verifying and monitoring the presence of content on a comprehensive basis. Therefore, the auditing and assessment community will need to devise a satisfactory means of independently monitoring the archive’s content. This will require Portico cooperation in further exposing its content, or its metadata, to scrutiny.
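One possible form such independent monitoring could take, offered purely as a sketch, is a sampled checksum comparison: an external auditor draws a random sample from a manifest of expected checksums and compares each against the content actually retrievable. Nothing below describes an existing Portico or CRL mechanism. The manifest, the `fetch` callable, the object identifiers, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import random
from typing import Callable, Dict, List


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def audit_sample(
    manifest: Dict[str, str],
    fetch: Callable[[str], bytes],
    sample_size: int,
    seed: int = 0,
) -> List[str]:
    """Check a random sample of manifest entries; return identifiers that fail."""
    rng = random.Random(seed)
    ids = sorted(manifest)
    chosen = rng.sample(ids, min(sample_size, len(ids)))
    failures = []
    for object_id in chosen:
        try:
            ok = sha256_hex(fetch(object_id)) == manifest[object_id]
        except KeyError:
            ok = False  # listed in the manifest but not retrievable
        if not ok:
            failures.append(object_id)
    return sorted(failures)


# Toy data standing in for archived content (hypothetical identifiers).
store = {"article-1": b"alpha", "article-2": b"beta", "article-3": b"gamma"}
manifest = {object_id: sha256_hex(data) for object_id, data in store.items()}
store["article-2"] = b"corrupted"  # simulate silent damage
del store["article-3"]             # simulate missing content

print(audit_sample(manifest, store.__getitem__, sample_size=3))
```

In this toy run the sample covers all three objects, so both the damaged and the missing item are reported. With a large archive, the sample size would determine how likely the audit is to detect a given rate of loss.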
Functionality and services provided by the repository
The holdings comparison tool has limitations and should be improved. It has been reported that it is difficult to compare Portico holdings with those of a given participating library. This difficulty renders the scope of content preserved by the repository unclear and undermines the ability of a library to fully determine the value of the Portico service. The value and usability of the comparison reports would be enhanced if Portico provided a glossary of definitions for the different fields in the spreadsheet and a summary of the overall findings (i.e., the extent of gap/overlap).
Moreover, it is not clear that the minimal delivery standards Portico has set for itself fully conform to the expectations of all of its designated user communities. One area of concern is the lag time between a “trigger event” and delivery of content by Portico. The lag time of up to 60 days, specified in Portico agreements with publishers and libraries, is less likely to be acceptable in some fields, like medicine, where a hiatus of this duration would have a greater impact on users than a comparable loss of access to a journal in the humanities. Over time, and where reasonable, the archive should tailor its agreements with publishers to better accommodate use cases in all fields.
Future costs and risks
Portico and JSTOR are both not-for-profit services of ITHAKA (www.ithaka.org), an independent not-for-profit organization. Portico is fiscally dependent upon ITHAKA, and thus its relationship with JSTOR presents both a risk and an opportunity. This affiliation with JSTOR might deter some publishers from depositing journal content in the Portico Archive. On the other hand, JSTOR delivery capabilities could offer Portico a robust network and server environment through which post-trigger access to the archived journal content might be provided on a large scale.