The findings of a CRL analysis of print archives data call into question some of our basic assumptions about shared print. Earlier this year, Amy Wood, head of CRL Technical Services, and Marie Waltz, Special Projects Manager, used data in the PAPR database to analyze the holdings of three major validated JSTOR print archives: CRL's own JSTOR print archive and the archives maintained by the University of California and Harvard University. CRL began building its archive in 2001; the California and Harvard archives were established shortly thereafter, in cooperation with JSTOR.
Our analysis found that not one of the three archives holds a complete set of JSTOR titles. The two "dim" archives, California's and CRL's, are far from complete: California holds only 51% of JSTOR titles, and CRL 57%. Combined, California and CRL hold 63%. Harvard's archive, a "dark" archive (i.e., off-limits for use) assembled from print copies digitized by JSTOR, is the most complete, but at 88% it still lags behind the ever-growing corpus of JSTOR material.
Moreover, it is not clear from the existing data how many of the titles are archived in full runs. The analysis only accounts for titles represented in the archives by one or more issues. Measured by volume, CRL's collection contains approximately 60% of the estimated 134,136 digitized volumes in JSTOR. (CRL's “invitational” archive receives titles from deselecting institutions and is therefore dependent on the donations of willing depositors.) And the gap is widening, as JSTOR adds “rolling wall” volumes and new titles annually.
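The title-level percentages reported above also imply how much the California and CRL archives duplicate each other. The short Python sketch below works through that arithmetic; it is an illustration only, not part of the CRL analysis. It assumes that all three percentages are measured against the same JSTOR title list and that the published figures are rounded, so the results are approximate. The function name and dictionary keys are hypothetical.

```python
def coverage_breakdown(a_pct, b_pct, combined_pct):
    """Split the title coverage of two archives into shared and unique shares.

    Assumes all three percentages use the same denominator
    (the full list of JSTOR titles). Names here are illustrative.
    """
    # Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|
    shared = a_pct + b_pct - combined_pct
    return {
        "shared": shared,                # held by both archives
        "only_a": a_pct - shared,        # held only by the first archive
        "only_b": b_pct - shared,        # held only by the second archive
        "neither": 100 - combined_pct,   # held by neither archive
    }


# California 51%, CRL 57%, combined 63% (figures from the analysis)
breakdown = coverage_breakdown(51, 57, 63)
print(breakdown)

# Approximate number of volumes in CRL's archive: about 60% of an
# estimated 134,136 digitized volumes, so this is a rough figure only.
approx_crl_volumes = round(0.60 * 134_136)
print(approx_crl_volumes)
```

On these assumptions, roughly 45% of JSTOR titles are held by both California and CRL, about 6% by California alone, about 12% by CRL alone, and about 37% by neither. The volume estimate comes to roughly 80,000 volumes.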
Considering the time and resources the three organizations have devoted to archiving JSTOR print, it is surprising that after more than a decade of building the archives we are still so far short of comprehensive coverage. This is also sobering, because the 4,200-title JSTOR corpus represents a minuscule fraction of the total number of serial titles held in print by major North American research libraries. By our estimate, the critical corpus of humanities and social science serials is close to 500,000 titles.
This would be more alarming if JSTOR titles were not still so common in libraries, or so widely archived by the shared print initiatives. Yet the findings deserve consideration by libraries that are divesting serial holdings on the assumption of comprehensive archiving and accurate holdings data. The findings should also be part of our calculus on the scalability of our shared print efforts. Meanwhile, CRL's Collections and Services Policy Committee is pondering next steps. It seems evident that instead of continuing to try to populate three separate archives, the priority should be acquiring the missing titles and working with California, Harvard, and JSTOR to identify and fill the common gaps.
CRL's Agenda for Shared Print, 2017-2026 cited as a priority “identifying and assembling secure and well-curated serial collections and working to improve their quality, integrity and comprehensiveness.” One complete JSTOR archive would serve our purposes better than three incomplete ones.
Bernard F. Reilly
Center for Research Libraries