A New Strategic Framework for North American Research Libraries


In a summary of the two days’ discussions, CRL president Bernard Reilly listed five factors that presenters and attendees identified as particularly serious threats to the long-term integrity and accessibility of e-government information. These threats mirrored some of the concerns raised by the panel of historians convened by the American Historical Association Research Division in advance of the Leviathan Forum. (See the report “Governments and the Digital Record: the Historian’s Perspective.”)

  1. Scale/“The Leviathan”: In the age of Big Data, the sheer number of records, documents, and datasets being produced by governments is immense and is growing exponentially. Moreover, as the applications and platforms adopted by government agencies continue to multiply and evolve, the variety and complexity of the digital information being produced are increasing as well. The result is large and growing backlogs of unprocessed records at NARA and sporadic inclusion of agency documents in FDsys.
  2. “Known Unknowns”: The universe of government information is growing and changing so rapidly that the current scope of that universe is essentially immeasurable. This, combined with the tendency of governments to withhold from public view records and communications they consider sensitive, enormously complicates the task of preservation.
  3. Asymmetry: A large and growing disparity exists between the amount of information produced by governments and the resources allocated by the public sector to preserve that information. Even at the national level in the U.S. and Canada, federal agencies historically tasked with archiving are ill-equipped to meet the new challenges. The problem is not simply one of mismatched resources, but also of limits on the authority those agencies have over government publishing and records management.
  4. “The Cloud”: In the government information ecosystem, organizations that bring robust digital technologies and capabilities to the table now play a larger role than in the past. Such organizations include:
    • Cloud service providers and other suppliers of technology platforms used by government agencies to create and manage records and publish information
    • “Fourth Estate” organizations that aggregate, analyze, and interpret government records and data, and make them available to citizens and researchers
    • Private-sector, market-oriented aggregators and other third parties that locate, organize, reformat, and distribute public sector data and documents to users in the academic, government and business worlds.
    These organizations are an important part of the equation, as the channels through which much government information reaches users today. The content they serve is often inseparable from the tools they produce.
  5. “The Fog” (Non-Transparency): Ironically, in an era of open government data, too little information is available about how government records and information are created and maintained. In particular, we need more data about:
    • Technology: the myriad sophisticated systems and software that are now part of the critical infrastructure of governments. These include NARA’s Electronic Records Archives, acknowledged to be problematic, as well as the kinds of email systems that resulted in the wholesale loss of White House and IRS messages. If such systems remain “black boxes,” information vital to scholars about the production, provenance, and alteration of content will be lost or rendered unusable.
    • Finance: The resources invested by the commercial aggregators and distributors of government content, and the returns such entities realize on those investments, are often undisclosed. Lack of such information thwarts library due diligence in evaluating resources for purchase and subscription.
    • Politics: The actions of governments themselves often have a bearing on the integrity and accessibility of their records. All governments, to a greater or lesser degree, tend to resist public scrutiny of their activities. As Mary Case observed, the behavior of the agencies—and even that of publicly funded archives—can at times be in conflict with the mission of the research library and the interests of researchers.

A “Leviathan” Agenda

These are big challenges. To address them, research libraries will require a new “playbook.” Existing mechanisms, such as the U.S. Federal Depository Library Program and the Government of Canada’s Depository Services Program, were designed to overcome geographic obstacles to broad public access to documents that were tangible and available in limited quantities. The realities of today’s “paperless government,” global digital networks, and ubiquitous information pose different challenges and call for new approaches.

The broad strategies outlined below, based on the Leviathan discussions, are essential elements of a realistic action agenda for U.S. and Canadian research libraries. They constitute a new “playbook” for effective stewardship of government information on behalf of the scholars our libraries serve. As such, they will henceforth form the basis for CRL’s own planning and priorities.

  1. Triage: Given the scarcity of resources for preservation today, libraries must focus their efforts on what is known to be at risk and what is not likely to be adequately preserved by other actors, public sector or private. Because HathiTrust, ASERL, and other organizations have declared their intention to comprehensively archive U.S. federal government documents, CRL will focus its efforts on securing access for its community to other digital resources, namely materials in established CRL areas of strength. These include records relevant to international affairs, U.S. and its allies’ diplomatic history, and American internal and national security; records and documents produced by governments in conflict zones, unstable areas, and in regions of U.S. and Canadian national strategic interest; and materials produced by corrupt and/or non-transparent governments abroad, which are likely to be lost if not independently harvested and archived.
  2. Drill Down: Determining what is at risk requires a detailed understanding of how the digital government information lifecycle and supply chain work, including the workings of critical archiving infrastructure like NARA’s Electronic Records Archives and GPO’s FDsys. Too little is known about the cloud services upon which agencies depend, and the proprietary systems and platforms agencies use to distribute information. In the future, “application literacy” will be an essential part of the skill set of both librarians and researchers. CRL will undertake, as resources allow, the analysis, mapping, and documentation of the technologies and platforms used by NARA and Library and Archives Canada to preserve born-digital U.S. and Canadian federal government records. CRL will also seek the support of its constituents and other libraries in the Federal Depository Library Program to conduct an independent evaluation of the FDsys repository system, providing a gap analysis and baseline data that can inform planning for the systematic archiving of born-digital U.S. government agency publications.
  3. Differentiate: Researchers in different fields require different things of government information: integrity and authenticity mean one thing to an economist and another to a historian. To serve constituencies using new analytical tools tailored to their respective disciplines, libraries will have to abandon the “one size fits all” regimes used to accommodate those constituencies in the print era. CRL will focus its ongoing analysis of research methodologies and practices on the users of electronic records and data in the focus areas identified above, and will publicize those methodologies through its Global Resources Forum.
  4. Collectivize: Research libraries must actively engage as a community with key “suppliers” of government information, including NARA, GPO, and Library and Archives Canada; and must unite to gain leverage in dealings with key aggregators of government information, such as ProQuest, Bloomberg, and others. Given the scale of the resources behind those suppliers, CRL will work with NERL and other appropriate consortia to ensure that North American research libraries “speak with one voice” in negotiating not only affordable access to government information, but also provisions for long-term accessibility and integrity, quality of metadata, interoperability, and tools for analysis and mining of text and data.
  5. Act Up: To obtain the resources and standing necessary to play a meaningful role in a realm as large as government information, research libraries will have to forge new partnerships. Organizations now abound that share with libraries an interest in public access to government information and documentation. The National Security Archive, for example, uses FOIA requests, litigation, and other means to compel the U.S. government to declassify and disclose records and information. Though not a library, the Archive has succeeded in making a wealth of critical historical evidence available to scholars and citizens alike. Under a new effort led by UNESCO, the ICA, IFLA, LIBER, and other partners are working to encourage national governments and the technology industry to promote greater persistence and disclosure of government information. CRL support for the efforts of those open government data initiatives could well accelerate declassification and disclosure of government materials important to scholars. As a first step, as Ingrid Parent recommended, CRL will draft an “advocacy statement,” based on the Leviathan strategic framework, to be brought to IFLA this summer in Lyon.

Stewardship will, under certain circumstances, continue to involve libraries and archives taking wholesale custody of materials produced by governments, as they did in the paper era. But this depository role has become less relevant. If we define preservation as protecting and empowering access for our constituents, then archiving becomes only one of several appropriate strategies. More often, libraries will have to actively monitor and scrutinize how government records and information are produced, managed, and distributed, whether by the government agencies themselves or by third parties, and identify the faults and deficiencies of those operations.

It has been at least a decade since paper documents ceased being the primary form in which most researchers access government data. The Leviathan presentations and discussions suggest that it is time for libraries to adopt a new playbook to support that access.

Tweets from #CRL_Leviathan

(April 24–25, 2014)

Sustainable Information? Ingrid Parent discussing PERSIST, an int’l effort: http://bit.ly/1lcRTuL #CRL_Leviathan
—jeff kosokoff @kozmoboxman

Mary Case: we’ll always have tension btwn scholars’ desire for all info and resource and practical constraints/costs #CRL_Leviathan
—Eileen Fenton @egfenton

In sprit of finding out what’s out there we’re creating a registry of gov doc digitization projects http://bit.ly/gov-docs #CRL_Leviathan
—Kitt McGoveran @kittmcg

“@CRL_Global: James A. Jacobs: digital preservation should have an access orientation #CRL_Leviathan http://www.crl.edu/leviathan ”. 100% agree !!
—Paul N. Wagner @pnwagner

JJ: what we know (3): scope of born-digital government information now greatly out paces what is being done to preserve it. #CRL_Leviathan
—Freegovinfo @freegovinfo

what we know (2): most borndigital government information is not being preserved by libraries. #CRL_Leviathan
—Freegovinfo @freegovinfo

Only 2% of what the US government creates gets saved at the National Archives - we need to get creative to improve capacity #CRL_Leviathan
—Kitt McGoveran @kittmcg

Look at http://3stages.org/crl and hold mting to discuss UO’s progress on Data for Local Communities http://library.uoregon.edu/dc/dlc/ #CRL_Leviathan
—Mark Watson @mrwatson44

#crl_leviathan Today in WaPo: Magistrate’s revolt in releasing digital data http://wapo.st/1ifht1V - retain judges’ emails on this!
— Rachel Brekhus @brekhusr

Plz remember #CRL_Leviathan attendees, #fdlp first & foremost a collaborative network of libraries & librarians. That’s what’s needed now
—Freegovinfo @freegovinfo

“@poptheapp: The persistence of memories of the persistence of memory.” https://gopop.co/36724” referenced in my #CRL_Leviathan talk
— John S. Bracken @jsb

This is the view from the other window at the conference site. #stunning #CRL_Leviathan pic.twitter.com/cpORBoiKgq
—paulwester @paulwester

John Bracken: how can we make the internet better? Knight News Challenge @knightfdn @JournalismLib http://www.newschallenge.org #CRL_Leviathan
—Columbia CHRDR @HRDocumentation