From LexisNexis to WikiLeaks: The New Marketplace for Government Information

Screen shot from State Stats, courtesy of Stephen Stesney, SAGE Publications.

CRL’s 2013 pre-conference session at the Charleston Conference brought together aggregators and republishers of government information with collection development and government information specialists. The participants discussed:

  • How the packaging and marketing of government information has changed over the past five years;
  • What value commercial publishers add to raw government information and data in the form of analysis and tools, and at what cost;
  • How new media and methodologies drive novel approaches to delivery of government content; and
  • What challenges the new supply chain for government information presents for researchers.


Bill Sudduth, Head of Government Information and Maps at the Thomas Cooper Library of the University of South Carolina, framed the issues from a government information librarian’s perspective. A significant change for academic libraries has been a shift toward more focused, user-centered documents collections, which is the antithesis of the regional depository library model. Collaborative consortia-based efforts, including ASERL’s Cooperative Federal Depository Program (CFDP) and the TRAIL Project for technical reports, have supported this move, improving access through digitization and shared collections. At the same time, significant changes have affected government publishing itself, resulting in gains such as the reorganized and expanded FDsys database (formerly GPO Access). But the gains have been tempered by losses, including the Census Bureau’s cessation of the Statistical Abstract of the U.S. in 2011 and the more recent announcement that NTIS will cease distributing translations from the World News Connection service at the end of 2013. Unfortunately, these losses are happening at a time of expanding research interest in “big data.”

Sudduth argued for a rational and strategic national information policy rather than decisions based on short-term budget issues. He asked: In the digital age, how can a “complete” collection of government information (at the federal level) be defined, and what role can vendors play in assuring access? Furthermore, with the expanding role vendors are taking in publishing government-supplied information in online databases, what provisions are they making to ensure that what they publish is persistent and maintains the same authority and authenticity as content supplied directly from the GPO, NARA, and government agencies? Will vendors view revised Title 44 legislative provisions for public printing as an opportunity or a threat?

Catherine Johnson, Publisher of Legislative Services at ProQuest, pointed out that the private sector has been involved in meeting government information needs since early publishing efforts in the 1970s, and that this activity has definitely accelerated over the past few years with the expansion of digitization. The private sector can fill an important role in providing deeper online collections and richer methods of access, as well as supporting innovative uses of statistical data. Researchers with varying interests and backgrounds look for different types of information in legislative and executive documents, from examining the legislative process itself to understanding the social dynamics behind various initiatives. In recent years, libraries have presented vendors with expanding functional requirements for publishing government information online, including: advanced search and analytical tools; access to content for data mining; and depth of collections to ensure that full sub-sets of data are available to meet the needs of specific research projects. To ensure authenticity, ProQuest provides a GPO digitally signed PDF, but then creates “value-added” access versions of the document with a fully OCR’d copy accompanied by indexing and inserted citations. Moving forward, Johnson sees a multiplicity of roles in providing sustained access to government information, with potential partnerships and shared interests among libraries, not-for-profit online innovators, and for-profit vendors.

Stephen Stesney, Managing Editor of Online Publishing for SAGE Publications, cited the “flood of data” being released at all levels from multiple sources, including nonprofits and private companies as well as government sources. On the one hand, researchers are simply interested in getting more data released by the government, since complex data analysis tools are now cheaper and easier to use. On the other hand, consumers of published government information also have elevated expectations for these digital collections, going beyond basic tools and functionality. Aggregating publicly available data is no longer sufficient; products need to have unique data and create additional value. At the same time, more data does not necessarily equal good data. It is challenging for publishers to hunt, extract, and clean data from multiple government sources. SAGE’s plans for meeting social science research needs will focus on incorporating unique data that is “contextualized with public data.” It plans to work with specialist “data editors” to find and analyze overlooked data sources, as well as to commission some new data sets. SAGE anticipates that the future will bring “more data, more data, more data,” including increased use in classrooms. For publishers, this brings an opportunity to create data-learning environments for students. It also represents a challenge to ensure the authority of data through detailed metadata that consistently describes and links back to the original data sources.

Jeffrey Cross, Academic Sales Manager for Statista, echoed Stesney’s emphasis on the growing demand for data from multiple sources. In a “multi-polar data world,” not only is more public information openly accessible, but multiple sources for the same data exist. Like SAGE, Statista seeks to integrate data from industry and non-governmental organizations with government-sourced information. Cross also emphasized the importance of trust and transparency in documenting sources, to guarantee the authority of data. Nothing can replace human editorial oversight. And he emphasized that vendors have a responsibility to work with each other, their sources, and libraries on preservation as well as access.

Government Printing Office. Typesetting. Photograph from Harris & Ewing collection, Library of Congress Prints and Photographs Division, Washington, D.C.

Robert Lee, Director of Online Publishing and Strategic Relationships for East View Information Services, described the particular challenges of sourcing and distributing non-U.S. governmental information. To start with, how does one define what constitutes government data, which varies significantly across different world areas (for example: Russia, China, and the Middle East) and over time? The extent of availability or censorship can be very revealing of a government’s position on access, depending on what is flagged for internal consumption, obfuscated, censored, or simply not approved for release. The U.S. offers a recent example in the decision to cut off distribution of the translations in the World News Connection. Even in the existing WNC content, the basis for selecting materials to translate and disseminate is not well understood.

Additionally, Lee noted, differences exist between traditional definitions of government documents and the ways in which new media can facilitate access today, allowing more immediacy but also potentially more instability of authentic and persistent sources. Funding for distribution of information from agencies is sometimes limited, forcing selectivity. Third parties, such as non-governmental agencies, may serve as a hedge against problems of consistency in government dissemination. In any case, the potential benefits to research of improved aggregation, normalization, and preservation of data from various government sources lead one to hope for more private/public synergies in the future.

Angela Carreño, Head of Collection Development for the Division of Libraries at New York University, responded to the panel’s points from the perspective of a collections manager. She is called upon to respond to various research interests as well as balance spending requests. First and foremost, it is clear that the role of vendors will become increasingly important to help aggregate as well as to preserve data.

Vendors will be especially important in identifying and providing access to information from governments in other parts of the world, as demonstrated by Robert Lee. A systematic process for capturing, preserving, and disseminating foreign born-digital government information is currently lacking. Could a routinized collaborative library service make it possible for specialists to identify and also acquire open access government documents from non-U.S. jurisdictions (such as the Justice Verma Committee report issued following the rape homicide case in New Delhi)? She cited as another example a professor seeking authoritative data on 2006 election results in the Dominican Republic. This information was previously published in the Gaceta Oficial. Now there are various sources, including a website hosted by the Junta Central Electoral, but the spreadsheet results found there are unwieldy and inconsistent with other sources.

Vendors will also play an essential role in supplying the growing research demand for “big data.” Emerging at NYU is the field of “urban informatics.” There the Center for Urban Science and Progress (CUSP) plans to bypass library acquisitions efforts and collect data directly from city and state agencies.