Archiving Digital News: Planning at the Library of Congress

The decline of the newspaper industry, combined with the ascent of digital media for news reporting and distribution, means that a significant portion of the journalistic record is now at risk.

Recently, the Library of Congress (LC) National Digital Information Infrastructure and Preservation Program (NDIIPP) held a workshop to explore possible strategies for collecting and preserving digital news on a national basis. For purposes of discussion, LC defined digital news to include, at minimum, "digital newspaper Web sites, television and radio broadcasts distributed via the Internet, blogs, pod casts, digital photographs, and videos that document current events and cultural trends."

The workshop, held September 2 and 3, 2009, brought together about 30 invited specialists in the field: broadcasters, producers, distributors, and archivists, as well as researchers who depend upon digital news. The discussions focused on a set of questions created by LC prior to the meeting:

  • What is the risk that we will fail to collect the historical record?
  • What news topics and sources do we seek to preserve?
  • What are the respective roles of content owners and public archives in archiving news content?
  • What shall we say about “local” as compared to “national” content and organizations?
  • What are some strategies and possible models for addressing the issues?

Attendees heard presentations on existing LC programs that preserve television, radio, and newspapers. Presentations also featured a variety of archiving programs at individual universities, state and local institutions, and media organizations. Some points of consensus emerged from the discussions:

  1. A national effort should be mounted to preserve the full range of types of digital news. The full spectrum would encompass news Web sites; conventional cable and satellite television and radio broadcasts produced by large media organizations such as The New York Times, Washington Post, CBS, CNN, WGBH, and The Associated Press; and news reported through podcasts, blogs, Twitter feeds, and other forms of social media. Some expressed the view that the present precarious economic state of the news industry made preserving the digital archives of traditional media organizations a particularly urgent priority.
  2. The effort should address news content that is actually disseminated or "published" as well as the "raw materials" of news production. "Published" output includes broadcasts, Web sites, blog posts, newspapers, and so forth. The "raw materials" include files generated by the news media, such as unaired video and audio, assignment photography, unpublished text and data, etc.
  3. The effort needs to engage members of many stakeholder groups. Producers, aggregators, and distributors of news, particularly the large media organizations, could contribute capabilities and assets that would supplement the relatively modest resources of archives and libraries.
  4. The preservation effort must serve the needs and interests of clearly defined audiences and be tailored to the practices, means, and methodologies of users, without infringing upon the intellectual property rights of the producers or the business interests of the media organizations. The target audiences identified included not only scholars in academia but also local and family historians, public policy researchers, and members of the general public.
  5. LC will take the discussions and presentations at the meeting into consideration in framing future funding activities of its NDIIPP digital preservation program.

In a related development, in July the Copyright Office of the Library of Congress issued a request for comment on a proposed amendment to the regulations regarding mandatory copyright deposit of serials published in online format only. If enacted, the proposed revision would enable the Register of Copyrights to demand deposit in the Library of Congress of electronic copies of any "serial" published in the United States only on the Web. (Online-only publications are currently exempt from the mandatory deposit requirement.) The change may enable LC to begin to acquire and archive electronic journals and news publications, presumably including some form of online news content, on a systematic basis. The announcement noted that "The Library is currently developing technological systems that will allow it to electronically ingest electronic-only works and maintain them in formats suitable for long-term preservation."

While the initial focus of the e-deposit effort is expected to be on journals rather than news, the request for comment and the NDIIPP meeting together signal the Library’s intention to archive a broad array of serial digital content and make it available to researchers.