Broadcast News Transcripts in Academic News Databases

Image from InfoTrac Newsstand showing a broadcast transcript for PBS Newshour (November 22, 2012).

Despite the growing diversity of news sources, television remains the most popular way for Americans to follow the news.1 Interest in television news broadcast content for research purposes also continues to grow as tools to access and interpret broadcast content become more widely available. Transcripts of selected news broadcasts have been accessible in print and microform since the mid-20th century, and the Vanderbilt Television News Archive (VTNA) began recording television news programs for academic research in 1968. Newer sources of television news content, such as the UCLA Library Broadcast NewsScape and the Internet Archive TV News Archive, continue to transform the research potential of this critical medium.

In 2013, the Center for Research Libraries (CRL) compared six academic databases on the extent and accessibility of their broadcast transcripts for national and international news. The survey aimed to assess the coverage and scope of broadcast media (television, radio) in the major research resources to which CRL member institutions subscribe, measuring the relative strengths of each product in terms of depth of coverage and uniqueness. CRL also sought to assess the overlap in content among the major providers and to compare its findings with those of community-driven efforts such as the Vanderbilt Television News Archive and the emergent UCLA Library Broadcast NewsScape.

The six databases assessed during the 2013 study are listed below:2

  • Access World News (NewsBank)
  • Factiva
  • InfoTrac Newsstand (Gale Cengage)
  • LexisNexis Academic
  • Newspaper Source (EBSCO)
  • ProQuest Newsstand

CRL’s full assessment revealed a number of characteristics unique to each product. In addition to assessing content, CRL evaluated ease of use and how quickly information in broadcast transcripts could be identified and accessed using both simple and advanced search functions.

A summary of the content coverage follows.

News Coverage Compared

Factiva and LexisNexis Academic are large databases with tens of thousands of research resources, most of which are non-news sources. Factiva focuses on resources relevant to business. It provides access to a reported 35,000 sources, including more than 3,300 US, regional, and international newspapers; approximately 1,200 newswires; news websites; media transcripts; blogs; and multimedia. LexisNexis provides access to legal resources in addition to major news sources. It reports more than 15,000 total sources, of which an estimated 3,000 are classed as newspapers, with an additional 1,000 drawn from newswires, web-based news publications, and news transcripts.

The other four products focus primarily on news: Access World News has nearly 4,400 news sources (3,200+ newspaper titles); InfoTrac Newsstand has 2,900 titles (1,600 of which are newspapers); ProQuest Newsstand and EBSCO’s Newspaper Source Plus both contain approximately 1,600 news sources, of which nearly 1,200 are full-text newspapers.

These figures, compiled in 2014 from the most recent product source lists, show continued growth in news titles across all aggregators compared to 2013 figures. This suggests an ongoing commitment by the providers to expand the availability of news sources for academic research.

Broadcast Transcript Availability

All six products provide access to news broadcast transcripts, though depth of coverage and time span vary. Using vendor source lists (2013), CRL identified the following numbers of transcript sources:

The numbers above do not necessarily indicate the relative total size of the collections, because providers were not consistent in how they listed programs and sources. For instance, Access World News mostly lists the broadcast stations included, whereas the others list individual programs.

For all products, transcripts are predominantly in English, gathered primarily from US sources. Generally, all products present “full-text” coverage of the aggregated broadcasts, with only a handful of sources represented as selected coverage or abstracts.

Assessment of Broadcast Source Content

Factiva reports offering transcripts for 430 news programs, spanning approximately 1990 to the present. Factiva provides television and radio transcripts from the major national and cable news networks, including ABC, CBS, CNN, FOX, NBC (with CNBC and MSNBC), NPR, and PBS. Factiva obtains content through providers such as ASC Services (formerly Morningside Partners) as well as directly from publishers.

Factiva includes international broadcast content from the Australian Broadcasting Corporation, BBC Monitoring, CNN International, CTV Television Network (Canada), Deutsche Welle (Germany’s international broadcaster), and Euronews (France). Factiva appears to be aggressively adding international and foreign-language broadcast content, such as BFM TV and Radio Monte Carlo (France); multiple Arabic-language sources, including al Qarra TV (France, focusing on news of the African continent), al Arabiya (UAE, Gulf region), the Middle East Broadcasting Center (MBC) in Saudi Arabia, and CNBC Arabiya; and the TV Express Search service covering multiple stations in Japan.

Transcript coverage is extensive, with national commercial network programs extending back to the late 1980s and 1990s (NBC coverage begins in November 1989, CBS in 1990, and ABC in 1994), and cable news offerings dating back to 1997. Of the 430 programs/sources listed, approximately 40 percent are currently updated.

Factiva contains numerous programs not listed in other products’ source lists. However, because aggregators list source titles in different ways, true uniqueness among products is difficult to ascertain. Factiva lists multiple program titles, mostly non-current, from CNN and ABC News Now (ABC’s 24-hour news channel) that the other aggregators do not list. Most of Factiva’s unique content, however, comes from its international sources: it is the only product listing Deutsche Welle content, as well as the Arabic and French sources described above. In 2013, it added coverage of several Russian-language broadcasts from Ukraine.

LexisNexis Academic (LN) reportedly includes broadcast transcripts from more than 450 programs, primarily from mainstream US national news sources but also from Canada, Australia, the UK, France, and the Asia-Pacific region. Like Factiva, LN obtains some content directly from publishers, but more frequently through third-party transcription services, including the Federal News Service, CQ-Roll Call, and ASC/Morningside. International coverage is provided through sources such as BBC Monitoring and Euronews, and LN has recently begun to incorporate sources from areas outside Europe, including South Africa (Summit TV, serving the South African business community) and India (Economic Times Now).

Coverage of broadcast transcripts is extensive, with many sources extending back to the 1990s or earlier, depending on the service or publisher (in LN, coverage of ABC’s World News Tonight extends back to 1980, CBS programs to 1990, CNN to 1992, and NBC to 1993). Of the 462 programs listed, approximately half are reported as currently updated.

Content unique to LexisNexis appears to be primarily in the international news arena, including Channel NewsAsia, the aforementioned Summit TV, and selected programming from the CTV Television Network in Canada. Video content is provided by third-party sources, including TVEyes, a media-monitoring service that includes full-text transcripts, and NewsLook, a video platform that aggregates news from multiple sources.

Access World News (AWN) lists nearly 230 transcript sources, though its source list gives only the name of each broadcasting station rather than the individual program titles that the other vendors list as sources. AWN contains content from national providers including Bloomberg, CBS, CNBC, CNN, FOX News, MSNBC, NBC, NPR, and PBS. In addition, AWN’s source list includes approximately 200 local broadcast stations from around the United States, covering local news from affiliate stations of the major networks, independent stations, and others.

AWN also features international content, including several unique sources: BBC selected transcripts, Independent Television Network (Sri Lanka), (India), and RBC TV (Russia).

Coverage of national broadcasts begins around 2003, with local station coverage added beginning in late 2006. NPR content extends back to 1990.

Of the 230 listed sources, 160 (or 70 percent) are reported as “not current.” Nearly all of these sources are local broadcast stations (the exceptions being CNBC and CNNfn, which have limited coverage in the database, and CNN en Español, which was indexed from 2003 to 2011).

InfoTrac Newsstand includes approximately 100 transcript sources, provided primarily through CQ-Roll Call, Inc. Content includes programs from CBS, CNN, FOX, and Bloomberg. InfoTrac also aggregates selected transcripts from NBC, NPR, PBS, and the Federal News Service, and includes AP transcripts beginning in January 2000.

Of the 102 sources listed, 71 titles are reported as currently updated. However, most of the content uniquely held by InfoTrac is no longer currently updated, such as local content from McClatchy-Tribune Information Services. The product contains selected sources for audio and video, such as Wall Street Journal This Morning and National Review Online content. InfoTrac lists Critical Mention, Inc. and ShadowTV as sources for local broadcast and Bloomberg video content, though details are not available.

Newspaper Source lists 136 different transcript sources, 60 percent of which are still currently supplied. Coverage includes programs from ABC, Bloomberg, CBS, CNN, FOX, MSNBC, NPR, and PBS. Newspaper Source also includes content from the Australian Broadcasting Corporation and Canada’s CBC Television. NPR content dates back to 1998, with coverage of other broadcasters beginning between 2000 and 2003. CNN content begins around 2005.

ProQuest Newsstand presently includes transcripts only from BBC Monitoring, a division of the British Broadcasting Corporation that monitors and reports on mass media worldwide. Coverage dates back to 2003 for most BBC content.

Overlap in Broadcast Coverage

In comparing the various title lists, CRL found 729 unique listings among the six products. Approximately 40 percent of the titles were held by two or more vendors. The amount of overlap identified is as follows:

As noted above, comparing coverage among the databases is difficult because each aggregator reports its source list differently. Titles are not listed consistently or under any authority control, and the sortable fields do not match from product to product. Most significantly, some vendors list each broadcast publisher as a single source, while others list each of that publisher’s programs as an individual source title. The true percentage of overlap is therefore likely higher, but without deeper content analysis the reviewers were unable to align the title lists conclusively.
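The overlap figures above come from matching titles across six vendor source lists. A minimal Python sketch of that kind of tally follows, assuming each vendor list is a plain list of title strings. The function names, the crude normalization rule, and the sample lists (labeled Vendor A to C) are hypothetical illustrations, not CRL’s data or procedure.

```python
import re
from collections import Counter


def normalize(title: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace.

    A crude, hypothetical stand-in for the authority control the vendor
    lists lack.
    """
    title = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(title.split())


def overlap_summary(lists: dict[str, list[str]]) -> tuple[int, float, Counter]:
    """Return (number of unique titles, share held by two or more vendors,
    histogram mapping vendor count -> number of titles)."""
    holders: dict[str, set[str]] = {}
    for vendor, titles in lists.items():
        for title in titles:
            # Group every vendor that lists the same normalized title.
            holders.setdefault(normalize(title), set()).add(vendor)
    histogram = Counter(len(vendors) for vendors in holders.values())
    unique = len(holders)
    shared = sum(n for k, n in histogram.items() if k >= 2)
    share = shared / unique if unique else 0.0
    return unique, share, histogram


# Hypothetical sample lists, not CRL's data.
sample = {
    "Vendor A": ["CBS Evening News", "NPR: Morning Edition", "Deutsche Welle"],
    "Vendor B": ["CBS EVENING NEWS", "Channel NewsAsia"],
    "Vendor C": ["NPR Morning Edition", "CBS"],
}
unique, share, histogram = overlap_summary(sample)
print(unique, share, dict(histogram))
```

Normalization reconciles differences in case and punctuation, but it cannot match a publisher-level entry such as “CBS” to program-level entries such as “CBS Evening News,” which is why a tally like this tends to undercount overlap.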

Depth of Coverage Compared

To compare depth of coverage, CRL sampled 42 titles held by three or more vendors to determine which vendors maintained the deepest collections of broadcast transcripts. For all titles sampled, LexisNexis and Factiva consistently had the longest runs of coverage.

Although coverage varied widely from title to title, LexisNexis generally had deeper backfiles than Factiva, often with runs of transcripts starting anywhere from a few months to 14 years earlier (the average difference was four years). Of the 42 titles sampled, LN had deeper backfiles for 20. For eight titles Factiva had deeper content, though generally by only one or a few additional years. The remaining 14 titles had roughly equal coverage in the two databases.
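The title-by-title comparison above amounts to comparing the earliest transcript date for each title in the two databases. A minimal Python sketch follows, assuming start dates are known for each sampled title. The 90-day tolerance for treating coverage as “roughly equal” and the sample programs are hypothetical, since the report does not state the threshold CRL used.

```python
from collections import Counter
from datetime import date

# Hypothetical threshold: start dates within this many days count as "roughly equal".
TOLERANCE_DAYS = 90


def compare_depth(starts: dict[str, tuple[date, date]],
                  tolerance_days: int = TOLERANCE_DAYS) -> tuple[Counter, float]:
    """Classify each title by which database's backfile starts earlier.

    `starts` maps a title to (first LexisNexis date, first Factiva date).
    Returns a tally of outcomes and the mean head start, in years, for
    titles where LexisNexis is deeper.
    """
    tally: Counter = Counter()
    ln_leads: list[float] = []
    for ln_start, factiva_start in starts.values():
        diff = (factiva_start - ln_start).days  # positive: LexisNexis starts earlier
        if abs(diff) <= tolerance_days:
            tally["roughly equal"] += 1
        elif diff > 0:
            tally["LexisNexis deeper"] += 1
            ln_leads.append(diff / 365.25)
        else:
            tally["Factiva deeper"] += 1
    mean_lead = sum(ln_leads) / len(ln_leads) if ln_leads else 0.0
    return tally, mean_lead


# Hypothetical sample, not CRL's data.
sample = {
    "Program A": (date(1990, 1, 1), date(1994, 1, 1)),
    "Program B": (date(2001, 6, 1), date(2001, 7, 1)),
    "Program C": (date(1999, 1, 1), date(1997, 1, 1)),
}
tally, mean_lead = compare_depth(sample)
print(dict(tally), round(mean_lead, 2))
```

The tolerance matters: a small threshold pushes titles with only a few months’ difference into the “deeper” categories, so reported counts depend on where that line is drawn.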

Comparing by major broadcaster and extent of runs, Factiva and LexisNexis are competitive in terms of content offering. Looking solely at total years of coverage across all programs of each major network, LexisNexis is stronger for CBS, NBC, and CNN, while Factiva holds a slight lead for ABC, FOX, and NPR. These strengths vary, however, from year to year. LN is considerably stronger for early transcripts, while Factiva ramped up its collecting efforts in the 2000s. Both products maintain robust aggregation of all major networks from 2010 to the present.

Content Comparison: Television News Archives

CRL examined the coverage of the Vanderbilt Television News Archive and the emergent UCLA Library Broadcast NewsScape to compare them with the text-based databases above. Although these archives focus more on audiovisual capture and presentation than on broadcast transcripts, they are significant and growing efforts that provide an alternative means of accessing broadcast news programming.

Although VTNA does not provide the same type of source list as the commercial aggregators, its coverage may be summarized as follows:3

Regular News Programs

  • ABC Evening News: August 5, 1968–present
  • CBS Evening News: August 5, 1968–present
  • NBC Evening News: August 5, 1968–present
  • CNN: WorldView: October 2, 1995–November 3, 2000
  • CNN: Wolf Blitzer Reports: February 1–December 31, 2001
  • CNN: NewsNight: November 5, 2001–October 28, 2005
  • CNN: Anderson Cooper 360: November 1, 2005–present
  • FoxNews Reports: January 15, 2004–present
  • ABC Nightline: March 24, 1980–September 12, 1988: occasional coverage
  • ABC Nightline: September 12, 1988–present: comprehensive coverage

Special Reports

The term “Special Reports” refers to news coverage of significant events broadcast outside the scope of the regular evening news programs. This part of the collection focuses on US presidential politics, including political conventions, election coverage, and speeches and press conferences of the president currently in office. The Special Reports collection also includes coverage of major national and world events and major military conflicts involving the US. In addition to the networks described above, VTNA contains special reports from the following broadcasters:

  • PBS
  • CNBC
  • Univision
  • Bloomberg

VTNA does not cover local news programming or “news magazine” programs such as 60 Minutes and 20/20.

UCLA’s NewsScape contains a variety of national news programs and local news shows from the Los Angeles area. It began its coverage in January 2005 with approximately a dozen news programs from ABC, CBS, FOX, NBC, and KCAL (an independent station featuring local broadcast news). It expanded its coverage in October 2006 to include CNN, Fox News, MSNBC, and additional national and local programs, and it added Spanish-language content in August 2007. NewsScape relies on closed-captioning text for its full-text searching, rather than on transcripts or abstracts.

Although NewsScape’s coverage does not go back as far as VTNA’s, its coverage of national news programming appears to be more extensive than Vanderbilt’s, including daytime programming and the major cable news networks. NewsScape also includes extensive coverage of news-related programs of an entertainment nature (The View, Entertainment Tonight, EXTRA, The Colbert Report, Saturday Night Live, and so on). From a content perspective, NewsScape’s coverage of local broadcasts, independent stations, and news-related programming sets much of its content apart from the other databases.

  1. and
  2. The statistics for the various products were collected between May and June 2013. As such, they represent a snapshot of the content coverage of the products, and may not represent current figures in 2014.