O Acervo Estadão (O Estado de São Paulo digital archive)
First published on January 4, 1875, O Estado de São Paulo is one of the major daily newspapers of Brazil.
On May 23, 2012, the newspaper announced the digitization of its entire collection, accessible through the portal O Acervo Estadão (www.estadao.com.br/acervo). The archival content is available to current subscribers of the print or digital versions of O Estado de São Paulo. Nonsubscribers may access limited portions of the content after registering.
Sources for this review include information publicly posted or obtained directly from the publisher, data collected by CRL staff and members, and examination of the digital collection when possible. Other sources are noted where cited.
O Estado de São Paulo is a daily newspaper published in São Paulo, the largest city in Brazil. The paper was founded as A Província de São Paulo on January 4, 1875. Its founders (including future president Manuel Ferraz de Campos Sales) were committed to republican ideals and used the paper to speak out against the monarchy and slavery.
After Brazil's transition to a republic, the publication was renamed O Estado de São Paulo in December 1889. Estado, known for its seriousness, clarity of opinion, and quality of news reporting, had a wide readership, with a circulation of nearly 18,000 copies by 1897. It has remained a continuous, independent source of news, with the exception of five years during the rule of Getúlio Vargas.
According to the publisher, the digitized content includes the entire run of the newspaper (including supplements) from its first issue in 1875 to the present, except for a 30-day moving wall. The collection amounts to more than 2.4 million pages, comprising more than 10 terabytes of data. Pages were scanned from a microfilm copy produced in the 1970s. The site contains both the São Paulo and the Brazil ("national") editions. All issues from January 1, 2001, onward are presented as full-color page images.
Notably, the archive offers the full run of content issued during periods of censorship. Over several periods (including the paper’s “occupation” by the military dictatorship of Getúlio Vargas from 1940 to 1945), the paper was forced to suppress articles deemed objectionable. Forbidden to leave the censored space blank, Estado often filled it with verses of poetry or court orders instead of the excised article. The new digital collection includes images not only of the pages as modified under censorship but also of the original suppressed articles.
The portal O Acervo Estadão is presented entirely in Portuguese, the language of the country and the newspaper, with no English interface. Nevertheless, the database is intuitive and can easily be used by anyone familiar with the basic functionality of news databases.
The main interface offers a “Google-like” basic search for articles across the entire collection. Users may search the full archive, front pages only, or only the censored material. Contents have been OCRed and indexed at the page level; unfortunately, users cannot search by article title or headwords. Additional metadata has been applied to particular sections (cadernos) of the paper. It is not apparent whether the 75 article-type categories are applied consistently to sections of the paper over time.
Search results are displayed according to relevance, with page images and highlighted text. There does not appear to be an option to sort results (by date, for example), but users may filter results by:
- Edition (as mentioned, the site contains both the “São Paulo” and “Brasil” editions where available);
- Date, using a handsome graphical interface for decade, year, and month of publication;
- News section (general news, cities, economy, editorial, sport, real estate, etc.).
No advanced search is available for the initial query. After running a query, however, users can both filter the results and refine them with the advanced search (busca avançada) by specifying exact phrases, excluded words, or a date range (at the decade level only).
The interface also allows searching for text within a particular page; highlighted search terms facilitate navigation within the page. This feature worked only intermittently during the review period.
To browse by date, the Acervo portal offers both a dropdown menu and a graphical calendar, linked from the homepage, that take the user directly to a page view. Both work well, but the calendar has a slight advantage because it shows which dates are available. For instance, January 1891 has several dates for which issues are missing or were not published. The dropdown menu does not flag these dates, so selecting one produces an error message, whereas the calendar clearly marks dates that are not available.
The platform offers several options for accessing the content. The two principal means are full-text search and date browsing.
Topics and Personalities
An added feature of the site is the ability to browse articles by topics or personalities of interest. The Acervo presently lists more than 320 personalities and 32 topics, covering events and themes of regional interest (abolition of slavery, establishment of the Estado Novo, samba) or of global significance (AIDS, the Russian Revolution, the stock market crash of 1929). Each selection has an introductory essay and a set of issues chosen by the editor of the topic or personality.
The Acervo allows printing of page images. However, it does not permit downloading page images in text-searchable form (PDF, etc.). Using standard browser functionality, users may save JPEG images of pages (96 dpi) to their local drive, but all search functionality and accompanying metadata are stripped out.
Strengths and weaknesses
O Estado de São Paulo is a key resource for independent reporting on Brazil and South America, and it is valuable for researchers to have electronic access to a comprehensive run of this publication. A unique and particularly valuable historical aspect is the inclusion of originally censored content from the 1970s and earlier time periods.
While there is no English-language interface, the platform is fairly intuitive. An interesting, if unorthodox, feature lets users search front pages only. This allows searches to be narrowed to stories or events of significance, which is particularly useful since article titles are not separately indexed.
On first inspection, the OCR quality appears to be high, although some images are less crisp than others (probably a result of scanning from microfilm rather than from print originals).
The database is geared more toward a popular audience than toward academics (as reflected in its pricing and subscription model, described below). As a discovery and navigation tool, the product falls short of a “research-class” database. Its limited search functionality and inability to sort, download, or cite materials are well below what researchers have come to expect from robust platforms.
Despite these shortcomings, the display, filtering tools, and other features reflect some of the more recent advances seen in popular databases and are likely to appeal to a broad audience for casual research.
Unfortunately, institutional subscriptions provide only password login rather than IP-based access for researchers across an entire institutional site.
Nonsubscribers can browse the site but must register to view pages. Registered users who do not subscribe to the print or digital versions of Estado can view up to 20 pages in a period of 30 calendar days.
Annual digital access subscriptions are available. Subscribers to the print and/or digital versions of O Estado de São Paulo have unlimited access to the archive.
Grupo Estado has opened access to dozens of public institutions in Brazil. Institutions in North America can now subscribe annually, but receive password login rather than sitewide IP access.
Some articles appear under clear section headers (e.g., “Suas contas”). Others may correspond to particular subjects but do not fall under a particular section of the paper.
Direct from Publisher
| Item | Details |
|---|---|
| Subjects covered | General interest, politics, economy, sport, culture, science |
| Geographic coverage | Brazil; also world coverage |
| Total pages | 2.4 million |
| Digital collection launch date | May 2012 |
| Update frequency | Constant (30-day moving wall for current content) |
| Authentication options | Password login |
| Archiving solution – master files | NA |
| Archiving solution – derivative files | NA |
| Availability in web discovery tools | NA |
| OpenURL target | NA |
| Federated searching, Z39.50 | NA |
| Local host option | N |
| Full text displayed | N |
| Color images | Y (2001 onward); approx. 10% |
| Search full text | Y |
| Search within results | Y |
| Limit results by dates and/or document types | Y |
| Display highlighted search terms | Y |
| Display snippet (search term in context) | Y |
| Download PDF | N (page images can be downloaded without searchable text) |
| Print full document | N |
| Restrictions on use | NA |
| Publisher / Distributor | Grupo Estado |
| Address | Avenida Engenheiro Caetano Álvares, No. 55, Limão, São Paulo, São Paulo, Brazil, 02598-900 |
| URL; Contact | http://acervo.estadao.com.br/; email@example.com |
| Multiple-year payment option | NA |
| List of purchasers available | N |
| Sample license available | N |
| MARC records purchase fee | N |
| Price tier basis | NA |