Big Data: Uncharted Territory

Monday, October 17, 2016

In his 2013 paper “Big Microdata for Population Research” in the journal Demography, Steven Ruggles described an impending “explosion” in the availability of population data. Ruggles, Regents Professor of History and Population Studies at the University of Minnesota and Director of the Minnesota Population Center, estimated that by 2018 researchers will have access to “over two billion records of microdata from over 100 countries, dating from 1703 to the present,” and predicted that these new resources will enable “transformative research on demographic and economic change and the spatial organization of society.”

Indeed, vast population data sets, immense global financial databases, multi-layered geospatial data, and high-definition satellite imagery are now the raw materials of research in disciplines ranging from history to public policy to environmental science. This is a very large field, however, and the territory is not well mapped. What distinguishes Big Data resources from other scholarly resources, aside from the sheer volume of content involved, is that libraries are a relatively small sector of the market for such information.

Specialists in geospatial, financial and population data cite some common issues: overly restrictive terms of access; prohibitive or inappropriate pricing; purging of “historical” data from open access and commercial data repositories; and the embedding of data in proprietary tools and analytical services. The use of international census data, technically in the public domain, is frequently restricted by the governments collecting that data. And the massive amount of information about public opinion gathered by political consultants and pollsters, and by digital media platforms like Twitter and Google, is behind increasingly restrictive paywalls.

There is good news as well: Steven Ruggles reports that IPUMS International has made great headway in opening up anonymized census microdata from over 100 national statistical agencies, and 140 years of structured U.S. census data was recently donated to IPUMS for academic research by two commercial firms and FamilySearch.

On November 16, CRL’s first eDesiderata Forum, Licensing Big Data, will explore some of the challenges libraries face in meeting campus research needs in these areas. This virtual event will feature a set of conversations with individuals immersed in the work of ensuring access to Big Data resources for scholars and students at U.S. and Canadian universities. CRL has negotiated the purchase of and subscription to a few large databases, such as Data Planet, Electronic Enlightenment, and Statista, on behalf of member libraries. We expect the eDesiderata Forum to be an annual event, to provide a knowledge base for our ongoing licensing activities, and to inform member library investment in electronic resources.

Bernard F. Reilly
President (2001-2019)
Center for Research Libraries