Libraries and the Global Information Supply Chain

Thursday, August 16, 2018

The 2018 CRL Global Collections Forum convened on May 17 and 18 to examine the evolving global “supply chain” for source materials for research. The Forum is an annual venue for sharing ideas and expertise on collections and digital resources (particularly news, government records, and economic and geospatial data) and for devising strategies for collective action by CRL and its community. This post summarizes the key ideas from the Forum and their implications for the community. The complete recordings of the Forum presentations and discussions are available on the event site. 

The Conversation

The Forum presentations and discussions examined some of the profound changes that have occurred over the last ten years in the provisioning of data and documentation for research. UC Santa Barbara’s David Marshall reflected on the topic from a scholar’s perspective. He sees the present as a “moment of transformation”, when “the ground is shifting in the information landscape and around scholarship itself” in ways that jeopardize access to vital materials. Materials as diverse as commercial satellite data, used by Stanford researchers to monitor the movements of Ethiopian and Mongolian nomads (Sweetkind-Singer), and the files of colonial and post-colonial regimes in Kenya and Uganda (Peterson) present a variety of challenges. 

Speakers identified three forces driving the transformation: money, technology, and politics. As the economic value of data, especially actionable, real-time data, has burgeoned, important public-interest information, from earth imagery to environmental data, is being monetized on an unprecedented scale by commercial firms like DigitalGlobe (Sweetkind-Singer) and agribusinesses like John Deere (Knezevic). 

The overwhelming volume of data generated by new digital sensing and social media technologies, and maintained in the cloud, creates enormous opportunities for scholars but also raises troubling issues around ownership and reproducibility of research. The scale and complexity of these data are beyond what even universities as well-resourced as Stanford can maintain (Sweetkind-Singer).

Recent headlines about the misuse of Facebook data by Cambridge Analytica and about LexisNexis’s cooperation with ICE suggest the extent to which party politics and national security concerns now shape information systems and products, and the terms of access to them (Marshall, Lamdan). And as publishers like Data Planet and SAGE productize climate change and census data harvested from federal agency websites, the role of governments as central data providers is changing (Blaemers). 

Many business practices in this realm are “alien to the ethos of libraries” (Lynch, Lamdan). Because academic libraries are not the primary client base for most data providers, adherence to our industry norms (transparency, user privacy, and barrier-free access) is not a given. Contention over the disposition of archives in former British colonies shows that investments in digitization may often need to be made without the promise of access in the foreseeable future (Peterson). This runs contrary to the thinking of libraries and many funders (Guy, Steel).

These new developments lead to problems such as high costs, a lack of transparency in critical platforms and information systems, and arbitrary restrictions on use. Moreover, as researchers become more heavily reliant on geospatial, census, demographic, and financial data residing in the cloud, and on third-party tools to mine that data, dealings with service providers become more complex and specialized. In the past, when the terms under which databases and datasets were acquired and used were “one size fits all,” libraries were the natural locus of those dealings. Today researchers often bypass libraries and deal with data providers directly.

It was the consensus of attendees that library practices must adapt to the new realities. While the goal of libraries is to ensure the long-term integrity and accessibility of important data, documentation, and evidence for scholars today and tomorrow, our traditional templates for doing so (collecting and licensing) will take us only so far. Clifford Lynch suggested a “fundamental rethinking of stewardship.”

What’s Next for CRL and its Community?

The Forum reaffirmed the importance of CRL’s focus on access to data and documentation from the Global South and Global East, which tend to be more difficult to obtain, more costly, and more likely to disappear. The exchange of ideas and perspectives also suggested some broader objectives for CRL.

  • Analysis and Transparency: Large databases and their enabling platforms are increasingly “black boxes,” whose workings are only dimly understood by users. There is a need for a better understanding of the economics and technologies that affect the production and distribution of data and evidence, and for “documenting what is taking place in a more aggressive fashion” (Lynch). Librarians should “talk with faculty about who owns the information” (Marshall), and about how industries like agriculture, finance, and national security skew the costs and terms of access to information (Knezevic, Lamdan). The platforms, systems, and algorithms on which scholars rely also call for greater scrutiny (Lynch). CRL should therefore increase its emphasis on assessment of large databases and their enabling platforms, sharing findings through reports and forums. (Attendees affirmed the value of the discussion at this Forum.) Gathering and sharing this kind of intelligence will also provide greater support for local library decisions on data collection and licensing negotiations.

  • Alternative Models for Digital Investment: The community must identify new funding and financial models for access to digital resources. The conventional templates, licensing commercial resources and funding open access resources, are ill-suited to the current environment. A “one-size-fits-all” approach to licensing fails to address the diversity of scholarly needs. And insistence on barrier-free access can often be at odds with political realities in source communities. CRL’s licensing and digital investment efforts should therefore identify and test new models that engage source communities and commercial actors, and provide data and tools to support customized arrangements between libraries and data sources.

  • Greater and More Flexible Engagement with Data Providers: There is consensus on the need for collective dealings with cloud providers and data suppliers, which now operate on a gargantuan scale. “Stewardship interventions have to take place at very different points in the lifecycle of information . . . and will have to be more active” (Lynch). One example of constructive engagement is the social data initiative recently launched by the SSRC in cooperation with Facebook (Marshall). Engagement will also involve new types of arrangements with the gatherers and “owners” of knowledge (Guy, Peterson). The “post-custodial” approach to dealing with data providers in the Global South, for instance, will necessarily require vesting more authority in local institutions and groups. While that runs counter to the traditional curatorial model of Northern and Western institutions, CRL should encourage and promote such models in its own digital ventures and in its work with NERL, building into those efforts support for local digital capabilities outside the library domain. 

What we learned from the Forum will guide collective action on digital investment and licensing under the CRL umbrella and, we hope, lead to more informed and prudent decisions by individual CRL libraries. CRL is indebted to the speakers and other members of the CRL community for the wealth of ideas and knowledge they shared during the Forum. 

Bernard F. Reilly
President (2001-2019)
Center for Research Libraries