Funded by the National Science Foundation under its Strategic Technologies for Cyberinfrastructure Program, CRL initiated a two-year project (Jan. 2008-Dec. 2009) to analyze established, "long-lived" collections of data and digital resources, and to create tools and metrics for developing and assessing new repositories. The CRL case studies identified the practices, strategies and mechanisms that have enabled those repositories to sustain massive data collections over substantial periods of time.
The creation and collection of massive amounts of digital data by the sciences and social sciences today generate stewardship demands that cannot be met fully by traditional libraries and archive organizations. During the past three decades large, new repositories of digital data have emerged to meet the needs of scientists and of researchers in the social sciences and humanities. Data stewardship is now undertaken by federal agencies, discipline-based consortia of scientists and researchers, supercomputer centers, universities, institutes, and for-profit corporations like ProQuest, ExxonMobil, and Google. Some emerging data repositories have flourished and persisted; others have not.
The project generated and disseminated innovative models, risk assessment tools, cost data and metrics to enable informed planning and prudent investment in Cyberinfrastructure by the NSF and other federal agencies, universities, scientific consortia and institutes, corporations, publishers, and other stakeholders across the spectrum of science, social science, and humanities communities.
The subjects of the case studies include the following repositories:
Supported by the National Science Foundation, CRL is working to identify practices and strategies that contribute to the successful long-term maintenance of digital collections and archives. The project will create a base of information and resources to guide investment in digital repositories by funders, universities, consortia, and other organizations.
Purpose of the roundtables: The four roundtables will acquaint funding agency administrators and policy-makers, program officers and other appropriate staff with preliminary findings of the CRL Long-Lived Digital Collections project. CRL will use the roundtables to gain input on the tools and templates produced by the project in order to improve their usefulness and value, and will seek ideas from funders on potential subjects for future case studies.
The roundtable format will combine formal presentation and open discussion to promote the free exchange of information and ideas.
Who should attend? Program officers for research, preservation, and museum, archive and library programs; senior administrators and policymakers; financial officers; information resources; and legal counsel. Each half-day roundtable will be limited to 12 to 25 attendees.
Roundtable 1 (Tuesday, Jan 26, 2010)
Roundtable 2 (Wednesday, Jan 27, 2010)
Roundtable 3 (Tuesday, Feb 9, 2010)
Roundtable 4 (Wednesday, Feb 10, 2010)
Roundtable 1--Introduction to the Case Studies and Case Study Repositories
Tuesday, January 26, 2010
1:00 - 4:00 PM
1:00 Repositories: The first roundtable will introduce the case studies project and profile the repositories that were subjects of those studies. The session will examine how the repositories differ from traditional "memory institutions" in managing humanities and social science material and evidence.
2:00 Content: What types of digital content are managed by the repositories? How have they adapted to changes in the way those materials are produced and to fluctuations in the value of digital content over time?
2:45 Break
3:00 Stakeholders: Who are the various communities with vested interests in the preservation of digital content, and what are their needs?
Background Documents
AP_Profile [3], NORC General Social Survey Profile [19], UMI Profile [10]
OAIS Reference Model [20], Continuity Factors [21]
Trustworthy Repository Audit Checklist (TRAC) [22]
Roundtable 2--Repositories and their Environments
Wednesday, January 27, 2010
9:30 AM - Noon
9:30 Environments: Roundtable 2 will examine how the legal, economic and technical environments in which the repositories and other stakeholders operate impact the longevity and integrity of the digital humanities collections they hold.
10:30 Break
10:45 Costs and Benefits: What factors drive the costs of digital repositories? What kinds of benefits, monetary and non-monetary, do such repositories generate?
Background Documents
Agreements: UMI_dissertation_publishing_agreement [23]; IDEALS_Deposit agreement [24]
OAIS Reference Model [20]
Trustworthy Repository Audit Checklist (TRAC) [22]
Roundtable 3--Repository Operations: Organizational Strategies, Policies and Activities
Tuesday, February 9, 2010
1:00 - 3:30 PM
1:00 Organizations: Roundtable 3 will explore how particular organizational structures, and the ways in which repositories organize and distribute their activities, promote or hamper the acquisition and preservation of digital content; and to what extent the centralization of activities such as ingest, hosting, and processing of digital content, and the geographic concentration of specialized capabilities and expertise, foster the preservation of that content.
1:45 Diversity and Complexity: What effect do the number and diversity of its sources of digital content have on a repository's ability to acquire and manage that content? To what extent do the variety and complexity of digital content archived by a repository affect its ability to maintain that content?
2:30 Break
2:45 Services: How do different types of services and outputs, such as tools, accessibility, interoperability, and discoverability, enhance an archive's ability to attract content and support?
Background Documents
Organizational Charts: NCAR (EOL) Org chart [25]; USGS (EROS) Org_chart [26]
Policies: USGS Records Management Plan [27]; IDEALS Preservation Policy [28]; ICPSR Preservation Plan [29]; ProQuest_Preservation_Policy 2008 [30]
OAIS Reference Model [20]
Trustworthy Repository Audit Checklist (TRAC) [22]
Roundtable 4--Assessing Digital Preservation and Sustainability Plans
Wednesday, February 10, 2010
9:30 AM - 12:15 PM
9:30 Tools, Templates and Terms: Roundtable 4 will begin with a presentation of the models of digital repositories, the risk management and assurance matrix, and other tools and information for funders and practitioners developed through the case studies. Attendees will then critique and comment on the features and potential usefulness of those tools.
10:30 Break
10:45 Requirements for Sustainability Plans: What types of information can funding applicants provide to support their case for the sustainability of their data management plans?
11:45 Reporting Requirements and Accountability Structures: What reporting and disclosure requirements can funders impose to promote the longevity and integrity of digital collections they support? How can accountability mechanisms lead to the better management of digital collections and archives?
Background Documents
Risk Assessment and Assurance Matrix
Repository Reference Model
These continuity factors are based on CRL's case studies of long-lived digital repositories, supported by the National Science Foundation. The case studies identified a number of strategies and practices that successful repositories use to help mitigate risks. Ten strategic factors in particular appear to promote continuity in the efforts of the repositories studied thus far.
Those continuity factors are:
1. Quality Management: The baseline definition of sustainability is embodied in the requirements of ISO 9001:2000, the standard for process-based quality management systems. This standard calls for organizations to continually monitor and improve their services and products to satisfy user requirements while optimizing available resources. The standard identifies basic traits, such as clear lines of authority and accountability, robust policy infrastructure, and well-designed processes and mechanisms that preserve the value of an organization’s services.
2. Domain Dominance: Some repositories come to scale quickly, and rapidly surpass or co-opt alternatives or competitors in the field. Such repositories rapidly capture a critical mass or comprehensive share of the existing data for a discipline or a domain. The ability of a repository to eliminate redundant or rival efforts is essential in avoiding the high costs that often accompany unproductive competition for resources and clients. Domain dominance also eliminates the waste of resources that arises from ambiguity and uncertainty in the market.
Clear pre-eminence of a repository or its controlling organization in a particular domain can also enable the repository to set standards for the production, exchange and use of data; impose uniformity in the instrumentation and tools employed to produce and use the data; and otherwise shape the landscape to the repository’s advantage. Exemplars: Associated Press, Chemical Abstracts Service, UMI.
3. Concentration / Clarity of Purpose: Long-lived repositories clearly delineate their territory and user communities; precisely specify services and other outputs; and articulate their missions clearly and consistently to stakeholders. Some repositories focus relentlessly on a particular type of content or data and thus are able to realize economies of scale in key processes. Others focus on single fields of endeavor or research, and are thereby able to capture as large a share of the contributor/user populations as possible. Exemplars: ICPSR, CAS, General Social Survey.
4. Market Diversity: Enduring repositories tend to cultivate and support multiple communities of interest that reside in different geographic, demographic or economic sectors. This enables a repository to survive downturns in any single sector of its market. Repositories that rely predominantly on public funding, for example, are vulnerable in periods of political change and nation- or region-based economic crises. Exemplars: CAS, AP.
5. Multiple Versioning and Outputs: The ability of a repository to generate multiple derivative products and services from its core content and activities also seems to promote longevity. Exemplars: AP, CAS. Associated Press (AP) has been able to generate revenue from uses of its older text archives for longitudinal studies of business performance. CAS has built an array of databases and published indexes based upon its traditional core activity: the abstracting and analysis of chemical literature.
6. Control of Supply and Distribution Channels: Some repositories are able to control the amount and complexity of the content or data they accept. Since amount and complexity of content are cost drivers, a repository must ensure that its intake and processing costs remain commensurate with the value of its outputs to its user communities.
7. Incentive Systems: Effective repositories offer powerful incentives for the preparation and deposit of data, and for support of the repository. Incentives can be monetary, functional, material, or reputational returns offered in exchange for the contribution of content, services, and other forms of support to the repository. Simply put, scientists and researchers who realize a return from making data available are likely to find ways to continue to contribute. This dynamic is demonstrated not only by proprietary, subscription-based data collections like ICPSR but also by the open access models explored by AP, where impact and relevance earn advertising revenues for the publisher. Reputational capital and its corollary, professional advancement, are powerful rewards in the sciences. Exemplar: Chemical Abstracts Service. The early CAS editors generated and maintained a sense of community among CAS abstractors, which enabled the organization to build a cohort of thousands of volunteers.
8. Environmental Sensitivity: To be effective, repositories must put in place the means to detect changes, positive or negative, in the technology, business and legal environments in which they operate. These means should be not just reactive but, to the extent possible, predictive as well, so that a repository can anticipate changes that might threaten its viability, degrade the usefulness or functionality of its content, or undermine its value to stakeholders. They often go beyond simple technology watch mechanisms, and ideally position the repository to shape its economic, legal, regulatory, and technology environments. Exemplars: AP, UMI, CAS.
9. Robust Feedback Mechanisms: One form of environmental sensitivity is a repository’s responsiveness to its producer and user communities. This requires effective mechanisms for obtaining feedback from stakeholders, responsiveness to that feedback, and the perception of that responsiveness among stakeholders. Together these enable the repository to adapt promptly and effectively to changes in the practices, behaviors and expectations of users and producers of data. The most direct and efficient feedback loops occur when those who maintain the data are also its producers and/or users. Exemplars: CAS, UMI, GSS.
10. Structural Accountability: The repository has in place governance and internal organizational processes and structures that ensure the continual disclosure of key information to stakeholders, and all but guarantee organizational responsiveness to stakeholder concerns. In concrete terms, this means that the primary governance bodies are duly constituted and empowered, and mirror the populations of predominant stakeholder sectors. It can also be evidenced by a sound, formal, written escalation path for the organization, consisting of progressive reporting and upward referral of unresolved issues. Exemplars: CAS, ICPSR.
Links
[1] http://www.arabidopsis.org/
[2] http://www.ap.org/
[3] https://www.crl.edu/sites/default/files/d6/attachments/pages/AP%20Profile%20final_1.pdf
[4] http://www.cas.org/
[5] https://www.crl.edu/sites/default/files/d6/attachments/pages/CAS%2520profile.pdf
[6] http://www.norc.org/GSS+Website/
[7] https://www.crl.edu/sites/default/files/d6/attachments/pages/Profile_NORC_012310_0.pdf
[8] http://www.eol.ucar.edu
[9] http://www.proquest.com/en-US/products/dissertations/
[10] https://www.crl.edu/sites/default/files/d6/attachments/pages/umi_dissertations_0.pdf
[11] https://www.crl.edu/sites/default/files/d6/attachments/pages/Canadian_Lib_UMI_Agreemt_0.pdf
[12] https://www.crl.edu/sites/default/files/d6/attachments/pages/Lib_Congress_UMI_Agreemt_0.pdf
[13] http://www.sdss.org/
[14] https://www.crl.edu/eros.usgs.gov
[15] https://www.crl.edu/facets/science-and-technology
[16] https://www.crl.edu/facets/statistics
[17] https://www.crl.edu/facets/archiving-and-preservation
[18] https://www.crl.edu/reports
[19] https://www.crl.edu/sites/default/files/d6/attachments/pages/Profile_NORC_012310_1.pdf
[20] http://public.ccsds.org/publications/archive/650x0b1.pdf
[21] https://www.crl.edu/sites/default/files/d6/attachments/pages/Continuity%20Factors.pdf
[22] https://www.crl.edu/sites/default/files/d6/attachments/pages/trac_0.pdf
[23] https://www.crl.edu/sites/default/files/d6/attachments/pages/UMI_dissertation_publishing_agreement.pdf
[24] https://www.crl.edu/sites/default/files/d6/attachments/pages/IDEALS_Deposit%20agreement.pdf
[25] https://www.crl.edu/sites/default/files/d6/attachments/pages/org_chartncar.pdf
[26] https://www.crl.edu/sites/default/files/d6/attachments/pages/EROS_Org_Chart.ppt
[27] http://eros.usgs.gov/government/records/media/EROS_Records_Management_Plan.doc
[28] https://services.ideals.illinois.edu/wiki/bin/view/IDEALS/IDEALSDigitalPreservationPolicy
[29] http://www.icpsr.umich.edu/icpsrweb/ICPSR/curation/preservation/policies/dpp-framework.jsp
[30] https://www.crl.edu/sites/default/files/d6/attachments/pages/ProQuest_Preservation_Policy_3_2008.pdf