Published on CRL (https://www.crl.edu)

Proposals

SAOA is developing carefully curated thematic research collections in various languages by digitizing key print and microfilm holdings supplied by our cooperative network of Member Institutions across South Asia and the U.S. The Content Curation Working Group encourages scholars and members of the public to use the online suggestion form [1] to submit suggested resources for inclusion in SAOA. Please contact Neel Agrawal [2] (South Asia Digital Librarian) with any questions or clarifications on using the form, or on any other query related to suggested items.

SAOA prioritizes the following types of out-of-copyright material for digitization:

  1. Official Publications
    • Gazetteers and Census Reports

    • Statistical and Annual Reports

  2. Serials and Newspapers
  3. Literary and Monographic Sources
  4. Non-Governmental Publications

Digitization Guidelines

These are SAOA’s technical guidelines for digital files derived from text-based materials (in print, microfilm, or microfiche) to be included in SAOA’s digital collections. Digitization providers (commercial entities as well as academic institutions) will be expected to conform to these specifications to ensure consistency of the digital materials for ingest into the SAOA digital asset management system. The following are the ideal specifications for ingesting image-based material into SAOA’s collections.

  1. At the outset of each project, the SAOA Project Manager will schedule a phone consultation with the digitization provider to help ensure that the digitization project conforms to the Digitization Guidelines laid out below. The digitization provider or content contributor should provide SAOA with:
    1. Estimates of the total number of images, total number of volumes (for serials and multi-volume monographs), and if possible, total file size (in MB, GB, or TB),

    2. Details regarding the condition of the print or microform material.

  2. Descriptive Metadata – the metadata should:
    1. Use one of the following metadata schemes: Dublin Core or MARC21.

    2. Be provided in one of the following metadata/catalog record file formats: MARC XML or CSV.

    3. Conform to SAOA’s metadata template (for example, for monographs vs serials).

    4. Include accurate holdings information for serials or multipart titles.

    5. Have been provided in a sample set of records for SAOA staff to review during the proposal phase, as specified above.

NOTE: the data entered into Forum (or copied/pasted into Forum) will be UTF-8. SAOA’s hosting platform defaults to UTF-8 encoding for data entry.
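
As a pre-submission check, a contributor could confirm that a CSV metadata file decodes cleanly as UTF-8 before its contents are entered into the platform. The sketch below is illustrative only and is not part of SAOA's tooling. The function name and the column names in the sample are hypothetical.

```python
import csv
import io


def read_utf8_csv(raw: bytes) -> list:
    """Decode CSV bytes strictly as UTF-8 and return the rows as dicts.

    Raises UnicodeDecodeError if the bytes are not valid UTF-8.
    """
    # "utf-8-sig" is strict UTF-8 that also tolerates a leading byte-order
    # mark, which some spreadsheet exports add.
    text = raw.decode("utf-8-sig")
    return list(csv.DictReader(io.StringIO(text, newline="")))


# Hypothetical two-column sample; real files follow SAOA's metadata template.
sample = "title,language\nமணிமேகலை,Tamil\n".encode("utf-8")
rows = read_utf8_csv(sample)
print(rows[0]["language"])
```

A file saved in a legacy encoding (for example Latin-1) would fail at the decode step, which points to re-exporting it as UTF-8 before submission.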

  1. Structural Metadata – appropriate structural metadata should be provided to help SAOA organize the image files and to allow for navigation within the item (for example, by chapter).
  2. Asset File Types – the following file types for each image of a given title should be provided:
    1. Master image files for preservation: TIFF images,

    2. Access files (image surrogates): JPEG, JPEG2000 (JP2), or PDF.
    3. OCR files (recommended, where available):
      1. .txt and,
      2. OCR XML or HOCR
  3. Image Capture
    1. TIFF master image files – these files are for long term preservation purposes and are deposited in a dark archive.
      1. Resolution: 400 ppi to 600 ppi for new digitization.
      2. Uncompressed, TIFF 6.0 images, in either “little endian” (IBM PC) or “big endian” (Mac) byte order.
      3. All files should be able to pass JHOVE format validation as valid and well-formed.
      4. 24-bit color for new digitization. 8-bit grayscale may be acceptable for items already digitized or with no color content (either no gray profile, or Gray Gamma 2.2). No proprietary scanner profiles.
      5. One page per image.
    2. JPEG, JP2, and PDF access files (image surrogates) – these images are for presentation purposes and are ingested and hosted on SAOA’s platform for researchers to access.
      1. Resolution: keep the surrogate resolution the same as that of the master TIFF file if the surrogate file size meets the requirements (see item 3, File Size, below). In some cases, it may be acceptable to decrease the resolution of the surrogate to a minimum of 300 ppi in order to decrease the file size.
      2. Compression level: between 10:1 and 15:1, depending upon the dimensions and color of the original.
      3. File Size: the size of each access file should range from 0.5 MB (megabytes) to 2.5 MB, depending on various factors (size of the original item, format, content, color, darkness).
    3. Image quality: images should meet the following characteristics, many of which may be available as automated settings on the scanner as part of the image capture option (e.g. microfilm scanning). In exceptional cases, post-processing or correction might be necessary to:
      1. Achieve desired tone distribution
      2. Sharpen images to match appearance of the originals
      3. Crop and/or deskew the images, oriented to the text (not to the page)
  4. File Naming
    1. Monographs (Single Volume)
      1. Format: titleID_YEAR_sequential image #.tif
      2. Example: 986786411_1915_00135.tif
        1. This would be for a monograph (single volume) published in 1915, 135th consecutive image.
    2. Monographs (Multi-Volume)
      1. Format: titleID_YEAR_VOLUME #_sequential image #.tif
      2. Example: 990512780_1918_003_00115.tif
        1. This would be for a monograph (multi-volume) published in 1918, volume 3, 115th consecutive image.
    3. Serials
      1. Numbered Issues:
        1. Format: titleID_YEAR_VOLUME #_ISSUE #_sequential image #.tif
        2. Example: 990312980_1915_002_001_00253.tif
          • This would be for a serial published in 1915, volume 2, issue 1, 253rd consecutive image.
      2. Dated Issues:
        1. Format: titleID_YEAR-MONTH-DAY_sequential image #.tif
        2. Example: 22123199_1921-12-24_00012.tif
          • This would be for a serial published on December 24, 1921, 12th consecutive image.
      3. Quarterly Issues:
        1. Format: titleID_YEAR_QUARTER_sequential image #.tif
        2. Example: 226114808_1895_Spring_00005.tif
          • This would be for a serial published in 1895, Spring issue, 5th consecutive image.
    4. FOR ALL THE ABOVE:
      1. File naming of the master and derivative access files must follow the same pattern.
        1. The .jp2 or .jpg derivative must have precisely the same filename as its corresponding master .tif file, except for the filename extension. For example, "990512780_1918_003_00115.jp2" is derived from (corresponds to) the image "990512780_1918_003_00115.tif".
        2. For the Title ID, assign the OCLC#.
        3. Allow for 5 digits for sequential image numbering and three digits for the volume and issue numbers.
  5. Folders
    1. Each title should be contained in a separate folder labeled by OCLC#.
    2. Within each title level folder (labeled as the OCLC#), there should be one TIFF folder and one access file folder (such as JPG, JP2, or PDF) with identical files, except for the extensions.
  6. File Transfer
    1. Acceptable methods of file transfer are via hard drive, USB drive, FTP, Dropbox, Google Drive, and CD.
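
The file naming and folder rules above lend themselves to an automated check before transfer. The following is a minimal sketch under stated assumptions, not SAOA's own validation tool: the function names and pattern labels are hypothetical, the title ID is assumed to be an all-digit OCLC number, and the quarterly label is assumed to be a single alphabetic word such as "Spring" (the guidelines give only that one example).

```python
import re
from pathlib import PurePath
from typing import Optional, Tuple

_ID = r"(?P<title_id>\d+)"   # OCLC number (assumed to be digits only)
_YEAR = r"(?P<year>\d{4})"
_SEQ = r"(?P<image>\d{5})"   # five-digit sequential image number

# One pattern per naming format; volume and issue numbers are three digits.
PATTERNS = {
    "monograph_single": rf"{_ID}_{_YEAR}_{_SEQ}",
    "monograph_multi": rf"{_ID}_{_YEAR}_(?P<volume>\d{{3}})_{_SEQ}",
    "serial_numbered": rf"{_ID}_{_YEAR}_(?P<volume>\d{{3}})_(?P<issue>\d{{3}})_{_SEQ}",
    "serial_dated": rf"{_ID}_(?P<date>\d{{4}}-\d{{2}}-\d{{2}})_{_SEQ}",
    "serial_quarterly": rf"{_ID}_{_YEAR}_(?P<quarter>[A-Za-z]+)_{_SEQ}",
}


def parse_name(filename: str) -> Optional[Tuple[str, dict]]:
    """Return (format label, parsed fields), or None if no format matches."""
    stem = PurePath(filename).stem  # filename without its extension
    for kind, pattern in PATTERNS.items():
        match = re.fullmatch(pattern, stem)
        if match:
            return kind, match.groupdict()
    return None


def multi_volume_name(title_id: int, year: int, volume: int, image: int,
                      ext: str = "tif") -> str:
    """Build a multi-volume monograph filename with zero-padded fields."""
    return f"{title_id}_{year}_{volume:03d}_{image:05d}.{ext}"


def unpaired(master_names, access_names) -> set:
    """Return filename stems present in only one of the two folders."""
    masters = {PurePath(name).stem for name in master_names}
    access = {PurePath(name).stem for name in access_names}
    return masters ^ access


print(parse_name("22123199_1921-12-24_00012.tif"))
print(multi_volume_name(990512780, 1918, 3, 115))
print(unpaired(["990512780_1918_003_00115.tif"],
               ["990512780_1918_003_00115.jp2"]))
```

A name with unpadded fields, such as "990512780_1918_3_115.tif", matches none of the patterns, so parse_name returns None. An empty set from unpaired means every master file has a same-named access file and vice versa.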

 

Last updated: November 7, 2019

Digital File Management

SAOA is committed to the preservation of, access to, and discovery of resources (digital assets as well as the associated metadata) created by and facilitated through SAOA. The cooperative and federated nature of SAOA means that this preservation, access, and discoverability may take many forms and involve differing timelines and separately determined costs, depending on the nature and scope of each individual project. As such, each project may need to be interpreted and championed on its own terms, but SAOA is guided by the principles below.

Preservation

The long-term maintenance of digital content is central to SAOA’s mission, but responsibility for it may be distributed depending on creator and institutional capacity. To the extent possible, SAOA strives to preserve its digital files according to established digital preservation standards, in stable and flexible formats (e.g., TIFF image files). A long-term goal of SAOA is to preserve a dark-archived master of every SAOA-affiliated resource in a SAOA-controlled repository.

Digital Files Created by SAOA (using SAOA funds)

Digital master files created by SAOA will be stored in a dark archive by CRL.

Digital Files Created by a SAOA Partner (not using SAOA funds)

If a SAOA partner has the institutional capacity to preserve digital files (a “trusted digital repository”), the files will be maintained at the partner institution, pursuant to that institution’s policies and procedures. Any access and/or discoverability files will be required to point back to the preservation files in their associated metadata.

If the SAOA partner does not have the institutional capacity to preserve digital files, the files will be transferred to a dark archive at CRL.

Access

Free and open access to materials associated with SAOA is paramount to SAOA’s mission. Building upon its distributed and federated nature, SAOA strives to avoid duplication and to encourage digital file accessibility (“hosting”) from multiple institutions. SAOA also strives to provide multiple access file types (e.g. JP2, JPG, PDF).

Digital Files Created by SAOA (using SAOA funds)

Digital files created by SAOA will be made openly accessible through SAOA platform(s).

Digital Files Created by a SAOA Partner (not using SAOA funds)

If a SAOA partner has the institutional capacity to make digital files accessible in a stable and sustainable way, with institutionally supported permanent URLs for each item (i.e. to “host” them on its own repository servers), the files will be maintained at the partner institution following that institution’s policies and procedures.

If the SAOA partner does not have the institutional capacity to make digital files they have created accessible in that fashion, or if in the future, they are unable to maintain their provision of access, the files will be transferred to SAOA for ingest on SAOA’s platform(s).

Discoverability

SAOA resources are made valuable through their discovery and use. The cooperative and federated nature of SAOA means that this discovery may take many forms, depending on the nature and scope of each individual project. The long-term goal of SAOA is to enable integrated discovery across the SAOA corpus of resources, encompassing materials hosted through SAOA platforms, partners, and other institutions.

Metadata

All SAOA resources have sufficient technical and descriptive metadata to be discoverable.
All metadata will be openly and sustainably maintained on web-based platform(s).
All metadata will follow established standards (e.g. MARC, Dublin Core).
All metadata will be open and exposed for harvesting, by SAOA or other interested institutions.

 

Last updated: December 21, 2017

Selection Guidelines

The Selection Guidelines, prepared by the Content Curation Working Group, help guide the evolution and expansion of SAOA’s curated collection, building on SAOA’s first Five-Year Plan, its five years of evolving collection development experience, and the FY21-25 Five-Year Plan. In our second five years we will broaden SAOA’s collection scope to incorporate additional themes, more coverage of under-represented geographic areas of South Asia, a greater diversity of languages and communities, new resource types (such as audio/visual material, video, data sets, and maps), and wider date coverage (including post-colonial materials). With these criteria in mind, SAOA considers proposals submitted by anyone through its online suggestion form.

Themes

(The following themes are not mutually exclusive, due to their multidisciplinary scope.)

  • Social, Economic and Political History
  • Literature (including fiction, nonfiction, poetry, criticism, literary history, and biographies)
  • Women, Gender & Sexuality
  • Caste, Tribes & Social Structure
  • History of Science (including history of medicine)
  • Art History (including history of architecture)

Resource Types

  1. Official publications from colonial British India
    • Census reports, both before and after independence (SAOA has already digitized the decennial reports from 1871 to 1951, and plans to fill in gaps.)
    • Statistical reports, such as those on agriculture, land revenue and settlement, trade, commerce, and sanitation
    • Annual reports of departments produced for the Presidencies and Princely States. Prioritized categories of the India Office Collections - Official Publications include:
      • V/10 - Administration Reports
      • V/17 - Trade and Navigation Statements
      • V/24 - Departmental annual reports
      • V/26 - Committee and commission reports (for example, on plague and famine)
    • Gazetteers [3], including publications at the provincial and district levels, as well as those of the Princely States.
      • Imperial Gazetteer of India Provincial Series
      • Regional Gazetteers: District Gazetteers of all the Presidencies and Princely States
  2. Resources from colonial Ceylon and Nepal
  3. Nineteenth- and twentieth-century serials and newspapers
    • From the India Office Collections - Official Publications [4]:
      • V/6 - India Office serials
      • V/16 - Public Finance Serials
      • V/25 - Indian serials
    • Specific titles for digitization should be selected from standard bibliographies, such as:
      • Macdonald, T., Union catalogue of the serial publications of the Indian government 1858-1947 held in libraries in Britain. London, 1973
      • DSAL. International Union List of South Asian Newspapers and Gazettes [5]
    • SAOA has already digitized a portion of the holdings of the Native Newspaper Reports, and plans to fill in gaps for out-of-copyright materials
    • SAOA will continue collecting national, regional, and local newspapers. Specific topics could include trade and commerce, revolutionary newspapers, and newspapers related to women.
  4. Nineteenth- and twentieth-century monographs
    • SAOA will prioritize titles from the National Bibliography of Indian Literature, 1901-1953 (NBIL), including titles already preserved on microfilm under the Microfilming of Indian Publications Project (MIPP) [6]. MIPP materials should be scanned from the microfilm, and titles not yet preserved under MIPP should be digitized from print. (SAOA has already digitized some MIPP microfilm holdings from CRL, and plans to expand the range of languages covered.)
  5. Audio and video resources (including music performance and instruction, as well as interviews and oral histories)
  6. Visual resources (including photographic archives and maps)
  7. Data sets
  8. Manuscript and archival collections, such as the Muslim League papers, the Indian National Congress papers and official correspondence, and manuscript collections (e.g. Columbia’s collection of Sanskrit manuscripts [7])

Collection Development and Collaboration Considerations

  • SAOA will fill in gaps, strengthening and enriching its current thematic collections: Social & Economic History, Literature, Women & Gender, and Caste & Social Structure.
  • In applying these Selection Guidelines, SAOA takes into account its Selection Principles [8].
  • Working under the auspices of the Center for Research Libraries, SAOA identifies collaboration partners and new SAOA members that are able to provide important resources for digitization, such as the Roja Muthiah Research Library (Chennai), Madan Puraskar Pustakalaya (Kathmandu), Centre for Studies in Social Sciences Calcutta, Mushfiq Khwaja Library and Research Centre (Karachi), CrossAsia (Heidelberg), and other interested institutions worldwide. SAOA works with member organizations of the Council of American Overseas Research Centers (including the American Institutes in Bangladesh, India, Pakistan, and Sri Lanka).
  • Additional factors SAOA considers when identifying resources to target for digitization include requirements and formal agreements for institutional collaboration with holding libraries; distribution of coverage across disciplines, languages, and geographical regions (selectively including diasporic communities); and the extent to which SAOA digitization can complement existing, well-established open-access initiatives.

Dated: May 15, 2020 & Updated June 4, 2020.

Prepared by the Content Curation Working Group: Aruna Magier (Chair), Deepa Banerjee, Abhijit Bhattacharya, Gary Hausman, Jeffrey Martin, Gautham Reddy

Selection Principles

SAOA fosters robust online research on South Asia through its mission to produce and preserve digital content, to make digital content openly accessible, and to foster communities committed to collaboration through open access.

The following principles, prepared by the SAOA Executive Board and reviewed by the SAOA membership, function as a dynamic document to inform collection development decisions for the allocation of resources (financial as well as human):

  • We concentrate on materials that have high value for research
  • We prioritize resources that will benefit researchers across many disciplines of South Asian Studies
  • We give precedence to materials that are at risk
  • We seek to digitize resources that complement and complete collections already available
  • We work to enhance modes of discovery of materials that are otherwise difficult to find or use
  • We prioritize creation of access to otherwise inaccessible or inadequately accessible resources
  • We strive for transparency in all decision-making processes, from initial proposal through production to end product
  • We recognize the value and independence of existing, credible and sustainable open access repositories and seek federated alliances with them to minimize duplication of effort
  • We build community through inclusive processes, collaborations, and federated alliances of repositories, institutions and members

Dated: March 23, 2019


Source URL: https://www.crl.edu/proposals

Links
[1] http://docs.google.com/forms/d/e/1FAIpQLSeiU6dSwKexKGtvoHwsz9HNlcw78KBSCzM08fhXsu57S8nOuQ/viewform
[2] mailto:nagrawal@crl.edu?subject=South%20Asia%20Open%20Archives%20(SAOA)
[3] https://brill.com/fileasset/downloads_products/31800_Guide.pdf
[4] https://www.lib.uchicago.edu/e/su/southasia/off-1984.html#Heading2
[5] http://dsal.uchicago.edu/bibliographic/unionlist/unionlist.php
[6] http://dsal.uchicago.edu/bibliographic/nbil/aboutmipp.html
[7] https://catalog.hathitrust.org/Record/007547563
[8] https://www.crl.edu/selection-principles