Collection Development Policy

 

Executive Summary

The mission of the Biodiversity Heritage Library is to improve research methodology by collaboratively making biodiversity literature openly available to the world as part of a global biodiversity community. BHL’s Collection Development Policy is integral to the implementation of BHL’s Strategic Plan 2020-2025 Goal 1 (Relevant Content) and future strategic directions. The high-level description of Goal 1 is to:

“Grow BHL into the most comprehensive, reliable, reputable repository of data-rich biodiversity literature, and other original materials, to support a response to global challenges.” 

More details can be found in the BHL Strategic Plan 2020-2025. While the policy may be reviewed annually, major modifications to the policy will coincide with the strategic planning cycle.

The primary audience for this Policy includes BHL consortium Members and Affiliates and partner institutions, as well as users of the BHL collection. This policy has been developed and revised through the years by the BHL Collections Committee.

As a foremost digital library of natural science publications and archival materials, BHL is a free and open access online resource that primarily reflects the print collections of its contributors. In making content digitally accessible through www.biodiversitylibrary.org, BHL collection management priorities are to increase: 

  1. Access – make full-text content directly accessible within the BHL repository; 
  2. Discovery – enable content discovery by providing metadata and BHL tools, such as search and scientific name finding;
  3. Usability – improve content usability, both by humans and computers, through enhancement of existing content, e.g. transcriptions of handwritten content, enrichment of bibliographic metadata, refinement of page-level data, and the assignment of persistent identifiers;
  4. Breadth – ensure the most complete access to content, e.g. correcting missing or blurry pages and digitizing materials to fill gaps in series.

BHL’s role as an active disseminator of biodiversity information means that BHL is responsible for what is collected and how these materials are shared with a global audience. In rethinking the collection development policy, BHL is committed to: 

  • Bringing biodiversity data points to the surface and exposing them for reuse by other science and technology projects.
  • Understanding the biases embedded within BHL’s collection and achieving more complete and inclusive representation of the available knowledge of and perspectives on biodiversity, e.g. including Indigenous Knowledge (IK). (1)
  • Maintaining free and open access to the historical record of scientific knowledge as well as its moral and ethical impacts, and how these have changed over time.
  • Facilitating reparative cataloging and description practices to address records that are historically inaccurate, offensive, discriminatory, or harmful.
  • Providing support for developing thematic sub-collections (title or volume lists) that advance principles of inclusion and equity.
  • Encouraging partner institutions to diversify their collections for inclusion in BHL.

What is “Biodiversity”?

The BHL collection focuses on open access materials most relevant to the study of biodiversity. The term “biodiversity” refers to “the variability among living organisms from all sources including, inter alia, terrestrial, marine and other aquatic ecosystems and the ecological complexes of which they are part; this includes diversity within species, between species and of ecosystems.” (2) Thus, BHL includes all levels of organismic organization, from genes to ecosystems, as well as other disciplines affecting the study of life on Earth.

Plate 16. Taschenberg, E.L. Die Insekten, Tausendfüssler und Spinnen. 1877. Contributed to BHL by Cornell University Library.

Collection Definition

The Biodiversity Heritage Library (BHL) collection is the world’s most comprehensive digital collection of open access biodiversity literature and archival materials, dating from the 15th Century to the present. The majority of the BHL collection contains legacy literature in the public domain. Current literature, still under copyright protection, is included as permissions are granted. BHL continually strives to eliminate barriers to access and dissemination of materials that were otherwise restricted to a particular audience. In building a consortium of libraries under a shared vision of open science, BHL makes its collection freely accessible to a global audience.

BHL repatriates information by ensuring broad access, reuse, and the ability to comment on content in accordance with the Convention on Biological Diversity Article 17. All materials in BHL are free to access, download, reuse, and repurpose under the principles of open access and open data. BHL aspires to meet the FAIR (findable, accessible, interoperable and reusable) principles. In addition to FAIR, BHL plans to develop relationships to address the CARE principles for Indigenous data governance, which incorporate people and purpose in the advocacy of open data. BHL provides access to biodiversity literature for areas with high biodiversity but little access to the published literature. For examples of BHL data reuse, see: Encyclopedia of Life (EOL), World Register of Marine Species (WoRMS), International Plant Names Index (IPNI), Tropicos, BioStor-Lite, and Global Biodiversity Information Facility (GBIF).

BHL Partner Collections

This collection development policy applies to the digital BHL collection available at https://biodiversitylibrary.org. BHL partners may look to this policy as a guide in developing their own collection development practices. BHL provides training and support to its partners for contributing materials to its unique, open source digital repository. Partners are responsible for providing materials that meet BHL’s minimum imaging and book-based metadata standards. In some cases, partners may need to provide extensive input where materials need special digitization, cataloging, or curation attention. 

BHL recommends that duplication of digitization be minimized across partner institutions where possible. BHL has no digital rights management (DRM) mechanisms in place and does not intend to adopt any DRM mechanisms in accordance with BHL’s open access principles. 

BHL partners that contribute content to the collection are expected to: 

  • Digitize materials according to BHL’s minimum imaging standards.
  • Make their content and metadata available for free and open access. 
  • Provide metadata for digitized materials, under a CC0 license, according to BHL’s metadata requirements.
  • Upload digitized materials to the Internet Archive (IA), as part of IA’s “biodiversity” collection, which is used as the BHL’s staging platform.
    • Materials may be digitized and uploaded directly via IA.
    • Materials may be digitized outside of IA and uploaded using the Smithsonian Libraries and Archives “Macaw” software system. 
  • Curate metadata for materials in BHL’s collection as requested.
  • Address image quality errors, such as blurry or missing pages, for materials as requested, for example:
    • correct page image errors via re-digitization;
    • provide notations in the record about the issue; and/or 
    • submit scan requests for other institutions to re-digitize materials.

BHL harvests metadata into its repository and displays materials via its website (https://biodiversitylibrary.org). Content included in BHL through the Internet Archive “biodiversity” collection will be maintained for free and open access in perpetuity regardless of BHL consortium partnership status. As owners and maintainers of the IA biodiversity collection, BHL commits to stewarding these items on behalf of BHL contributors.

Acknowledgment of Harmful Content

BHL promotes and practices a commitment to open access and prohibition of censorship. As an initial effort to address damaging systematic biases inherent in BHL content, BHL’s Collections Committee developed an “Acknowledgment of Harmful Content” to recognize the harmful perspectives and sentiments expressed throughout biodiversity research. As stated in the BHL FAQ, more access to information, even if it contains harmful content, is critical to understanding the historical context in which knowledge has been created. Access to historical documents and records helps researchers gain a more complete understanding of scientific knowledge, including its moral and ethical impacts and how these have changed over time. Removing access to harmful content does not necessarily reduce harm. By exposing the perpetrators of harmful ideas, the BHL collection documents evidence of the biases and prejudices that persist to this day as barriers to equitable knowledge creation and dissemination. Thus, BHL does not remove materials with harmful content from the collection. Materials may be removed for other reasons in accordance with the Deaccession policy.

BHL is grateful that many institutions have begun to address harmful biases in metadata terms and will support such efforts to improve metadata on a case-by-case basis to ensure inclusivity. (See University of Houston Libraries, Carbajal 2021 (3); Harvard University Center of Medicine, 2022 (4); National Archives 2021.) BHL aims to collect, curate, and provide to the world a diversity of perspectives related to biodiversity knowledge.

Scope

Biodiversity-Relevant Content

At its core, BHL is focused on the subject matter relevant to the work of zoologists, botanists, evolutionary biologists, taxonomists, systematists, ecologists, natural history collections managers, scientific illustrators, biological science historiographers, and amateur scientists and hobbyists. In building BHL’s collection at its inception, subject matter was selected to meet these core requirements. In the case of early taxonomic literature, core content was, at times, published in journals of a broader scope than is the practice today. Rather than review journal volumes piece by piece and scan items in isolation, BHL seeks to provide the complete set of volumes for any given title. Please see Appendix B for more information.

To serve a wide, interdisciplinary audience, the boundaries of BHL’s collection are deliberately inclusive. As E.O. Wilson once stated, “Biologists are inclined to agree that it [biodiversity] is in one sense everything.” (5) The full breadth of biodiversity information is not necessarily available in published form. Other biodiversity knowledge is available through various formats and subject matter. Content that extends the context of taxonomic zoological and botanical literature, such as agriculture, microbiology, and earth science, is important to the scope of BHL. As a curated digital collection aggregating content from hundreds of providers into a single platform, BHL is a reflection of the historical collection development decisions of those providers as well as the publishing practices and historical colonial processes that have shaped the scholarly record of biodiversity science. By seeking content that includes more diverse perspectives, for example from people of color and Indigenous Knowledge (IK), BHL hopes to promote equity in science and increase access to critical aspects of biodiversity knowledge.

Figure 1 represents various subject matter relevant to BHL’s collection, derived and analyzed from subject keywords present as of September 2022. The visualization is intended to illustrate BHL’s collection at a glance and is not exhaustive.

Fig. 1. A snapshot of biodiversity-relevant subject keywords derived from the BHL collection, September 2022.

Content that falls outside biodiversity relevant subject matter, such as child psychology and sports, may be removed according to the Deaccession policy. BHL also incorporates biodiversity relevant, open access content from Internet Archive’s text corpus to supplement the collection and enrich the range of content available for use through BHL taxonomic name finding and data export services. For more information about how BHL incorporates content from other digitization projects, please see Collection of Materials and Appendix C.

Date Range of Content

The BHL collection spans centuries of material dating back to the late 15th Century and includes material up to the present. Figure 2 illustrates the date range of content for BHL collection books or volumes (also known as “items”). 

Fig. 2. BHL collection items (books or volumes) per 50-year date range as of December 2021.

Copyright

The bulk of BHL’s collection is in the public domain in the United States. Because the copyright term in the United States rolls forward annually, publications are in the public domain if they were published before the 95-year copyright cut-off, e.g. before 1927 in 2022, before 1928 in 2023, and so on.

U.S. Public Domain = works published BEFORE [Current Year] – 95
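
The rolling cut-off above can be worked through in a few lines of Python. This is a minimal sketch, assuming only the 95-year publication rule stated in this section; the function names are hypothetical, and the sketch does not cover renewals, unpublished works, or copyright law outside the United States.

```python
from datetime import date
from typing import Optional

US_TERM_YEARS = 95  # rolling term stated in this policy


def us_public_domain_cutoff(current_year: int) -> int:
    """Return the year before which U.S. publications are in the public domain."""
    return current_year - US_TERM_YEARS


def is_us_public_domain(publication_year: int,
                        current_year: Optional[int] = None) -> bool:
    """True if a work published in publication_year falls before the cut-off."""
    if current_year is None:
        current_year = date.today().year
    return publication_year < us_public_domain_cutoff(current_year)


# Matches the examples in the text: before 1927 in 2022, before 1928 in 2023.
print(us_public_domain_cutoff(2022))    # 1927
print(is_us_public_domain(1927, 2022))  # False
print(is_us_public_domain(1927, 2023))  # True
```

Because the comparison is strict, a work published in the cut-off year itself only qualifies in the following calendar year.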

In addition to materials in the public domain, BHL also contains in-copyright materials where express permission has been provided by the copyright holder(s). Additional materials are available for open access because the copyright has not been renewed. BHL has also been adding unpublished materials such as field notes, recognizing that unpublished materials can be made available in the United States 70 years after the author’s death, or earlier with permission.

Copyright laws differ country by country depending on a variety of factors such as the country of publication. Where U.S. copyright law does not apply, BHL’s international consortium of partners follows the copyright laws of their home countries. For more details about copyright, see BHL’s Copyright and Reuse page. If there are concerns about the copyright status of a work in BHL’s collection, please refer to the Take-Down guidelines included as part of the Deaccession policy.

Materials that are not in the public domain are available in BHL for one or more of the following reasons:

  1. Permission has been granted to provide the content online from the copyright holder. 
  2. Copyright has expired or was not renewed, according to the Stanford University Copyright Renewal Database, the Catalog of Copyright Entries, and the U.S. Copyright Catalog.
  3. Open access to the material has been provided by contributors to the Internet Archive (IA) repository. These materials have been selected against biodiversity relevant criteria for inclusion into BHL. They are labeled under the generic status, “Not provided. Contact Holding Institution to verify copyright status” and require further investigation.

For more information about the public domain in the United States, please review this helpful chart, Copyright at Cornell Libraries: Copyright Term and the Public Domain. For a brief overview of copyright terms across different countries, please see Wikipedia’s List of countries’ copyright lengths. A comprehensive review of copyright laws (or intellectual property (IP) laws) by country is available via the World Intellectual Property Organization (WIPO) Lex Database: https://wipolex.wipo.int/en/legislation/members

Types of Materials

BHL is committed to providing free and open access to faithful reproductions of original works. As part of its goal to grow a “reliable” collection, BHL ensures that the digital versions of materials contributed by consortium partners reflect the physical copies on their shelves. 

Digital surrogates are created from the physical copies and made available through BHL’s website such that surrogates can be read online and downloaded in a convenient format. Page orientations, fold outs, text that spans gutters, and other publication irregularities may require minor adjustments in the way materials are presented within the book viewing software. As much as possible, BHL will provide complete copies of works or identify incomplete copies, to ensure that users of the collection have high quality access to resources.

Books and Journals

BHL provides access to a range of scholarly and general science materials in the form of published books and journal volumes. BHL makes its best efforts to present series of related volumes together, such as those related to a journal or monographic series, under a single title record. As book or volume availability and cataloging practices differ across institutions, multiple copies of the same content may exist under different title records. For example, the same work is cataloged in 2 ways:

In addition to book and journal volumes, BHL includes theses and dissertations, conference proceedings, magazines, seed and nursery catalogs, as well as other published biodiversity science materials. For journal titles, some volumes may be created by associating multiple born-digital articles together to form a virtual volume. Articles virtually aggregated as volumes come from a single source. Volumes are not created by blending articles from multiple sources.

Book-like Objects

The current infrastructure of BHL’s database supports a specific format of materials that can be described as “book-like objects”: an object with a sequential series of “pages” that can be viewed and interacted with through the book-viewer on BHL’s website. The BHL database may accommodate materials that are not traditionally thought of as books, volumes, or articles so long as they conform to the “book-like” format; archival materials are a good example. Accommodating access to non-book-like materials, such as artworks and maps, is a future goal.

Archival Materials

There are a variety of unpublished materials in BHL such as field notebooks, correspondence, collection records, and other items. Many grant projects have supported the specialized digitization and cataloging work required to process these materials for inclusion. Some notable examples include: 

Unpublished materials are essential primary resources that enhance the unique value of BHL’s collection. Partners are encouraged to contribute archival materials where possible. Please note that archival materials must meet BHL’s basic image quality, metadata, and book-like format requirements.

Articles

Content in BHL’s collection consists primarily of books and journals, but a growing number of articles are appearing in the collection as a result of article metadata contributions from BHL Partners and BioStor.org (http://biostor.org). As of January 2022, there are over 320,000 articles indexed and searchable in BHL’s collection. At this time, articles are accepted in BHL in the form of article metadata only. Uploading individual articles into the repository is discouraged unless they are included in the form of virtual volumes as described above. Full volumes of a journal are expected as item uploads. The articles within the volumes can be indexed and made searchable by submitting article metadata to BHL.

Other Formats

Microform

BHL contains microform materials in its collection largely ingested from Canadiana.org. While poor in scanning quality, microform may provide the only available copy of content for access. Thus microform will remain in BHL’s collection and continue to be ingested from the Internet Archive. Where possible, microform copies that duplicate higher quality digitized materials will be deprecated and redirected. 

Transcriptions

Digitizing archival collections and making them available online is only the first step toward making the content fully accessible. To enable usability of the content, such as searching and computation on the full text, it is necessary to transcribe handwritten material. BHL has supported citizen science and volunteer transcription projects to replace unreadable Optical Character Recognition (OCR) text in BHL with transcribed text that can be searched as full text, for example a journal from William Brewster. Because the handwritten page was transcribed and used in place of the uncorrected OCR in BHL, the text can be searched and the taxonomic name(s) extracted. Various citizen science projects have provided transcriptions using BHL materials, although the results are not necessarily re-purposed for BHL (see this example on Zooniverse).

In 2011, BHL partnered with Cambridge University and the Natural History Museum, London to present the Darwin’s Library project. This collection of books from Darwin’s personal library includes transcribed annotations from written comments in his books.   

Some transcriptions are available in BHL as separate volumes that accompany their digitized counterparts, for example a transcribed diary from David Crockett Graham. These dedicated transcription volumes provide full-text search capability and view-ability.

Exclusions

Frankenbooks

A digital item in BHL is composed of the pages of a single physical item and not the aggregate of pages digitized from multiple like items (an aggregate is known as a “frankenbook”). For example, in the case of a volume digitized where the physical copy is missing pages, BHL will not insert digitized pages from a different physical copy. It is possible, however, that materials may be ingested from other libraries’ scanned collections in the Internet Archive that do not follow this “no-frankenbook” policy.

Google Books Content

BHL does not ingest content scanned as part of Google Books. In many cases Google Books image quality does not meet BHL minimum imaging requirements. There are also legal concerns regarding the incorporation of Google Books content. BHL consortium partners that have contributed to Google Books may be able to provide additional information.

Other Content

Externally Hosted

Selected materials from third-party websites have been indexed within BHL as part of an experiment to aggregate more biodiversity content under a single point of access. In these cases, BHL does not include the full text within its repository, but links out to selected content in external, trusted repositories. The experiment demonstrated the need to prioritize the deposit of full-text content that can be served directly through BHL’s website http://biodiversitylibrary.org. The tools and services that make BHL unique, such as taxonomic name finding services, can only be used if the content is served directly through BHL’s repository. Going forward, BHL will not index externally hosted content within its repository.

Existing links to content on third-party websites in BHL are exceptions to this policy, for example, see content from BHL SciELO http://biodiversitylibrary.org/part/121108. In this case, BHL indexes the metadata for the BHL SciELO article within its repository but the full text of the article is served through https://www.scielo.br/. Any content served through an external third-party will be freely available under the same open access principles that govern BHL’s collection. 

Selected materials from the following providers are available as externally hosted content as of July 29, 2022:

  • American Museum of Natural History Library Digital Repository
  • Biblioteca Digital del Real Jardín Botánico de Madrid
  • CiteBank 
  • Pensoft Publishers
  • SciELO – Scientific Electronic Library Online

Any externally hosted content that is indexed in BHL as metadata only and links to content that is restricted from public view, or whose link is broken without opportunity for repair, will be removed from the collection as it is identified.

flickr

In 2011, BHL began uploading illustrations from items in its collection to a flickr photostream. To date, there are over 318,000 full page plates and images that can be searched and downloaded using flickr’s website. Images in BHL’s flickr photostream have been selected, uploaded, and curated by BHL Partners and are thus a subset of the full page plates and images within BHL’s overall collection. Each BHL flickr album contains the complete set of full page plates and images from a single book or volume. Each image uploaded to flickr contains a link back to the item in BHL.

The BHL flickr collection provides opportunities for users to browse a gallery of images and search for specific images based on metadata tags, or keywords, contributed by BHL Partners and registered flickr users. Images can be searched, for example, by topic, common name, or taxonomic name. Please see BHL’s FAQ about searching flickr for more information. Many images are tagged with information from BHL catalog records, and often feature image-specific terms including species and illustrator names.

Most images in the Biodiversity Heritage Library flickr collection are in the public domain, and can be downloaded, shared, re-used, or transformed. Please refer to the copyright statement in flickr, or the corresponding item in BHL, to determine the copyright status of the item containing the image in question. Proper attribution is a critical component of ethical re-use, even if a work is no longer protected by copyright. In addition to citing the work, attribution to the Biodiversity Heritage Library and the library that supplied the digitized item makes it possible for other researchers to find BHL resources, and helps BHL understand and demonstrate the impact of its digital collections.

Collection of Materials

BHL seeks to provide the most comprehensive collection of legacy botanical and zoological taxonomic literature possible while pursuing permission for in-copyright materials. The foundation of the collection is based on the collections of the library consortium that forms BHL. See the BHL Consortium page for a list of participating institutions. A concerted effort has been made to provide all public domain content held within the general collections of BHL consortium libraries so long as the materials fall within the scope of biodiversity relevant subject matter. As part of BHL’s strategic plan to achieve more complete and inclusive representation of the available knowledge of and perspectives on biodiversity, and expand its collection, it is actively seeking materials from underrepresented contributors as well as Indigenous Knowledge.

Content held within special collections or rare book collections is made available when possible. Since many partners are reluctant to ship rare materials to a scanning facility for digitization, those libraries that have in-house scanners or trusted outside scanning partners are able to contribute such materials to BHL’s collection. Often, the condition, size, or physical location of rare materials precludes the ability to make these materials available in digital form.

In addition to content contributed from BHL consortium libraries, the collection is supplemented by:

  • Open access biodiversity relevant materials, already in digital form, as made available by other digital library projects and scanning partners, such as the Internet Archive.
  • In-copyright titles for which permission has been obtained from the rights holder.
  • Requests submitted through BHL’s scan request form.

BHL works on the premise that some access is better than no access. Some digitized materials fall short of BHL minimum metadata and image quality standards. However, if the materials are unique, BHL will proceed with harvesting the material.

Partner-driven selection

BHL is a consortium of libraries and other institutions, with collections that contain a wide range of natural history subject matter. At the project’s inception, BHL consortium libraries selected content for scanning based on collection strengths. Institutions were assigned a scanning concentration based on a specific discipline, such as Entomology, Ichthyology, or Mammalogy, or on a set of materials, such as natural history periodicals. The Collections Committee has reviewed journals available in consortium libraries and developed an initial list of works that contain content of high taxonomic value for Botany and Zoology.

At present, BHL digitization has progressed beyond the initial disciplinary approach to adopt a more targeted partner-driven approach. Where time and resources allow, partners may select content for BHL in order to:

  • Provide access to unique, rare, or special collections. 
  • Improve access to collections that are produced by or related to underserved and underrepresented communities.
  • Fulfill institutional or grant project deliverables.
  • Correct missing pages or image quality errors. 
  • Address missing volumes or pieces of a title to complete series coverage.
  • Add materials that enter into the public domain after their copyright term has expired, such as in the case of U.S. publications published before the current year less 95 years. Please see Appendix D for more details.

In-copyright materials with permission

BHL receives permission from rights holders to digitize or upload in-copyright material. BHL consortium Partners are encouraged to prioritize the inclusion of materials where permission has been granted by the rights holder over other materials in the queue. BHL is focused on providing open access to in-copyright works that are otherwise unavailable online. By prioritizing the inclusion of these in-copyright materials, BHL is helping to connect smaller scholarly society publications into the world of digital scholarship. As the electronic publication sector continues to grow, BHL is assessing the scope of including born-digital in-copyright materials in its collection. Please see more information about granting or obtaining permission to include in-copyright materials in the BHL collection through our public FAQ – Can I contribute content to the BHL collection?

Passive Ingest of non-BHL partner content from Internet Archive

BHL supplements its collection by harvesting (ingesting) open access content from the corpus of digitized books available through the Internet Archive to acquire materials otherwise unavailable within the consortium of BHL institutions. Thus, selected materials scanned by the University of California Libraries and the Wellcome Library, for example, can be found in BHL’s collection. These materials are identified with the description, “(archive.org)” as part of the Holding Institution data field.

Content ingested from non-BHL consortium libraries must conform to a predetermined set of criteria where Library of Congress Subject Headings (LCSH) and Library of Congress and Dewey Decimal call numbers are used to automatically identify biodiversity relevant materials. The initial ingest criteria were identified and refined by BHL’s Collections Committee. To learn more about the ingest criteria, please see Appendix C.
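
To illustrate how subject headings and call numbers can drive an automated relevance check of this kind, here is a minimal Python sketch. The class prefixes, Dewey range, and subject terms below are illustrative assumptions chosen for the example; they are not the criteria defined in Appendix C, and the function name is hypothetical.

```python
from typing import Iterable, Optional

# Illustrative values only (assumptions); the actual criteria are in Appendix C.
LC_CLASS_PREFIXES = ("QH", "QK", "QL")   # natural history/biology, botany, zoology
DEWEY_RANGE = (570.0, 600.0)             # life sciences, plants, animals
SUBJECT_TERMS = ("botany", "zoology", "natural history")


def is_ingest_candidate(subjects: Iterable[str],
                        lc_call_number: Optional[str] = None,
                        dewey_number: Optional[str] = None) -> bool:
    """Return True if any of the three signals marks the record as relevant."""
    # Signal 1: Library of Congress call number starts with a relevant class.
    if lc_call_number and lc_call_number.strip().upper().startswith(LC_CLASS_PREFIXES):
        return True
    # Signal 2: Dewey Decimal number falls inside the relevant range.
    if dewey_number:
        try:
            value = float(dewey_number.split()[0])
        except (ValueError, IndexError):
            value = None
        if value is not None and DEWEY_RANGE[0] <= value < DEWEY_RANGE[1]:
            return True
    # Signal 3: a subject heading contains one of the relevant terms.
    lowered = [s.lower() for s in subjects]
    return any(term in heading for heading in lowered for term in SUBJECT_TERMS)


print(is_ingest_candidate(["Insects"], lc_call_number="QL463 .T2"))  # True
print(is_ingest_candidate(["Sports"], lc_call_number="GV701"))       # False
```

A rule set like this matches on catalog data rather than on the content itself, so its results are only as good as the cataloging of each record.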

As long as ingested volumes are biodiversity-relevant, good-quality scans, or highly unique, they can be kept in BHL’s collection. Poor quality ingested copies can be removed from the collection. Where possible, it is strongly recommended that re-directs are provided to a higher quality replacement copy. Where duplicate titles are ingested, they may be merged automatically with existing titles in the collection (based on OCLC number), or may require a manual merge to cut down on duplicate hits in search results. Different editions of titles will not be merged.
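
The OCLC-based merge described above can be sketched as a simple grouping step. This is a minimal illustration, not BHL's actual merge routine: the `TitleRecord` fields and the `group_for_merge` function are hypothetical, and the sketch assumes that different editions carry different OCLC numbers and therefore stay separate.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class TitleRecord:
    title_id: int                # hypothetical local identifier
    name: str
    oclc_number: Optional[str]   # None when the record has no OCLC number


def group_for_merge(
    records: List[TitleRecord],
) -> Tuple[Dict[str, List[TitleRecord]], List[TitleRecord]]:
    """Split records into automatic merge groups and a manual review list."""
    by_oclc: Dict[str, List[TitleRecord]] = defaultdict(list)
    needs_review: List[TitleRecord] = []
    for record in records:
        if record.oclc_number:
            by_oclc[record.oclc_number].append(record)
        else:
            # Without an OCLC number there is nothing to match on automatically.
            needs_review.append(record)
    # Only OCLC numbers shared by two or more records form a merge group.
    auto_merge = {oclc: group for oclc, group in by_oclc.items() if len(group) > 1}
    return auto_merge, needs_review
```

Records that share an OCLC number land in one group for automatic merging; records without one are set aside, mirroring the manual merge mentioned above.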

The process of matching titles in the Internet Archive corpus against a fixed set of selection criteria based on subject headings and call numbers is imperfect and, occasionally, non-relevant materials may be inadvertently incorporated into the collection. As irrelevant items are identified, they will be processed according to the Deaccession Policy.

A caveat with material ingested into BHL according to this method is that BHL has a limited ability to resolve issues that may be discovered in the records of these items. While quality review is performed on a statistical sampling of items scanned from BHL consortium libraries, BHL has no control over the quality of items scanned by non-BHL partners. Issues can only be resolved if a BHL partner owns the item and can send it for digitization. If error correction results in duplication, the Internet Archive-contributed item will be removed in favor of BHL’s copy. 

User-requested material

BHL maintains a list of user-submitted requests for materials to be added to the collection. These materials are included at the discretion of BHL consortium partners. To submit a request to the list, please use BHL’s scan request form. These requests are processed as time and resources allow. Unfortunately, BHL is unable to fulfill requests for materials that are not held by its consortium Partners.

Requirements

Materials in BHL’s collection should be copyright compliant, meaning that materials selected for inclusion must fall within the public domain, be reviewed under a due diligence process for copyright determination, or be permitted for inclusion by the express agreement of the rights holder. In addition to open access requirements, BHL adheres to the following metadata and digitization requirements. 

Metadata

MARCXML records are required to fully implement materials in BHL. BHL will only accept content without MARCXML records in exceptional cases. The lack of MARCXML records inhibits the standard index and search capabilities users have come to expect of materials in BHL. Keep in mind:

1 MARCXML file (Title level metadata) + Item level metadata + Page level metadata
= 1 Digital book or volume

BHL harvests all metadata via the Internet Archive (IA) and delivers selected metadata elements through the BHL website (http://biodiversitylibrary.org). IA requires title level metadata to be submitted as a MARCXML record. Item level metadata, such as the “Holding Institution” and “Copyright Status” information, and page level metadata, such as page side (recto/verso) and page labels (for example, “Page 65”), are manually assigned for each book or volume. Please see BHL Metadata Requirements documentation for more information.
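As a rough illustration of how the three metadata levels combine into one digital book or volume, the sketch below models them as a small data structure. The class and field names are hypothetical and chosen only for this example; they are not BHL's or the Internet Archive's schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Page:
    """Page level metadata (field names are illustrative assumptions)."""
    sequence: int                 # position in the scan, starting at 1
    side: Optional[str] = None    # "recto" or "verso"
    label: Optional[str] = None   # e.g. "Page 65"


@dataclass
class DigitalVolume:
    """One digital book or volume: title, item, and page level metadata."""
    marcxml: str                  # title level metadata (the MARCXML record)
    holding_institution: str      # item level metadata
    copyright_status: str         # item level metadata
    pages: List[Page] = field(default_factory=list)

    def has_all_levels(self) -> bool:
        """True when title, item, and page level metadata are all present."""
        has_title = bool(self.marcxml.strip())
        has_item = bool(self.holding_institution and self.copyright_status)
        return has_title and has_item and bool(self.pages)
```

A volume built without a MARCXML record, or without any pages, reports `has_all_levels()` as false, which corresponds to the exceptional cases the policy describes.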

Digitization

Digitization for BHL means cover-to-cover imaging of an entire book or volume as true to the physical copy as possible. For specific information about contributing content to BHL’s collection through the recommended digitization workflows, please refer to Internet Archive’s Scanning Services and the Macaw User Guide.

Image Quality

Please see BHL’s Digital Imaging Specifications for details about image quality requirements. Digitized items that do not meet minimum imaging standards may be subject to removal in accordance with the Deaccession Policy.

Deduplication

BHL consortium partners make every effort to avoid duplicating the digitization of like materials among BHL consortium library holdings. Duplication may be deliberate in cases where different copies of a book have unique features, such as annotations. If duplication is accidental, items may remain in BHL until such time as proper review and deaccessioning takes place. Duplication can also occur when items are ingested from other collections. Just as physical collections contain multiple copies of the same publication, the same is true for BHL.

Re-digitizing content for BHL repository

Periodically, requests are received to re-digitize content that is already digitally available. This content may be: 1) already in BHL’s repository, scanned by a BHL participating institution; 2) already in BHL’s repository as a result of passive ingest from the Internet Archive (scanned by a non-BHL library) (see passive IA “ingest”); 3) metadata indexed in BHL’s repository with links to vetted third-party websites (see Externally Hosted content); or 4) inaccessible via BHL’s repository but digitally available elsewhere (Google Books or HathiTrust scans, for example). These requests are handled on a case-by-case basis. BHL recommends re-digitizing content in the following order of importance:

  1. To correct a poor quality copy of a work that is already available in BHL’s repository;
  2. To make available within BHL a work that is only available via an outside repository such as Google or HathiTrust (see also Google Content);
  3. To incorporate content in BHL’s repository that is only available as indexed metadata linking out to a third-party website. 

Considerations:

  • The subject matter of the title should relate to the subject areas defined in the scope of biodiversity relevant content.
  • Appropriate quality, functionality, and sustainability of available content.
  • Reasonable cost of digitization.

BHL partners determine their capacity to re-digitize items at their discretion. 

Curation

Because BHL is a mass digitization project with over 60 million pages of digitized content available to date, it is not possible to review 100% of the collection for potential errors. In order to fulfill its commitment as “the most comprehensive, reliable, reputable repository of data-rich biodiversity literature, and other original materials, to support a response to global challenges,” BHL collects input from a variety of stakeholders. BHL relies on input from partners and staff, users of BHL’s collection and services, and colleagues within the biodiversity community to identify and prioritize curation activities, including: 

  • metadata errors 
  • improving cataloging and descriptive workflows to incorporate inclusive language
  • missing or blurred pages
  • volumes occurring out of order
  • duplication of author data
  • bugs or technical issues with the website
  • reconciling excessive duplication of materials

Despite best efforts to reject copies with missing content prior to digitization, incomplete copies may exist. In these cases, BHL will prioritize the fidelity of the original. To achieve complete digital copies, BHL will correct the digitization error or digitize a different complete copy of the original work held by another BHL partner. See the guidance on prohibiting frankenbooks for more details.

BHL works on the premise that some access is better than no access. Some digitized materials may fall short of BHL minimum metadata and image quality standards. However, if the materials are unique, harvesting of these materials will proceed even if standards are not met.

Direct curation of non-partner items ingested automatically from the Internet Archive is not possible because BHL does not have privileges to modify records outside the “biodiversity” collection in the Internet Archive. 

BHL partners have the option of developing curated collections (sub-collections) within BHL (see https://www.biodiversitylibrary.org/browse/collections). Sub-collection landing pages can help provide context or background for the collection (for example, A History of Cats). Sub-collections exist in BHL as lists of either items or titles; the two cannot be mixed. Article sub-collections are not available at this time. 

BHL Partners commit to retaining print access to items uploaded to BHL’s collection for as long as they can. For circumstances in which items may be removed from BHL, see the Deaccession Policy below.

BHL has a globally distributed network of stakeholders and users that rely on free and open access to BHL, anytime and anywhere they have Internet access. Users help inform the curation process and contribute to the improvement of the digital library and its services. 

Deaccession

The collection of the Biodiversity Heritage Library is centered around the legacy literature of taxonomy and systematics, and the literature that supports the interdisciplinary study of biodiversity. While BHL makes every effort to incorporate materials relevant to the broad scope of biodiversity, the nature of BHL as a mass digitization project does not allow review of every item that enters into BHL’s collection. Although most of the items in BHL are digitized from the print collections of BHL consortium libraries, other content is harvested from the corpus of open access materials contributed to the Internet Archive by libraries across the globe.

The collections of each BHL consortium partner are considered relevant in that they are representative of the core scholarly biodiversity literature. Content ingested into BHL from the Internet Archive or other open access databases may be removed if it is found to be irrelevant to the spectrum of scholarly biodiversity literature. Content will be removed contingent upon the approval of BHL’s Collections Committee. If you feel that an item in BHL is not relevant, submit comments through the feedback form.

Content may be temporarily removed from BHL if the scan quality is so poor as to render the digital content illegible within a reasonable zoom level. Items digitized from the collections of BHL partners will be submitted to the scanning queue and replaced to the best of our ability.

Take-Down

BHL makes every effort to provide content within its collection that is freely and openly available for access and responsible reuse under the public domain or a Creative Commons license. In-copyright materials are in BHL’s collection with the express permission of the rights holder. It is general best practice for cultural and natural heritage institutions to adhere to the copyright laws of their home country. 

Content may be removed if it is found to be in violation of copyright. Should a copyright holder make BHL aware of a potential infringement of copyright, BHL will confirm the claim and remove the content from its websites (including the “biodiversity” collection in the Internet Archive). Copyright issues arising from works in the Internet Archive corpus that were digitized by libraries other than BHL consortium partners are the responsibility of the “Holding Institution” and not BHL.

Acknowledgments

The BHL collection development policy was authored by the BHL Collections Committee with contributions from across the BHL consortium. Special thanks go to partners and staff who volunteer their time to serve on BHL working groups, including the Technical Team, Cataloging Group, Persistent Identifier Working Group, and Executive Committee.

The revision of BHL’s collection development policy would not have been possible without the strategic vision, passion, and expertise of Connie Rinaldo (1955-2022). As an instrumental member of the BHL Collections Committee, she shaped this policy from the ground up. Her wisdom clarified our thinking; her guidance improved our voice. We are indelibly grateful for the energy and commitment she gave to our committee, this policy, and the Biodiversity Heritage Library.

Appendix A Collections Committee Roles and Responsibilities

The Collections Committee is built on a voluntary basis and includes representatives from BHL partner institutions with collection development and metadata expertise. The committee works closely with BHL administrative, technical development, outreach, and cataloging working groups to achieve BHL’s strategic goal #1: 

Goal 1 (Relevant Content)

Grow BHL into the most comprehensive, reliable, reputable repository of data-rich biodiversity literature, and other original materials, to support a response to global challenges. See objectives detailed in the BHL Strategic Plan.

Committee Charge  

The Collections Committee is responsible for the management and development of BHL’s collection including all issues related to the selection, prioritization, acquisition, curation and deaccessioning of content, as well as supporting BHL outreach activities relevant to collection development issues. The Committee may oversee the reuse or re-packaging of selected content, or a subset of the collection, in creative and novel ways. The Committee is open to all BHL Partners.

The Collections Committee supports the Collection Manager and serves as a final check on collection issues and decisions. Committee members are an invaluable resource for reviewing and addressing concerns about harmful content on BHL websites. 

Appendix B Initial Collection Boundaries

This representative subset of BHL subject areas shows core and supporting subject matter as developed by BHL’s nascent Collections Committee in 2010.

BHL Initial Collection Boundaries
Original BHL collection boundaries as determined in 2010.

Appendix C Internet Archive Ingest Criteria

The following Library of Congress Subject Heading (LCSH) terms and LC/Dewey Decimal call number ranges are used as criteria to identify items within the Internet Archive's "texts" corpus for passive ingest into BHL’s collection. Internet Archive items must have MARCXML in order to match against criteria and be ingested.

LC Call # 050, 090 (local)
QE 700-999 (Paleontology)
QH (natural history)
QK (botany)
QL (zoology)
SB (plant culture)
SD (forestry)
SF (animal culture)
SH (fisheries and related)

Dewey Call # 082, 092 (local)
508 Natural history
560 Paleontology Paleozoology
561 Paleobotany
562 Fossil invertebrates
563 Fossil primitive phyla
564 Fossil Mollusca & Molluscoidea
565 Other fossil invertebrates
566 Fossil Vertebrata (Fossil Craniata)
567 Fossil cold-blooded vertebrates
568 Fossil Aves (Fossil birds)
569 Fossil Mammalia
573 Physical anthropology
582 Spermatophyta (Seed-bearing plants)
583 Dicotyledones
584 Monocotyledones
585 Gymnospermae (Pinophyta)
586 Cryptogamia (Seedless plants)
587 Pteridophyta (Vascular cryptogams)
588 Bryophyta
589 Thallobionta & Prokaryotae
592 Invertebrates
593 Protozoa, Echinodermata, related phyla
594 Mollusca & Molluscoidea
595 Other invertebrates
596 Vertebrata (Craniata, Vertebrates)
597 Cold-blooded vertebrates Fishes
598 Aves (Birds)
599 Mammalia (Mammals)
638 Insect culture

Exclude Title based on LCSH Term
Aeronautics
Anatomy, Human
Bible and Science
Bookkeeping
Carriages and carts
Cavalry
Child development
Child psychology
Christianity
Creation
Double taxation
Driving of horse-drawn vehicles
Educational psychology
Gambling
God
Harnesses
Housing
Human physiology
Hygiene
Income tax
Inheritance and transfer tax
Mind & body
Peasantry
People with mental disabilities
Poetry
Political science
Polo
Psychological Tests
Psychophysiology
Radioactivity
Real Estate Development
Religion
Religion and science
Social ethics
Soul
Speeches, addresses, etc. American
Sports
Taxation

LCSH Term 650a 2nd ind. 0,3 Letters: A-Z
Adaptation (Biology)
Agricultural pests
Alcyonaria
Algae
Algae, Fossil
Alligators
Amphibians
Amphibians, Fossil
Amphipoda
Anatomy, Comparative
Angiosperms
Animal behavior
Animal ecology
Animals
Annelida
Anthozoa
Ants
Anura
Apples
Aquatic ecology
Aquatic insects
Arabis fecunda
Arachnida
Arthropoda
Bacteria
Bacteriology
Basidiomycetes
Batrachia
Bats
Bears
Beavers
Bees
Beetles
Beneficial insects
Big brown bat
Birds
Birds of prey
Birds, Fossil
Bivalves
Bivalves, Fossil
Bivalvia
Black bass
Boll weevil
Botanical chemistry
Botanical illustration
Botanists
Botany
Botany, Economic
Brachiopoda
Browntail moth
Bryozoa
Bryozoa, Fossil
Buprestidae
Butterflies
Cacao
Cactus
Carnivora
Caterpillars
Catfishes
Cats
Cephalopoda
Cephalopoda, Fossil
Cetacea
Chelonia (Genus)
Chlorophyll
Chondrichthyes
Chromosomes
Cicada (Genus)
Cirripedia
Cirsium longistylum
Cnidaria
Cockroaches
Collembola
Common garter snake
Compositae
Coniferae
Conifers
Copepoda
Coral snakes
Corals
Corals, Fossil
Crabs
Craniology
Crayfish
Crinoidea
Crinoidea, Fossil
Crocodiles
Crustacea
Crustacea, Fossil
Cryptogams
Ctenophora
Cumacea
Cyanobacteria
Cyperaceae
Cytology
Decapoda (Crustacea)
Deer hunting
Desert plants
Diatoms
Diptera
Dogs
Drosophila
Earthworms
Echinodermata
Echinodermata, Fossil
Elephants
Embryology
Endemic plants
Entomology
Environmental monitoring
Estuaries
Estuarine ecology
Eucalyptus
Euphorbiaceae
Evergreens
Falconry
Ferns
Fertilization of plants
Fish culture
Fish populations
Fish-culture
Fisheries
Fishes
Fishes, Fossil
Fishing
Fleas
Flies
Forage plants
Foraminifera
Forest insects
Forest reserves
Forests and forestry
Fragilariaceae
Freshwater animals
Freshwater biology
Freshwater fishes
Freshwater mussels
Frogs
Fruit
Fruit-culture
Fungi
Galls (Botany)
Game and game-birds
Gastropoda
Gastropoda, Fossil
Geometridae
Goldfish
Grasses
Grasshoppers
Growth (Plants)
Guinea pigs
Helminths
Hemiptera
Herbals
Herbaria
Herbs
Herpetology
Histeridae
Hoary bat
Holothurians
Homoptera
Horseflies
Horses
Hybridization
Hybridization, Vegetable
Hydromedusae
Hydrozoa
Hylidae
Hymenoptera
Ichneumonidae
Ichthyology
Infusoria
Insect pests
Insects
Insects, Fossil
Invertebrates
Invertebrates, Fossil
Isopoda
Jellyfishes
Karyokinesis
Larvae
Leaves
Lepidoptera
Lichens
Little brown bat
Liverworts
Lizards
Lobsters
Long-eared myotis
Long-legged myotis
Macaques
Malacostraca
Mammals
Mammals, Fossil
Marine algae
Marine animals
Marine biology
Marine ecology
Marine fishes
Marine invertebrates
Marine mammals
Marine plants
Marine pollution
Marsupials, Fossil
Materia medica, Vegetable
Medical parasitology
Medicinal plants
Mendel's law
Menhaden fisheries
Microorganisms
Mimicry (Biology)
Mites
Mollusks
Mollusks, Fossil
Morgan horse
Morphology
Morphology (Animals)
Mosquitoes
Mosses
Moths
Mountain plants
Mountain sheep
Mules
Muridae
Mushrooms
Mushrooms, Edible
Mushrooms, Poisonous
Mussels
Mycology
Myriapoda
Myxomycetes
Natural history
Natural selection
Nematoda
Nemertea
Neuroptera
Noctuidae
Nudibranchia
Oceanography
Odonata
Oenothera
Oligochaeta
Ophiuroidea
Orchids
Ornithologists
Ornithology
Orthoptera
Osteichthyes
Ostracoda
Oyster culture
Oysters
Pacific salmon fisheries
Paleobotany
Palms
Parasites
Parasitic plants
Parthenogenesis in animals
Pathogenic fungi
Penstemon lemhiensis
Phanerogams
Pheasants
Photosynthesis
Phycomycetes
Phylogeny
Physiology, Comparative
Phytogeography
Phytopathogenic fungi
Pigeons
Pigeons. [from old catalog]
Plankton
Plant anatomy
Plant cells and tissues
Plant communities
Plant diseases
Plant ecology
Plant hybridization
Plant introduction
Plant morphology
Plant names, Popular
Plant physiology
Plants
Plants, Cultivated
Plants, Edible
Plants, Flowering of
Plants, Fossil
Plants, Ornamental
Plants, Protection of
Plants, Useful
Platyhelminthes
Plecotus townsendii
Poisonous plants
Poisonous snakes
Polychaeta
Ponderosa pine
Primates
Protozoa
Protozoa, Pathogenic
Pselaphidae
Pteridophyta
Pulmonata
Pycnogonida
Quails
Rare plants
Rats
Regeneration (Biology)
Reproduction
Reptiles
Reptiles, Fossil
Rhizopoda
Rodents
Roots (Botany)
Rotifera
Rubiaceae
Rust fungi
Salamanders
Salmon
Salmon fisheries
Salmon fishing
Salmonidae
Sapphire rockcress
Sauropterygia
Scale insects
Scarabaeidae
Scientific Expeditions
Scleractinia
Scyphozoa
Sea anemones
Sea birds
Sea squirts
Sea urchins
Sea urchins, Fossil
Serpents
Sharks
Shellfish
Shells
Shrimps
Shrubs
Siboga Expedition
Silkworms
Silkworms. [from old catalog]
Silver-haired bat
Siphonophora
Skinks
Snails
Snakes
Soil microbiology
Sphingidae
Spiders
Sponges
Sponges, Fossil
Squamata
Staphylinidae
Starfishes
Stomatopoda
Stream ecology
Syagrus
Tenebrionidae
Thiaridae
Ticks
Toads
Tobacco
Trees
Trees. [from old catalog]
Trematoda
Trilobites
Tropical plants
Trout
Trout fishing
Tunicata
Turbellaria
Turtles
Turtles, Fossil
Type specimens (Natural history)
Unionidae
Variation (Biology)
Vegetation surveys
Venom
Vertebrates
Vertebrates, Fossil
Wasps
Water birds
Weeds
Western small-footed myotis
Wetlands
Whales
Wild flowers
Wildlife management
Woodlots
Woody plants
Worms
Zebras
Zoogeography
Zoological specimens
Zoology
Zoology, Economic
Zoos
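
The matching step described in this appendix could look something like the sketch below. It is an illustration only, under stated assumptions: the policy does not say how the three kinds of criteria are combined, so this example assumes a record qualifies when any subject heading or call number matches and no excluded heading is present. The function names and the record shape (plain lists already extracted from the MARCXML) are hypothetical, and the sets hold only a few entries from the lists above.

```python
import re

# Small illustrative subsets of the Appendix C lists, not the full criteria.
INCLUDE_LCSH = {"algae", "birds", "botany", "zoology"}
EXCLUDE_LCSH = {"aeronautics", "poetry", "taxation"}
LC_CLASSES = ("QH", "QK", "QL", "SB", "SD", "SF", "SH")
DEWEY_NUMBERS = {508, 560, 561, 582, 598, 599, 638}


def lc_matches(call_number: str) -> bool:
    """True if an LC call number is in a listed class or in QE 700-999."""
    m = re.match(r"\s*([A-Z]{1,3})\s*(\d+)?", call_number.upper())
    if not m:
        return False
    letters, digits = m.group(1), m.group(2)
    if letters in LC_CLASSES:
        return True
    return letters == "QE" and digits is not None and 700 <= int(digits) <= 999


def dewey_matches(call_number: str) -> bool:
    """True if the three-digit Dewey class is one of the listed numbers."""
    m = re.match(r"\s*(\d{3})", call_number)
    return bool(m) and int(m.group(1)) in DEWEY_NUMBERS


def is_candidate(record: dict) -> bool:
    """Assumed rule: any inclusion match, unless an excluded heading is present."""
    subjects = {s.strip().rstrip(".").lower() for s in record.get("lcsh", [])}
    if subjects & EXCLUDE_LCSH:
        return False
    return (
        bool(subjects & INCLUDE_LCSH)
        or any(lc_matches(c) for c in record.get("lc_call_numbers", []))
        or any(dewey_matches(c) for c in record.get("dewey_call_numbers", []))
    )
```

For example, `is_candidate({"lcsh": ["Birds", "Poetry"]})` is false under this assumed rule because the excluded heading takes precedence. Matching on headings and call numbers alone is also why non-relevant items can occasionally slip through, as the policy notes.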

Appendix D Best Practices for BHL Partners

Print Retention

BHL partners who provide digital content are encouraged but not mandated to follow these recommendations:

In general, following digitization, BHL’s recommended policy is for the Holding Library to retain the physical item that was digitized for the life of the digital surrogate in BHL. Should a physical item that has been digitized in BHL need to be deaccessioned by a partner library, the following provisions should go into effect: 

  • Inform BHL’s Collections Committee that the print item is being withdrawn from the holding library.
  • Add a note to the <copy-specific information> field in BHL that the print copy has been withdrawn from the holding institution (see the print retention example below).
  • If possible, offer other BHL Members the print title, including in-fill journal titles, and change the holding institution information in the record if they take the item.
  • If no BHL library will take the print item:
    • Ask a partner to digitize another copy that they own (and will keep).
    • Send the print title to Internet Archive, and place a note in BHL’s copy-specific record with information of where the print copy now resides.
Print retention example
**Image provided as an example only.** As print copies are removed from BHL Partner holdings, it is recommended that items be marked with copy-specific information to indicate this to BHL users.

In addition, it is BHL’s recommendation that print materials digitized for BHL should not be held in an off-site repository if the policy of that repository allows it to withdraw copies without the owning library’s knowledge and consent. For consortium libraries that scan and return copies to the originating library, it is BHL’s recommendation that they inform the originating library of BHL’s policy to retain the physical item.

For BHL partners who are withdrawing from the consortium or are closing, BHL’s Collections Committee should be informed as soon as possible so that the Committee can take appropriate steps to safeguard as many original BHL print titles as possible.

InterLibrary Loan (ILL)

Priority reciprocal lending to BHL partners is encouraged where possible as dictated by local practices and requirements. There is no explicit ILL agreement. 

BHL may only digitize content held within the collections of its member institutions. Materials owned by libraries, whether within or outside BHL’s member consortium, cannot be loaned to a BHL member institution and then deposited or scanned to BHL’s collection without permission.

Logistics (suggested) for informal ILL needs include:

  • Requests posted to the BHL Staff listserv
  • Requests through standard ILL channels with standard policies
  • Reciprocity/fee structure as a decision of the lender based on institutional policies
  • Wiki-page that provides information from each institution on institutional policies regarding fees, copyright, scanning limits and mailing limits

Page redaction

The purpose of the Page Redaction policy is to guide Partners who may wish to redact sensitive content present in the pages of cover-to-cover items they are digitizing. Any decision or action to redact content is the responsibility of BHL Partners, especially in the case of archival materials where sensitive information could be more likely to occur. BHL does not intend to prohibit sensitive information from its collection.

Rolling copyright term for public domain works

Due to the annual rolling copyright term in the United States, publications are in the public domain if they were published before the 95-year copyright cut-off, e.g. before 1927 for 2022, before 1928 for 2023, and so on. 

U.S. Public Domain = works published BEFORE [Current Year] – 95
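
The formula above can be worked through in a few lines of code. This is a minimal sketch of the publication-year rule as stated in this policy only; the function names are hypothetical, and it does not attempt to cover other routes by which a work may be in or out of copyright.

```python
from datetime import date

US_COPYRIGHT_TERM_YEARS = 95  # the rolling term stated in this policy


def public_domain_cutoff(current_year=None):
    """Return the year before which publications are in the U.S. public domain."""
    if current_year is None:
        current_year = date.today().year
    return current_year - US_COPYRIGHT_TERM_YEARS


def is_us_public_domain(publication_year, current_year=None):
    """True if the work was published BEFORE [current year] - 95."""
    return publication_year < public_domain_cutoff(current_year)
```

With these definitions, `public_domain_cutoff(2022)` gives 1927, so a work published in 1926 qualifies in 2022, while a work published in 1927 first qualifies in 2023.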

BHL Partners are encouraged to contribute new items to the BHL collection that enter into the public domain at the start of each calendar year. Selection of public domain materials for digitization is at the discretion of each Partner and their institution’s resource capacity. Materials extant in the collection that have entered into the public domain will have their copyright status updated where possible.

Supplementary links to BHL titles

In some cases, linking from a title in BHL to an external web-based resource can help provide context or other supporting information that may benefit BHL users. The Supplementary Links to BHL Titles document provides information about and guidelines for making these contextual links. 

BHL supplementary links example
Example of a supplementary link from the “Rudolf Blaschka letters to Walter Deane” in the BHL collection to more information via Harvard University Botany Libraries.

Appendix E Research Bibliography

This is a link to a living research bibliography that supports the Biodiversity Heritage Library Collections Committee initiatives including understanding and actively implementing diversity and inclusion processes within BHL’s community.

https://www.zotero.org/groups/4860464/bhl_bibliography/library

---

1 Indigenous Knowledge (IK) is a phrase used in the BHL Collection Development policy to describe, “a body of place-based knowledges accumulated and transmitted across generations within specific cultural contexts… [which] underpin human–environment relationships.” https://doi.org/10.1002/fee.2435. Accessed: August 30, 2022.

2 Convention on Biological Diversity. http://www.cbd.int/convention/articles/?a=cbd-02. Accessed: April 1, 2022.

3 Carbajal, Itza A. (2021, November 8). Historical metadata debt: Confronting colonial and racist legacies through a post-custodial metadata praxis. [Special issue on Unsettling the Archives.] Across the Disciplines, 18(1/2), 91-107. https://doi.org/10.37514/ATD-J.2021.18.1-2.08

4 The Archivist’s Task Force on Racism REPORT TO THE ARCHIVIST April 20, 2021. https://www.archives.gov/news/topics/recommendations-from-internal-task-force-on-racism. Accessed: April 1, 2022.

5 Reaka-Kudla, M.L., et al. (eds.). Biodiversity II : understanding and protecting our biological resources. Washington, D.C. : Joseph Henry Press, 1997.