Collection Development Policy

This policy supports the BHL’s Goal 1 (Relevant Content): grow the BHL into the most comprehensive, reliable, reputable repository of data-rich biodiversity literature, and other original materials, to support a response to global challenges.


Collection Definition

The Biodiversity Heritage Library (BHL) collection is the world’s most comprehensive digital collection of the legacy literature of biodiversity. All materials in the BHL are free to access, download, reuse and repurpose under the principles of open access and open data. In the cases where the BHL provides access to in-copyright content, these materials are made available with permission under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license. By promoting an open digital collection of biodiversity literature, the BHL is:

  • making freely accessible to a global audience the content held within the collections of the BHL library consortium
  • liberating the taxonomic names and bibliographic data associated with the content for creative re-use by other technology projects.

The content of the BHL collection comes mainly from within our repository, but also includes links to content hosted on third-party websites. For more information about the sources of content that supply the BHL collection, please see our Selection Process page.

What is “Biodiversity”?

“Biologists are inclined to agree that it [biodiversity] is in one sense everything” Dr. E.O. Wilson

The BHL collection focuses on the open access materials most relevant to the study of biodiversity. The term “biodiversity” refers to “the variability among living organisms from all sources including, inter alia, terrestrial, marine and other aquatic ecosystems and the ecological complexes of which they are part; this includes diversity within species, between species and of ecosystems” [1]. Thus, the BHL covers all levels of biological organization, from genes to ecosystems, as well as other disciplines that bear on the study of the biodiversity of life on Earth.



Biodiversity-Relevant Content

At its core, the BHL is focused on the subject matter relevant to the work of zoologists, botanists, evolutionary biologists, taxonomists, systematists, ecologists, natural history collections managers, scientific illustrators, biological science historiographers, and amateur scientists & hobbyists. This representative subset of BHL subject areas shows core and supporting subject matter. When deciding what to scan from our own collections, we try to maximize our scanning dollars by prioritizing the core literature, especially those titles that have a high concentration of taxonomic names. Content deemed to be irrelevant to biodiversity studies is subject to removal according to our Deaccession Policy.

Collection Boundaries

To serve a wide, interdisciplinary audience, the boundaries of the BHL collection are deliberately inclusive. As E.O. Wilson once stated, “Biologists are inclined to agree that it [biodiversity] is in one sense everything,”[2] which is why content at the periphery of the core taxonomic zoological and botanical literature is present in the collection. In the case of early taxonomic literature, core content was, at times, published in journals of a broader scope than is the practice today. Rather than review journal volumes piece by piece and scan items in isolation, the BHL seeks to provide the complete set of volumes for any given title. In addition, the BHL incorporates open access content scanned by other digital library projects to supplement its collection and enrich the range of content available for use through its taxonomic name finding and data export services. For more information about how the BHL incorporates content from other digital library projects, please see our Selection Process.

Date Range of Content

The bulk of material in the BHL is in the public domain in the United States, meaning that the publication date is prior to 1923. Materials published after 1922 are available in the BHL for one or more of the following reasons:

    1. We have received explicit permission to provide the content online from the copyright holder. Please see the list of titles for which we have obtained permission.
    2. U.S. Federal government publications are in the public domain.
    3. Works for which the copyright was not renewed, according to the Stanford University Copyright Renewal Database, the Catalog of Copyright Entries, and the U.S. Copyright Catalog.
    4. Works made available via open access repositories such as the Internet Archive.

Should you have any concerns about the copyright status of a work in the BHL collection, please refer to the relevant section of our Deaccession Policy.
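
The date-range rules above can be read as a simple decision procedure. The following is a minimal sketch in Python, not a description of any BHL system: the `Work` record, its field names, and the `inclusion_bases` function are hypothetical, and the logic only mirrors the reasons listed in this policy. It is not a substitute for an actual copyright determination.

```python
from dataclasses import dataclass
from typing import List, Optional

# Per the policy text: U.S. public domain means published before 1923.
PUBLIC_DOMAIN_CUTOFF = 1923


@dataclass
class Work:
    """Hypothetical record; field names are illustrative assumptions."""
    title: str
    year: int
    permission_granted: bool = False         # reason 1
    us_federal_publication: bool = False     # reason 2
    copyright_renewed: Optional[bool] = None  # reason 3; None = not checked
    in_open_access_repository: bool = False  # reason 4


def inclusion_bases(work: Work) -> List[str]:
    """Return every reason from the policy under which the work qualifies."""
    bases = []
    if work.year < PUBLIC_DOMAIN_CUTOFF:
        bases.append("public domain (published before 1923)")
    if work.permission_granted:
        bases.append("permission from copyright holder")
    if work.us_federal_publication:
        bases.append("U.S. federal government publication")
    if work.copyright_renewed is False:
        bases.append("copyright not renewed")
    if work.in_open_access_repository:
        bases.append("open access repository")
    return bases
```

An empty result means that none of the listed reasons applies, so under this sketch the work would need further review before inclusion.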

Types of Materials

Books and Journals
The BHL provides access to a range of scholarly and general science materials in the form of published books and journals.

Article Metadata (not Articles)
Content in the BHL collection consists primarily of books and journals, but a growing number of articles are appearing in the collection as a result of article citation metadata contributions from BHL Partners and (…). As of August 2018, there are over 264,000 articles indexed and searchable in the BHL collection. Articles are accepted in the BHL as article metadata only. Individual articles cannot be uploaded into the BHL repository; full volumes of a journal must be uploaded instead. The articles within those volumes can then be indexed and made searchable by submitting article citation metadata to the BHL.

New article metadata is being indexed within the BHL collection on a regular basis. Try our Advanced Search options and select the “Articles/Chapters” tab to find articles. Additional article level access is possible by locating the journal title and navigating to the appropriate volume and page.

Archival Materials
There are a variety of unpublished materials in the BHL collection such as field notebooks, correspondence, collection records, etc. Many fantastic grant projects have supported the specialized digitization and cataloging work required to process these materials for inclusion. Some notable examples include the Smithsonian Field Book Project, Engelmann Online, Connecting Content and the Field Notes Project. Unpublished materials are essential primary resources that enhance the unique value of BHL’s collection. Partners are encouraged to contribute archival materials where possible. Please note that archival materials must meet BHL’s basic requirements for MARCXML metadata, sufficient image quality, and book-like objects (see below).

No Frankenbooks
A digital item in the BHL is composed of the pages of a single physical item and not the aggregate of digitized pages from various like items. For example, in the case of a volume digitized where the physical copy is missing pages, the BHL will not insert digitized pages from a different physical copy. It is possible, however, that we may ingest materials from other libraries scanning to Internet Archive that do not follow this “no-frankenbook” policy.

Book-like objects
The current infrastructure of the BHL database supports a specific format of materials that can be described as “book-like objects,” that is, objects with a series of sequential “pages” that can be viewed and interacted with through the BHL website book-viewer. The BHL database may accommodate materials that are not traditionally thought of as books, volumes, or articles so long as they conform to the “book-like” format; archival materials are one example. Accommodating access to non-book-like materials, such as artworks, maps, and specimens, is under consideration for the future.

Links to materials on third-party websites
Selected materials from third-party websites have been indexed within the BHL as part of an experiment to aggregate the world’s biodiversity content under a single point of access. In these cases, the BHL does not include the full text within its repository, but links out to selected content in trusted external repositories. It is preferred that full-text content be deposited in the BHL via the Internet Archive and served through the BHL’s website. The tools and services that make the BHL unique, such as taxonomic name finding services, can only be used if the content is served through the BHL site.


The BHL seeks to provide the most comprehensive collection of legacy botanical and zoological taxonomic literature possible. The foundation of the BHL collection is based on the collections of its library consortium. See our BHL Consortium page for a list of institutions that have joined. A concerted effort has been made to provide all public domain content held within the general collections of BHL consortium libraries so long as the materials fall within the scope of biodiversity relevant subject matter. Every effort is made to avoid duplicating the digitization of like materials among BHL consortium library holdings. Duplication may be deliberate in cases where different copies of a book each have unique features, such as annotations. If duplication is accidental, items may remain in BHL until such time as proper review and de-accessioning takes place.

Content held within special collections or rare book collections is made available when possible. As rare materials cannot be shipped to a scanning facility for digitization, only those libraries that have in-house scanners are able to contribute such materials to the BHL collection. Often, the condition, size, or physical location of rare materials precludes the ability to make these materials available in digital form.

In addition to content contributed from BHL consortium libraries, the BHL collection is supplemented by:

  • user-requested content accepted through the BHL feedback form
  • open access biodiversity-relevant materials, already in digital form, as made available by other digital library projects and scanning partners, such as the Internet Archive
  • in-copyright titles for which we obtain permission to provide in our collection.

Selection Process

The BHL content selection process has four approaches. These include systematic discipline-specific selection, selection based on permissions granted by rights-holders, and selection based on user requests. A fourth source of content for the BHL collection comes from a passive harvest of non-BHL partner institution content found in the Internet Archive corpus. Materials in the BHL collection should be copyright compliant, meaning that materials selected for scanning must fall within the public domain, be reviewed under a due diligence process for copyright determination, or be permitted for inclusion by the express agreement of the copyright holder.

1) Discipline-specific selection
The BHL is a consortium of libraries whose collections contain a wide range of natural history subject matter. At the project’s inception, BHL consortium libraries selected content for scanning based on their collections’ strengths. Institutions were assigned a scanning concentration based on a specific discipline, such as Entomology, Ichthyology, or Mammalogy, or on a set of materials, such as natural history periodicals. At present, BHL scanning has progressed beyond the initial disciplinary approach to adopt a more targeted and patron-driven selection process.

The priority for BHL scanning going forward is to digitize the missing volumes or pieces of a title in order to provide the most complete set of materials possible.

2) In-copyright materials with permission
BHL receives permission from copyright holders to digitize in-copyright material. BHL consortium libraries prioritize the digitization of materials where permission has been granted by the copyright holder over all other materials in the queue. Visit our Permissions page to learn more about the permissions process, or to find information about granting permission for BHL to scan any material for which you are the rights holder.

3) User-requested material
The BHL selects a large majority of content for scanning based on user-submitted requests. On the BHL website, there is a feedback form which provides users with the opportunity to request items for scanning. These requests are processed in the order they are received and assigned to the BHL institution that owns the requested item. As it is often the case that the holdings of a single library cannot fulfill a user’s request to completion, many BHL partners must work together to send the materials for scanning. User requests are fulfilled barring any issues that may exist with the condition and/or size of the materials, copyright restrictions, rarity of the content requested, etc.

4) Passive Ingest of non-BHL partner content from Internet Archive
The BHL supplements its collection by harvesting, or “ingesting”, open access content from the corpus of digitized books available through the Internet Archive in order to acquire materials otherwise unavailable within the consortium of BHL institutions. Thus, selected materials scanned by the University of California Libraries and the Wellcome Library can be found in the BHL collection. These materials are identified with the description, “(” as part of the <Holding Institution> data field.

Content ingested from non-BHL consortium libraries must conform to a pre-determined set of criteria where Library of Congress Subject Headings (LCSH) and Library of Congress and Dewey Decimal call numbers are used to automatically identify biodiversity relevant materials. The initial criteria were identified and have since been refined by the BHL Collections Committee.
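
As an illustration of how subject headings and call numbers can drive automatic identification, here is a minimal Python sketch. The actual criteria maintained by the BHL Collections Committee are not given in this policy, so the keyword list, the Library of Congress classes (QH, QK, QL), the Dewey range (570 to 599), and the function names below are all assumptions chosen for the example.

```python
import re
from typing import Iterable, Optional

# Illustrative criteria only; NOT the BHL's actual selection criteria.
LCSH_KEYWORDS = {"botany", "zoology", "natural history"}
LC_CLASSES = {"QH", "QK", "QL"}   # natural history, botany, zoology
DEWEY_RANGE = (570, 599)          # life sciences, plants, animals


def matches_lcsh(headings: Iterable[str]) -> bool:
    """True if any subject heading contains one of the keywords."""
    return any(k in h.lower() for h in headings for k in LCSH_KEYWORDS)


def matches_lc_class(call_number: Optional[str]) -> bool:
    """True if the leading class letters of an LC call number are in scope."""
    if not call_number:
        return False
    m = re.match(r"\s*([A-Za-z]+)", call_number)
    return bool(m) and m.group(1).upper() in LC_CLASSES


def matches_dewey(dewey: Optional[str]) -> bool:
    """True if the three-digit Dewey class falls within the range."""
    if not dewey:
        return False
    m = re.match(r"\s*(\d{3})", dewey)
    return bool(m) and DEWEY_RANGE[0] <= int(m.group(1)) <= DEWEY_RANGE[1]


def is_candidate(headings: Iterable[str],
                 lc_call_number: Optional[str] = None,
                 dewey_number: Optional[str] = None) -> bool:
    """A record is a candidate if any one of the three criteria matches."""
    return (matches_lcsh(headings)
            or matches_lc_class(lc_call_number)
            or matches_dewey(dewey_number))
```

Because a single matching field is enough in this sketch, a broadly catalogued title can slip through, which is consistent with the imperfect matching this policy acknowledges for ingested material.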

The process of matching titles in the Internet Archive corpus against a fixed set of selection criteria based on subject headings and call numbers is not perfect and, occasionally, non-relevant materials are inadvertently incorporated into the collection. As the BHL collection continues to grow, the Collections Committee meets regularly to discuss issues related to non-BHL partner ingested materials. If you notice an item in BHL that you believe is out of scope and should be removed, please let us know via our feedback form and the Collections Committee will consider the removal of the item. For more information on the deaccession of material from the BHL collection, please visit our Deaccession Policy.

A caveat with material ingested into BHL according to this method is that BHL has a limited ability to resolve issues that may be discovered with these items. While we perform quality review on a statistical sampling of items scanned from BHL consortium libraries, we have no control over the quality review of items scanned by non-BHL partners. Thus, there may be quality issues that we are unaware of. If you notice an issue with a non-BHL contributed or “(” item, such as a missing page or poor scanning quality, please let us know via our feedback form. In many instances, we will only be able to resolve the issue if a BHL partner owns the item and can send it for scanning. In this case, we will remove the Internet Archive-contributed item in favor of the BHL copy.

[1] Convention on Biological Diversity. Accessed: August 1, 2011.
[2] Reaka-Kudla, M.L., et al. (eds.). Biodiversity II : understanding and protecting our biological resources. Washington, D.C. : Joseph Henry Press, 1997.