Developer and Data Tools

BHL provides data exports and APIs to allow individual users and data providers to download, remix and reuse BHL content. To suggest an API or enhancement, please contact us.


Data Licensing

The BHL makes its metadata available for public use under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication license. This Creative Commons license allows you to reuse, modify, repurpose, and distribute the metadata for all purposes including commercial and non-commercial, with no need to ask for permission.

In this case, metadata refers to:

  • Library catalog records, i.e. bibliographic data, used to describe the books and journals in the BHL collection (e.g. title and author data).
  • Page-level data such as page numbers and page types (e.g. “Title page” and “Illustration”).
  • Scientific name data, e.g. “Zea mays”.

Go ahead, take our metadata and do something creative with it! If you do repurpose BHL metadata please share your story with us. We often like to feature stories of reuse on our BHL blog.


Data Exports

A series of downloadable files enables libraries and other data providers to identify digitized titles available in BHL. These files include metadata about each scanned volume, as well as information about the millions of scientific names identified throughout the BHL corpus and the pages on which those names occur.

Download documentation.

Data exports are updated at the beginning of every month in the following formats:

All exports are available in two versions. The first contains metadata for all material included in the BHL collection. The second contains metadata only for material hosted by BHL.

To download these files, right-click a link and choose “Save link as…” or “Save target as…”.


Complete collection:

BHL-hosted material only:


Complete collection:

BHL-hosted material only:


Complete collection:

BHL-hosted material only:

TSV (Tab-Separated Value)

(Note: all TSV files are encoded in Unicode UTF-8.)

Complete collection:

BHL-hosted material only:
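A minimal Python sketch for reading one of the TSV exports. It assumes the file has a header row and that fields are not quoted (hence `csv.QUOTE_NONE`, which keeps any double quotes in titles as literal characters). The function name `read_bhl_tsv` and the column names mentioned below are hypothetical; check the download documentation for the actual file layouts.

```python
import csv
from typing import Dict, Iterator


def read_bhl_tsv(path: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per data row of a tab-separated export (sketch).

    Assumes a header row and unquoted fields; with QUOTE_NONE, double
    quotes inside values are returned as literal characters.
    """
    # The exports are UTF-8; newline="" is what the csv module docs recommend.
    with open(path, encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            yield row
```

For example, `sum(1 for _ in read_bhl_tsv("title.txt"))` counts the rows of a file named `title.txt` (a hypothetical filename), and `row["TitleID"]` would read a column of that (assumed) name. Streaming rows one at a time keeps memory use low for files that cover millions of scientific names.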



The BHL Application Programming Interface (API) is a set of REST-like web services that can be invoked via HTTP queries (GET/POST requests) or SOAP. Responses can be received in one of three formats: JSON, XML, or XML wrapped in a SOAP envelope.

Please note that users are required to obtain an API Key from in order to use the BHL APIs.
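The sketch below builds a GET request URL for the API in Python. The base URL `https://www.biodiversitylibrary.org/api3`, the query parameter names `op`, `apikey` and `format`, and the `PublicationSearch` operation with its `searchterm` parameter are assumptions based on version 3 of the API; confirm them against the official API documentation before relying on them. SOAP requests are not covered.

```python
from urllib.parse import urlencode

# Assumed base URL for version 3 of the BHL API.
API_BASE = "https://www.biodiversitylibrary.org/api3"


def build_api_url(operation: str, api_key: str, fmt: str = "json", **params: str) -> str:
    """Return a GET URL for one API operation (sketch; parameter names assumed)."""
    if fmt not in ("json", "xml"):
        # Only plain JSON or XML over HTTP GET is sketched here, not SOAP.
        raise ValueError("fmt must be 'json' or 'xml'")
    # Operation-specific parameters sit between the operation name and the credentials.
    query = {"op": operation, **params, "apikey": api_key, "format": fmt}
    return f"{API_BASE}?{urlencode(query)}"


if __name__ == "__main__":
    url = build_api_url("PublicationSearch", "YOUR_API_KEY", searchterm="Zea mays")
    print(url)
    # To send the request: urllib.request.urlopen(url).read()
```

Passing the key as an argument rather than hard-coding it makes it easy to load from an environment variable instead of committing it to source control.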

Citation Linking:
BHL provides access to its content via an OpenURL Resolver, as documented and described here:
BHL’s OpenURL Resolver is a popular tool used by biodiversity databases to link to citations and to specific pages of scanned materials.
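For illustration, a Python sketch that builds an OpenURL query for a book citation. It uses OpenURL 0.1 citation keys (`genre`, `title`, `volume`, `spage`, `date`); the resolver address `https://www.biodiversitylibrary.org/openurl` and the exact set of keys BHL’s resolver honours are assumptions, so check BHL’s OpenURL documentation.

```python
from urllib.parse import urlencode

# Assumed resolver address; confirm it in BHL's OpenURL documentation.
OPENURL_BASE = "https://www.biodiversitylibrary.org/openurl"


def build_openurl(title: str, volume: str, spage: str, date: str, genre: str = "book") -> str:
    """Build an OpenURL 0.1-style citation query (sketch).

    title is the book or journal title, spage the start page, date the year.
    """
    query = {"genre": genre, "title": title, "volume": volume, "spage": spage, "date": date}
    return f"{OPENURL_BASE}?{urlencode(query)}"


if __name__ == "__main__":
    # The Species Plantarum citation used as an example on this page.
    print(build_openurl("Species Plantarum", "2", "971", "1753"))
```

Resolving by citation rather than by a stored page ID lets a database link to a page it has never looked up, at the cost of depending on how well the citation matches BHL’s metadata.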

  • Data providers can also include links to literature using our stable URLs for scanned pages. The URL is displayed below “Link to this page”. For example, to cite the original description of Zea mays:
    Citation: Carl Linnaeus’ Species Plantarum. 2 : 971. 1753.
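Databases that store page IDs can rebuild stable links on demand and recover IDs from links they already hold. The Python sketch below assumes stable page links have the form `https://www.biodiversitylibrary.org/page/<PageID>`; that pattern is an assumption, so prefer copying the link shown under “Link to this page”. The page ID `12345` in the usage example is hypothetical and is not the Zea mays page.

```python
from urllib.parse import urlparse

# Assumed prefix of BHL stable page links.
PAGE_BASE = "https://www.biodiversitylibrary.org/page/"


def page_url(page_id: int) -> str:
    """Return the stable link for a page ID (sketch, assumed URL pattern)."""
    if page_id <= 0:
        raise ValueError("page IDs are positive integers")
    return f"{PAGE_BASE}{page_id}"


def page_id_from_url(url: str) -> int:
    """Extract the numeric page ID from a link of the assumed form /page/<id>."""
    # Drop empty segments so a trailing slash is tolerated.
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) != 2 or parts[0] != "page" or not parts[1].isdigit():
        raise ValueError(f"not a page link: {url}")
    return int(parts[1])


if __name__ == "__main__":
    link = page_url(12345)  # hypothetical page ID
    print(link, page_id_from_url(link))
```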

Also, check out these API wrappers provided by colleagues in the open-source community:



Metadata about the books and journals in the BHL collection is published via OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting), a protocol for publishing and harvesting metadata descriptions of the records in an archive. More information about the protocol is available from the Open Archives Initiative. Descriptive metadata is provided preferably as MODS, but also as Dublin Core and OLEF. OLEF is a format defined to facilitate metadata harmonization among BHL Partners; see the OLEF schema documentation and this presentation to find out more.

The OAI-PMH endpoint for BHL is
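A Python sketch of building `ListRecords` requests. The verb and argument names (`verb`, `metadataPrefix`, `set`, `from`, `resumptionToken`) come from the OAI-PMH 2.0 specification, and `oai_dc` (Dublin Core) is the metadata prefix every OAI-PMH repository must support. The base URL `https://www.biodiversitylibrary.org/oai` is an assumption; use the endpoint given on this page, and send `verb=ListMetadataFormats` to learn the prefixes BHL uses for MODS and OLEF.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed OAI-PMH base URL for BHL.
OAI_BASE = "https://www.biodiversitylibrary.org/oai"


def list_records_url(
    set_spec: Optional[str] = None,
    metadata_prefix: str = "oai_dc",
    from_date: Optional[str] = None,
    resumption_token: Optional[str] = None,
) -> str:
    """Build a ListRecords request URL (sketch).

    OAI-PMH 2.0 makes resumptionToken an exclusive argument: when it is
    present, only verb and resumptionToken are sent.
    """
    if resumption_token is not None:
        query = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        query = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec is not None:
            query["set"] = set_spec  # e.g. "item", "title" or "part"
        if from_date is not None:
            query["from"] = from_date  # datestamp such as "2024-01-01"
    return f"{OAI_BASE}?{urlencode(query)}"
```

The `from` argument supports incremental harvesting: after a full harvest, later requests can ask only for records changed since the last run.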

BHL provides five sets:

  1. item
  2. title
  3. part
  4. itemexternal
  5. partexternal

1) Item: This set contains individual volumes hosted by BHL. The content is viewable in BHL.

2) Title: This set contains the monographs and journals represented in BHL.

3) Part: This set contains articles, chapters, treatments, etc. hosted by BHL. The content is viewable in BHL.

4) Item External: This set contains individual volumes not hosted by BHL. The content must be viewed on a site not maintained by BHL.

5) Part External: This set contains articles, chapters, treatments, etc. not hosted by BHL. The content must be viewed on a site not maintained by BHL.

Most aggregators of BHL content harvest either the Item and Part sets or the Title and Part sets, but not all three. Whether an aggregator chooses the Item or the Title set depends on the level at which its repository catalogs material.

If an aggregator does not want to harvest external content (i.e. content that is not hosted within the BHL repository), it should not harvest the itemexternal and partexternal sets.
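The set-selection guidance above can be written as a small Python helper. This is one reading of that guidance, not an official rule: harvest Part plus either Item or Title, and, if external content is wanted, add the external counterpart of each hosted set being harvested. Pairing `title` with `partexternal` alone is an assumption, since there is no title-level external set. The function name `sets_to_harvest` is hypothetical.

```python
from typing import List


def sets_to_harvest(catalog_level: str, include_external: bool) -> List[str]:
    """Choose BHL OAI-PMH set specs for an aggregator (hypothetical helper).

    catalog_level is "item" for repositories that catalog individual volumes,
    or "title" for repositories that catalog monographs and journals as a whole.
    """
    if catalog_level not in ("item", "title"):
        raise ValueError("catalog_level must be 'item' or 'title'")
    # Part is harvested together with either Item or Title, never all three.
    sets = [catalog_level, "part"]
    if include_external:
        # Assumption: add the external counterpart of each hosted set;
        # the title set has no external counterpart.
        if catalog_level == "item":
            sets.append("itemexternal")
        sets.append("partexternal")
    return sets
```

For example, a title-level catalog that skips external content would call `sets_to_harvest("title", False)` and harvest the `title` and `part` sets.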

Some example OAI-PMH operations are:
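Harvesting a large set means paging through several `ListRecords` responses. The Python sketch below, using only the standard library, extracts the record identifiers and the resumption token from one response; the namespace and element names follow the OAI-PMH 2.0 specification, while the function name `parse_list_records` is hypothetical. A harvester repeats the request with the returned token until the token comes back as `None`.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def parse_list_records(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return (record identifiers, resumption token or None) for one response."""
    root = ET.fromstring(xml_text)
    error = root.find("oai:error", OAI_NS)
    if error is not None:
        # noRecordsMatch just means the set/date filter selected nothing.
        if error.get("code") == "noRecordsMatch":
            return [], None
        raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
    # Headers of deleted records are included too; filter on status if needed.
    identifiers = [
        el.text or ""
        for el in root.findall("oai:ListRecords/oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find("oai:ListRecords/oai:resumptionToken", OAI_NS)
    # An empty (or absent) resumptionToken marks the last page of the list.
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token
```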


Data Quality

BHL is working to implement the KBART standard to better integrate our data into discovery layer tools. Our data for digitized legacy materials is sourced and aggregated “as is” from the catalogs of our consortium partner libraries, and we currently lack the resources to refine it. Until BHL implements KBART, any BHL data present in discovery layer tools is likely to be incomplete. In the meantime, our bibliographic records are available via our website, some consortium partner library catalogs, the Internet Archive’s “biodiversity” collection, and the Digital Public Library of America (DPLA). Projects to integrate our records into OCLC and Europeana are underway. If you have questions about working with BHL bibliographic data, please contact us.


Code and Documentation

Available on GitHub

R Interface to BHL API, via rOpenSci

Macaw Software

Uploading to Internet Archive

BHL has written instructions on how to upload scanned books to the Internet Archive.
