Developer and Data Tools

BHL provides data exports and APIs to allow individual users and data providers to download, remix and reuse BHL content. To suggest an API or enhancement, please contact us.

 

Data Licensing

BHL makes its metadata available for public use under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication license. This Creative Commons license allows you to reuse, modify, repurpose, and distribute the metadata for all purposes including commercial and non-commercial, with no need to ask for permission. The data in BHL’s collection is sourced and aggregated from its consortium partners and Internet Archive contributors. It is provided “as is,” without express or implied warranty as to accuracy, reliability, or fitness for any particular application. Please see our Data Disclaimer for more information.

Harvesting Bibliographic Data

BHL is working to make our library catalog records accessible via standard repositories and formats. We provide title-level and item-level (book or volume) records in two major formats, MARC and KBART, so that you can harvest BHL records into your existing library catalog system. For more information, please see our FAQ entry “How do I get BHL records into my library catalog?”

Data Exports

Exports of BHL bibliographic data, scientific names, and full OCR (optical character recognition) text are available in a variety of formats via the Biodiversity Heritage Library Open Data Collection on the Smithsonian’s Figshare at https://doi.org/10.25573/data.21081727.v1.

A series of files is available for download that will enable libraries and other data providers to identify digitized titles available within BHL. These files include metadata about each volume scanned, as well as information about the millions of scientific names that have been identified throughout the BHL corpus and the pages on which those names occur.

Download documentation.

Data exports are updated at the beginning of every month and are available in the formats listed below.

All exports are available in two versions. The first contains metadata for all material included in the BHL collection. The second contains only metadata for material hosted by BHL.

To download these files, right-click a link and choose “Save link as…” or “Save target as…”.

MODS

Complete collection:

BHL-hosted material only:

BibTeX

Complete collection:

BHL-hosted material only:

RIS

Complete collection:

BHL-hosted material only:

TSV (Tab-Separated Value)

(note: character encoding for all of the TSV files is Unicode UTF-8)
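
As a minimal sketch of what the UTF-8 note implies for readers of these files, the Python below decodes a tab-separated stream as UTF-8 and parses it with the standard csv module. The sample bytes and the column names (TitleID, FullTitle) are hypothetical stand-ins, not the real export schema; check the download documentation for the actual columns.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded TSV export.
# Column names and values are illustrative assumptions only.
SAMPLE_BYTES = (
    "TitleID\tFullTitle\n"
    "1\tSpecies plantarum\n"
    '2\t"Challenger" exp\u00e9dition\n'
).encode("utf-8")


def read_tsv(binary_stream):
    """Decode a binary stream as UTF-8 and return its rows as dicts."""
    text = io.TextIOWrapper(binary_stream, encoding="utf-8", newline="")
    # QUOTE_NONE keeps any double quotes in a field as literal characters.
    reader = csv.DictReader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


rows = read_tsv(io.BytesIO(SAMPLE_BYTES))
for row in rows:
    print(row["TitleID"], row["FullTitle"])
```

For a file on disk, the same function could be called with a handle from open(path, "rb"), where path is wherever the export was saved.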

Complete collection:

BHL-hosted material only:

TXT

BHL-hosted material only:

APIs

The BHL Application Programming Interface (API) is a set of REST-like web services that can be invoked via HTTP queries (GET/POST requests) or SOAP. Responses can be received in one of three formats: JSON, XML, or XML wrapped in a SOAP envelope.

Please note that users are required to obtain an API Key from https://www.biodiversitylibrary.org/getapikey.aspx in order to use the BHL APIs.

  • Version 3
    • Endpoint: https://www.biodiversitylibrary.org/api3
    • The documentation for the latest version of the API can be found at https://www.biodiversitylibrary.org/docs/api3.html.
    • Version 3 incorporates full-text search as well as new and updated methods from version 2. Versions 3 and 2 have separate endpoints; both live side by side and do not conflict.
    • This is the preferred version of the API. Users are encouraged to use or upgrade to the latest API version. Enhancements and bug fixes are not guaranteed to be incorporated into older versions.
  • Version 2
  • Version 1 (formerly the BHL Name Services)
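
As a rough illustration of an HTTP GET query against the Version 3 endpoint, the Python sketch below only builds a request URL and does not contact the server. The query parameter names (op, apikey, format, id), the operation name GetItemMetadata, and the item id are assumptions to be confirmed against the Version 3 documentation linked above; YOUR-API-KEY is a placeholder for a key obtained from the API key page.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

API3_ENDPOINT = "https://www.biodiversitylibrary.org/api3"


def build_api3_url(op, api_key, response_format="json", **params):
    """Build a GET URL for one API call.

    The parameter names 'op', 'apikey', and 'format' are assumptions;
    verify them against the Version 3 documentation before relying on them.
    """
    query = {"op": op, **params, "format": response_format, "apikey": api_key}
    return f"{API3_ENDPOINT}?{urlencode(query)}"


# Hypothetical call: the operation name and the id value are placeholders.
url = build_api3_url("GetItemMetadata", "YOUR-API-KEY", id=1000)
print(url)

# Parse the URL back to show which query parameters it carries.
query = parse_qs(urlsplit(url).query)
print(sorted(query))
```

The resulting URL could then be fetched with any HTTP client, for example urllib.request.urlopen(url), and the JSON response decoded with json.load.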

Citation Linking:
BHL provides access to its content via an OpenURL Resolver, documented at https://www.biodiversitylibrary.org/openurlhelp.aspx.
BHL’s OpenURL Resolver is widely used by biodiversity databases to link to cited works and to exact pages of scanned materials.

  • Data providers can also include links to literature using our stable URLs for scanned pages. The URL is displayed below “Link to this page”. For example, to cite the original description of Zea mays:
    Citation: Carl Linnaeus’ Species Plantarum. 2 : 971. 1753.
    https://www.biodiversitylibrary.org/page/358992
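
For data providers who store page identifiers rather than full links, a small helper can sketch the stable URL pattern shown in the example above. The function names page_url and page_id_from_url are hypothetical, and the pattern is inferred only from the single example URL, so treat it as an assumption.

```python
import re
from typing import Optional

PAGE_URL_PREFIX = "https://www.biodiversitylibrary.org/page/"

# Pattern inferred from the example URL; it tolerates a trailing path,
# query string, or fragment after the numeric page id.
_PAGE_URL_RE = re.compile(
    r"https?://www\.biodiversitylibrary\.org/page/(\d+)(?:[/?#].*)?"
)


def page_url(page_id: int) -> str:
    """Return the stable URL for a scanned page, given its numeric id."""
    if page_id <= 0:
        raise ValueError("page_id must be a positive integer")
    return f"{PAGE_URL_PREFIX}{page_id}"


def page_id_from_url(url: str) -> Optional[int]:
    """Extract the numeric page id from a stable page URL, or return None."""
    match = _PAGE_URL_RE.fullmatch(url)
    return int(match.group(1)) if match else None


zea_mays = page_url(358992)
print(zea_mays)
print(page_id_from_url(zea_mays))
```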

Also, check these API wrappers provided by colleagues from the Open Source community:

OAI-PMH

Metadata about the books and journals in the BHL collection is published via OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting), a protocol for publishing and harvesting metadata descriptions of records in an archive. More information about the protocol can be found at http://www.openarchives.org/pmh/. Descriptive metadata is provided preferably as MODS (http://www.loc.gov/standards/mods/v3/mods-3-0.xsd), but also as Dublin Core (http://www.openarchives.org/OAI/2.0/oai_dc.xsd) and OLEF. OLEF is a format defined to facilitate metadata harmonization among BHL Partners (see http://www.bhle.eu/bhl-schema/v1/ to find out more about the schema and also review this presentation).

The OAI-PMH endpoint for BHL is https://www.biodiversitylibrary.org/oai.

We provide five sets in BHL:

  1. item (Item): This set contains individual volumes hosted by BHL. The content is viewable in BHL.
  2. title (Title): This set contains the monographs and journals represented in BHL.
  3. part (Part): This set contains articles/chapters/treatments/etc. hosted by BHL. The content is viewable in BHL.
  4. itemexternal (Item External): This set contains individual volumes not hosted by BHL. The content must be viewed on a site not maintained by BHL.
  5. partexternal (Part External): This set contains articles/chapters/treatments/etc. not hosted by BHL. The content must be viewed on a site not maintained by BHL.

Most aggregators of BHL content will harvest either the Item and Part sets or the Title and Part sets, but not all three. Whether an aggregator chooses the Item or the Title set depends on the level at which its repository catalogs material.

If an aggregator does not want to harvest external content (i.e., content that is not hosted within the BHL repository, e.g. https://www.biodiversitylibrary.org/bibliography/73220#/summary), it should not harvest the itemexternal and partexternal sets.

Some example OAI-PMH operations are:
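
As a hedged sketch, the Python below builds request URLs for a few standard OAI-PMH verbs (Identify, ListSets, ListMetadataFormats, ListRecords) against the BHL endpoint, without sending them. The verbs and argument names come from the OAI-PMH 2.0 protocol; the set names are the five listed above; the oai_dc prefix is the one every OAI-PMH repository must support, while the mods prefix is an assumption that should be confirmed with a ListMetadataFormats request.

```python
from urllib.parse import urlencode

OAI_ENDPOINT = "https://www.biodiversitylibrary.org/oai"
BHL_SETS = ("item", "title", "part", "itemexternal", "partexternal")


def oai_url(verb, **args):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    return f"{OAI_ENDPOINT}?{urlencode({'verb': verb, **args})}"


def list_records_url(set_spec, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords URL for one set, or for a follow-up page.

    In OAI-PMH, resumptionToken is an exclusive argument: a follow-up
    request carries only the verb and the token.
    """
    if resumption_token is not None:
        return oai_url("ListRecords", resumptionToken=resumption_token)
    if set_spec not in BHL_SETS:
        raise ValueError(f"unknown set: {set_spec!r}")
    return oai_url("ListRecords", metadataPrefix=metadata_prefix, set=set_spec)


examples = [
    oai_url("Identify"),
    oai_url("ListSets"),
    oai_url("ListMetadataFormats"),
    list_records_url("item"),
    # 'mods' is an assumed prefix name; verify it with ListMetadataFormats.
    list_records_url("part", metadata_prefix="mods"),
]
for example in examples:
    print(example)
```

An aggregator that does not want external content would simply never call list_records_url with itemexternal or partexternal.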

Code and Documentation

Available on GitHub: https://github.com/gbhl/bhl-us

R Interface to BHL API, via rOpenSci

https://github.com/ropensci/rbhl

Macaw Software

https://github.com/cajunjoel/macaw-book-metadata-tool

Uploading to Internet Archive

BHL has written instructions on how to upload scanned books to the Internet Archive.
