BHL provides data exports and APIs to allow individual users and data providers to download, remix and reuse BHL content. To suggest an API or enhancement, please contact us.
The BHL makes its metadata available for public use under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication license. This Creative Commons license allows you to reuse, modify, repurpose, and distribute the metadata for all purposes including commercial and non-commercial, with no need to ask for permission.
Metadata, in this case, refers to:
- Library catalog records, i.e. bibliographic data, used to describe the books and journals in the BHL collection (e.g. title and author data).
- Page-level data such as page numbers and page types (e.g. “Title page” and “Illustration”).
- Scientific name data, e.g. “Zea mays”.
A series of files is now available for download that will enable libraries and other data providers to identify digitized titles available within BHL. These files also include metadata about each volume scanned, as well as information about the millions of scientific names that have been identified throughout the BHL corpus and the pages on which those names occur.
Data exports are updated at the beginning of every month in the following formats:
To download these files, right-click a link and choose “Save link as…” or “Save target as…”
- Download BHL Titles in MODS XML (11MB+)
- Download BHL Items/Volumes in MODS XML (25MB+)
- Download BHL Parts in MODS XML (18MB+)
- Download BHL Titles in BibTeX format (9MB+)
- Download BHL Items/Volumes in BibTeX format (12MB+)
- Download BHL Parts in BibTeX format (9MB+)
(note: character encoding for all of the text files is Unicode UTF-8)
- Download contents of Title table as a tab-delimited text file (29MB+)
- Download contents of TitleIdentifier table as a tab-delimited text file (8MB+)
- Download contents of DOI table as a tab-delimited text file (6MB+)
- Download contents of Item (volumes) table as a tab-delimited text file (31MB+)
- Download contents of Subject table as a tab-delimited text file (12MB+)
- Download contents of Creator table as a tab-delimited text file (11MB+)
- Download contents of Part table as a tab-delimited text file (37MB+)
- Download contents of PartCreator table as a tab-delimited text file (7MB+)
- Download .zip file of all tables (including page and name data) (2GB+). Not for the faint of heart! It’s a monster file because it includes data on each of our millions of pages as well as the millions of occurrences of scientific names identified in the BHL corpus.
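As a minimal sketch, the tab-delimited exports can be read with Python's standard `csv` module. The sketch assumes the file starts with a header row; the column names `TitleID` and `FullTitle` and the sample rows are hypothetical placeholders, not a description of the actual export layout.

```python
import csv
import io


def read_tab_delimited(stream):
    """Yield one dict per data row, keyed by the header row (assumed present)."""
    # QUOTE_NONE keeps any double quotes in the text exactly as exported.
    reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield row


# Hypothetical in-memory sample standing in for a downloaded export file.
sample = io.StringIO(
    "TitleID\tFullTitle\n"
    "1\tSpecies plantarum\n"
    "2\tFlora \"test\"\n"
)
rows = list(read_tab_delimited(sample))
```

For a real download, pass a file opened with `open(path, encoding="utf-8", newline="")` instead of the in-memory sample, since the exports are UTF-8 encoded.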
The BHL Application Programming Interface (API) is a set of REST-like web services that can be invoked via HTTP queries (GET/POST requests) or SOAP. Responses can be received in one of three formats: JSON, XML, or XML wrapped in a SOAP envelope.
Please note that users are required to obtain an API Key from https://www.biodiversitylibrary.org/getapikey.aspx in order to use the BHL APIs.
- Version 3
- Endpoint: https://www.biodiversitylibrary.org/api3
- The documentation for the latest version of the API can be found at https://www.biodiversitylibrary.org/docs/api3.html.
- Version 3 incorporates full text search as well as new and updated methods from version 2. Versions 3 and 2 have separate endpoints. Both versions live side-by-side and do not conflict.
- This is the preferred version of the API.
- Version 2
- Endpoint: https://www.biodiversitylibrary.org/api2/httpquery.ashx
- The documentation for version 2 of the API can be found at https://www.biodiversitylibrary.org/api2/docs/docs.html.
- Version 1 of the API was limited to data related to scientific names found in the BHL collection; version 2 adds access to title, author, volume, and page information.
- Long-term, version 2 of the API will be deprecated and removed, so users are urged to use version 3.
- Version 1 (formerly the BHL Name Services)
- Updated documentation for the first version of the API can be found at https://www.biodiversitylibrary.org/api/docs/docs.html.
- This version of the API is provided solely to maintain backwards compatibility.
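The following sketch shows one way to build a version 3 request URL as an HTTP GET query. The query parameter names (`op`, `apikey`, `format`), the operation name `GetItemMetadata`, and the identifier value are assumptions used for illustration; the version 3 documentation linked above is the authority on the actual methods and arguments.

```python
from urllib.parse import urlencode

# Endpoint as listed for version 3 above.
API3_ENDPOINT = "https://www.biodiversitylibrary.org/api3"


def build_api3_url(op, api_key, response_format="json", **params):
    """Build a GET URL; parameter names here are assumptions, not confirmed API."""
    query = {"op": op, **params, "apikey": api_key, "format": response_format}
    return f"{API3_ENDPOINT}?{urlencode(query)}"


# "YOUR-API-KEY" is a placeholder for the key obtained from BHL.
url = build_api3_url("GetItemMetadata", "YOUR-API-KEY", id=1000)
```

The resulting URL can be fetched with any HTTP client. Keeping URL construction separate from the network call makes the request easy to inspect before it is sent.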
BHL provides access to its content via an OpenURL Resolver, as documented and described here:
BHL’s OpenURL Resolver is a popular tool used by biodiversity databases for linking into citations and exact pages of scanned materials.
- Data providers can also include links to literature using our stable URLs for scanned pages. The URL is displayed below “Link to this page”. For example, to cite the original description of Zea mays:
Citation: Carl Linnaeus’ Species Plantarum. 2 : 971. 1753.
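As an illustration, the citation above could be turned into an OpenURL-style query. This is only a sketch: the resolver path `/openurl` and the keys `genre`, `title`, `volume`, `spage`, and `date` are assumptions based on common OpenURL usage, so the resolver documentation should be checked for the parameters BHL actually accepts.

```python
from urllib.parse import urlencode

# Assumed resolver location; not stated in the text above.
OPENURL_ENDPOINT = "https://www.biodiversitylibrary.org/openurl"


def build_openurl(**fields):
    """Encode citation fields as a query string on the assumed resolver URL."""
    return f"{OPENURL_ENDPOINT}?{urlencode(fields)}"


# Fields taken from the citation: Species Plantarum. 2 : 971. 1753.
link = build_openurl(
    genre="book",
    title="Species Plantarum",
    volume="2",
    spage="971",
    date="1753",
)
```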
Also, check these API wrappers provided by colleagues from the Open Source community:
- A Ruby gem wrapping the functionality of the BHL API (version 2.5.x), contributed by Matt Yoder et al.:
- An R interface to the BHL API contributed by Scott Chamberlain and Karthik Ram through the rOpenSci project (http://ropensci.org/):
Metadata about the books and journals in the BHL collection is published via OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting). OAI-PMH is a protocol used for publishing and harvesting metadata descriptions of records in an archive. More information about the protocol can be found at http://www.openarchives.org/pmh/. Descriptive metadata is best provided as MODS (http://www.loc.gov/standards/mods/v3/mods-3-0.xsd), but is also available as Dublin Core (http://www.openarchives.org/OAI/2.0/oai_dc.xsd) and OLEF. OLEF is a format defined to facilitate metadata harmonization among BHL Partners (see http://www.bhle.eu/bhl-schema/v1/ to find out more about the schema and also review this presentation).
The OAI-PMH endpoint for BHL is http://www.biodiversitylibrary.org/oai.
BHL provides five sets:
1) Item: This set contains individual volumes hosted by BHL. The content is viewable in BHL.
2) Title: This set contains the monographs and journals represented in BHL.
3) Part: This set contains articles, chapters, treatments, etc. hosted by BHL. The content is viewable in BHL.
4) Item External: This set contains individual volumes not hosted by BHL. The content must be viewed on a site not maintained by BHL.
5) Part External: This set contains articles, chapters, treatments, etc. not hosted by BHL. The content must be viewed on a site not maintained by BHL.
Most aggregators of BHL content will harvest either the Item and Part sets or the Title and Part sets, but not all three. Whether an aggregator chooses the Item or the Title set depends on the level at which its repository catalogs.
If an aggregator does not want to harvest external content (i.e. content that is not hosted within the BHL repository, e.g. http://www.biodiversitylibrary.org/bibliography/73220#/summary), then it should not harvest the itemexternal and partexternal sets.
Some example OAI-PMH operations are:
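As a minimal sketch, OAI-PMH requests can be formed by adding the protocol's `verb` argument, plus any further arguments, to the endpoint given above. The verbs `Identify`, `ListSets`, and `ListRecords` come from the OAI-PMH protocol itself. The `metadataPrefix` value `mods` and the set name `item` are assumptions here; `ListMetadataFormats` and `ListSets` responses report the values the server actually supports.

```python
from urllib.parse import urlencode

# Endpoint as stated above.
OAI_ENDPOINT = "http://www.biodiversitylibrary.org/oai"


def oai_request(verb, **args):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    return f"{OAI_ENDPOINT}?{urlencode({'verb': verb, **args})}"


identify = oai_request("Identify")    # describe the repository
list_sets = oai_request("ListSets")   # enumerate the available sets
# Assumed prefix and set name; confirm both against the server's responses.
records = oai_request("ListRecords", metadataPrefix="mods", set="item")
```

A harvester would fetch these URLs and, for list requests, keep following the `resumptionToken` returned by the server until the list is complete.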
BHL is moving to implement the KBART standard to better integrate our data into discovery layer tools in the future. Our data for digitized legacy materials is sourced and aggregated from our consortium library partner catalogs “as is” and we lack the resources to refine it at this time. Until BHL can implement KBART, any data that may be present in discovery layer tools is likely incomplete. Alternatively, you can find our bibliographic records via our website, some consortium partner library catalogs, Internet Archive’s “biodiversity” collection, and the Digital Public Library of America (DPLA). Projects to integrate our records into OCLC WorldCat.org and Europeana are underway. If you have questions about working with BHL bibliographic data, please contact us.
Available on GitHub: https://github.com/gbhl/bhl-us
BHL has written instructions on how to upload scanned books to the Internet Archive.