BHL provides data exports and APIs to allow individual users and data providers to download, remix and reuse BHL content. To suggest an API or enhancement, please contact us.
Data Licensing
BHL makes its metadata available for public use under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. This Creative Commons dedication allows you to reuse, modify, repurpose, and distribute the metadata for all purposes including commercial and non-commercial, with no need to ask for permission. The data in BHL’s collection is sourced and aggregated from its consortium partners and Internet Archive contributors. It is provided “as is,” without express or implied warranty as to accuracy, reliability, or fitness for any particular application. Please see our Data Disclaimer for more information.
Harvesting Bibliographic Data
BHL is working to make its library catalog records accessible through standard repositories and formats. We provide our title-level and item-level (book or volume) records in two major formats, MARC and KBART, which you can use to harvest BHL records into your existing library catalog system. See our FAQ, “How do I get BHL records into my library catalog?”, for more information.
Data Exports
Exports of BHL bibliographic data, scientific names, and full-text optical character recognition (OCR) output are available in a variety of formats through the Biodiversity Heritage Library Open Data Collection on the Smithsonian’s Figshare:
https://smithsonian.figshare.com/projects/Biodiversity_Heritage_Library_Open_Data_Collection/151911
A series of downloadable files enables libraries and other data providers to identify digitized titles available in BHL. These files include metadata about each scanned volume, as well as information about the millions of scientific names identified throughout the BHL corpus and the pages on which those names occur.
Data exports are updated at the beginning of every month. They are provided in MODS XML, BibTeX, RIS, and tab-delimited text (TSV) formats, along with a full-text OCR export.
All exports are available in two versions: the first contains metadata for all material in the BHL collection, and the second contains metadata only for material hosted by BHL.
To download a file, right-click its link and choose “Save link as…” or “Save target as…”.
MODS XML
Complete collection:
- Download BHL Titles in MODS XML (11 MB+)
- Download BHL Items/Volumes in MODS XML (25 MB+)
- Download BHL Parts in MODS XML (18 MB+)
BHL-hosted material only:
- Download BHL Titles in MODS XML (11 MB+)
- Download BHL Items/Volumes in MODS XML (25 MB+)
- Download BHL Parts in MODS XML (18 MB+)
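Because the MODS files run to tens of megabytes, a streaming parse avoids holding the whole document in memory. The sketch below uses Python's standard `xml.etree.ElementTree.iterparse` and assumes that each record is a `<mods>` element in the MODS v3 namespace (`http://www.loc.gov/mods/v3`) with its title under `titleInfo/title`; check this against an actual export before relying on it.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator

# Assumption: records use the standard MODS v3 namespace.
MODS_NS = "http://www.loc.gov/mods/v3"
MODS_TAG = f"{{{MODS_NS}}}mods"
TITLE_PATH = f"{{{MODS_NS}}}titleInfo/{{{MODS_NS}}}title"


def iter_mods_titles(source: BinaryIO) -> Iterator[str]:
    """Yield the first titleInfo/title of each <mods> record, streaming the file."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == MODS_TAG:
            title = elem.findtext(TITLE_PATH)
            if title:
                yield title.strip()
            # Free the finished record's children to keep memory use low.
            elem.clear()


if __name__ == "__main__":
    sample = b"""<modsCollection xmlns="http://www.loc.gov/mods/v3">
      <mods><titleInfo><title>Species plantarum</title></titleInfo></mods>
    </modsCollection>"""
    for title in iter_mods_titles(io.BytesIO(sample)):
        print(title)
```

For a downloaded file, open it in binary mode (`open(path, "rb")`) and pass the handle in. Records with several `titleInfo` elements (for example abbreviated or alternative titles) yield only the first one.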
BibTeX
Complete collection:
- Download BHL Titles in BibTeX format (9 MB+)
- Download BHL Items/Volumes in BibTeX format (12 MB+)
- Download BHL Parts in BibTeX format (9 MB+)
BHL-hosted material only:
- Download BHL Titles in BibTeX format (9 MB+)
- Download BHL Items/Volumes in BibTeX format (12 MB+)
- Download BHL Parts in BibTeX format (9 MB+)
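For a quick inventory of a BibTeX export, such as counting entries by type, scanning the entry headers is often enough. This is a regular-expression sketch, not a full BibTeX parser: it assumes each entry begins with `@type{key,` (or `@type(key,`), assumes the file is UTF-8, and can be fooled by an `@` inside a field value that happens to match the same pattern.

```python
import re
from collections import Counter
from typing import Iterator

# Matches an entry header such as "@book{somekey," or "@article(somekey,".
ENTRY_HEADER = re.compile(r"@([A-Za-z]+)\s*[{(]\s*([^,\s]+)\s*,")
# BibTeX directives that are not bibliographic entries.
NON_ENTRIES = {"comment", "string", "preamble"}


def iter_entry_keys(text: str) -> Iterator[tuple[str, str]]:
    """Yield (entry_type, citation_key) pairs in file order."""
    for match in ENTRY_HEADER.finditer(text):
        entry_type = match.group(1).lower()
        if entry_type not in NON_ENTRIES:
            yield entry_type, match.group(2)


def count_entry_types(text: str) -> Counter:
    """Count entries per (lower-cased) entry type."""
    return Counter(entry_type for entry_type, _key in iter_entry_keys(text))
```

Usage: `with open(path, encoding="utf-8") as f: print(count_entry_types(f.read()).most_common())`. For field-level access, a dedicated BibTeX parsing library is the better choice.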
RIS
Complete collection:
- Download BHL Titles in RIS format (9 MB+)
- Download BHL Items/Volumes in RIS format (12 MB+)
- Download BHL Parts in RIS format (9 MB+)
BHL-hosted material only:
- Download BHL Titles in RIS format (9 MB+)
- Download BHL Items/Volumes in RIS format (12 MB+)
- Download BHL Parts in RIS format (9 MB+)
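RIS is a simple tagged format: each line holds a two-character tag, two spaces, a hyphen and a space, then a value (for example `TY  - BOOK`), and each record ends with an `ER` line. A minimal reader can therefore group lines into records without a third-party library. The sketch below assumes the exports follow that standard layout; it keeps every value in a list because tags such as `AU` repeat, and it skips any line that does not match the tag pattern (including wrapped continuation lines).

```python
import re
from typing import Iterable, Iterator

# "TY  - BOOK": two-character tag, two spaces, hyphen, optional space, value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def parse_ris(lines: Iterable[str]) -> Iterator[dict[str, list[str]]]:
    """Group RIS lines into records, mapping each tag to its list of values."""
    record: dict[str, list[str]] = {}
    for raw in lines:
        # Drop line endings and a UTF-8 byte-order mark, if present.
        line = raw.lstrip("\ufeff").rstrip("\r\n")
        match = RIS_LINE.match(line)
        if not match:
            continue  # skip blank or unrecognized lines
        tag, value = match.groups()
        if tag == "ER":
            if record:
                yield record
            record = {}
        else:
            record.setdefault(tag, []).append(value.strip())
    if record:
        yield record  # tolerate a final record with no ER line
```

Usage: `with open(path, encoding="utf-8") as f: for rec in parse_ris(f): print(rec.get("TI"))`.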
Tab-delimited text (TSV)
(Note: all TSV files are encoded in Unicode UTF-8.)
Complete collection:
- Download contents of Title table as a tab-delimited text file (44 MB+)
- Download contents of TitleIdentifier table as a tab-delimited text file (14 MB+)
- Download contents of DOI table as a tab-delimited text file (12 MB+)
- Download contents of Item (volumes) table as a tab-delimited text file (50 MB+)
- Download contents of Subject table as a tab-delimited text file (26 MB+)
- Download contents of Creator table as a tab-delimited text file (22 MB+)
- Download contents of CreatorIdentifier table as a tab-delimited text file (4 MB+)
- Download contents of Part table as a tab-delimited text file (108 MB+)
- Download contents of PartIdentifier table as a tab-delimited text file (11 MB+)
- Download contents of PartCreator table as a tab-delimited text file (16 MB+)
- Download .zip file of all tables (including page and name data) (2.8 GB+). Not for the faint of heart! It’s a monster file because it includes data on each of our millions of pages as well as the millions of occurrences of scientific names identified in the BHL corpus.
BHL-hosted material only:
- Download contents of Title table as a tab-delimited text file (38 MB+)
- Download contents of TitleIdentifier table as a tab-delimited text file (13 MB+)
- Download contents of DOI table as a tab-delimited text file (12 MB+)
- Download contents of Item (volumes) table as a tab-delimited text file (44 MB+)
- Download contents of Subject table as a tab-delimited text file (24 MB+)
- Download contents of Creator table as a tab-delimited text file (21 MB+)
- Download contents of CreatorIdentifier table as a tab-delimited text file (4 MB+)
- Download contents of Part table as a tab-delimited text file (89 MB+)
- Download contents of PartIdentifier table as a tab-delimited text file (9 MB+)
- Download contents of PartCreator table as a tab-delimited text file (13 MB+)
- Download .zip file of all tables (including page and name data) (1.8 GB). Not for the faint of heart! It’s a monster file because it includes data on each of our millions of pages as well as the millions of occurrences of scientific names identified in the BHL corpus.
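The tab-delimited tables can be read with Python's standard `csv` module, and a single table can be streamed out of the large .zip without extracting it to disk. This sketch assumes that each table's first line is a header row naming its columns and that values are not CSV-quoted, so quote characters are treated as ordinary text. Table and member names used below (such as `title.txt`) are hypothetical; list the archive's contents with `zipfile.ZipFile(path).namelist()` to find the real ones.

```python
import csv
import io
import zipfile
from typing import Iterator, TextIO

# Raise csv's default per-field limit (131072 characters) in case a field is very long.
csv.field_size_limit(16 * 1024 * 1024)


def read_bhl_tsv(stream: TextIO) -> Iterator[dict[str, str]]:
    """Yield one dict per data row, keyed by the header row's column names."""
    # QUOTE_NONE: quote characters are kept as part of the value.
    yield from csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)


def read_tsv_file(path: str) -> Iterator[dict[str, str]]:
    # "utf-8-sig" reads UTF-8 and also drops a byte-order mark if one is present;
    # newline="" is what the csv module expects.
    with open(path, encoding="utf-8-sig", newline="") as f:
        yield from read_bhl_tsv(f)


def read_tsv_from_zip(zip_path: str, member: str) -> Iterator[dict[str, str]]:
    """Stream one table out of the all-tables .zip without extracting it."""
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            yield from read_bhl_tsv(text)
```

Because each function is a generator, rows are produced one at a time. This matters for the page and name tables inside the multi-gigabyte archive, which are far too large to load into a list at once.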
Full-text OCR
BHL-hosted material only:
- Download the BHL Optical Character Recognition (OCR) Full Text Export (40 GB+): a complete export of the OCR text for the 60+ million pages in the Biodiversity Heritage Library.
APIs
The BHL Application Programming Interface (API) is a set of REST-like web services that can be invoked via HTTP queries (GET/POST requests) or SOAP. Responses can be received in one of three formats: JSON, XML, or XML wrapped in a SOAP envelope.
Users must obtain an API key from https://www.biodiversitylibrary.org/getapikey.aspx to use the BHL APIs.
- Version 3
  - Endpoint: https://www.biodiversitylibrary.org/api3
  - Documentation for the latest version of the API is at https://www.biodiversitylibrary.org/docs/api3.html.
  - Version 3 adds full-text search, along with new methods and updated versions of methods from version 2. Versions 3 and 2 have separate endpoints, so both run side by side without conflict.
  - This is the preferred version of the API. Users are encouraged to use it or upgrade to it; enhancements and bug fixes are not guaranteed to be incorporated into older versions.
- Version 2
  - Endpoint: https://www.biodiversitylibrary.org/api2/httpquery.ashx
  - Documentation for version 2 is at https://www.biodiversitylibrary.org/api2/docs/docs.html.
  - Version 1 of the API was limited to data about scientific names found in the BHL collection; version 2 added access to title, author, volume, and page information.
  - Version 2 will eventually be deprecated and removed, so users are urged to move to version 3.
- Version 1 (formerly the BHL Name Services)
  - Updated documentation for version 1 is at https://www.biodiversitylibrary.org/api/docs/docs.html.
  - This version is provided solely to maintain backward compatibility.
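The sketch below shows a version 3 call over HTTP GET using only the Python standard library. It assumes query-string parameters named `op`, `apikey` and `format`, and a JSON response envelope with `Status`, `ErrorMessage` and `Result` fields. The operation name `PublicationSearch` and its `searchterm` parameter appear only as an illustration. Confirm all of these against the version 3 documentation before use.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

API3_ENDPOINT = "https://www.biodiversitylibrary.org/api3"


def build_api3_url(op: str, api_key: str, **params: str) -> str:
    """Build a GET URL for an API v3 operation, requesting JSON output."""
    # Assumed parameter names: op, apikey, format.
    query = {"op": op, **params, "apikey": api_key, "format": "json"}
    return f"{API3_ENDPOINT}?{urllib.parse.urlencode(query)}"


def parse_api3_response(body: str) -> Any:
    """Return the Result payload, raising if the service reports an error.

    Assumes a JSON envelope with Status, ErrorMessage and Result fields.
    """
    data = json.loads(body)
    if data.get("Status") != "ok":
        raise RuntimeError(data.get("ErrorMessage") or "BHL API request failed")
    return data.get("Result")


def call_api3(op: str, api_key: str, **params: str) -> Any:
    """Perform the request and return the parsed Result payload."""
    url = build_api3_url(op, api_key, **params)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_api3_response(resp.read().decode("utf-8"))
```

Usage, with your own key in place of the placeholder: `call_api3("PublicationSearch", "YOUR-API-KEY", searchterm="Zea mays")`. Keeping URL building and response parsing separate from the network call makes each part easy to test offline.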
Citation Linking
BHL provides access to its content through an OpenURL resolver, documented at https://www.biodiversitylibrary.org/openurlhelp.aspx.
BHL’s OpenURL resolver is widely used by biodiversity databases to link to cited works and to exact pages of scanned materials.
Data providers can also link to literature using BHL’s stable URLs for scanned pages; each page’s URL is displayed under “Link to this page”. For example, to cite the original description of Zea mays:
Citation: Carl Linnaeus, Species Plantarum 2: 971. 1753.
https://www.biodiversitylibrary.org/page/358992
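The two linking styles can be sketched as small URL builders. The stable page URL pattern is taken directly from the Zea mays example. For the resolver, the sketch assumes an endpoint at `https://www.biodiversitylibrary.org/openurl` that accepts OpenURL 0.1-style keys (`genre`, `title`, `volume`, `spage`, `date`). Both the path and the accepted keys are assumptions to check against the OpenURL help page.

```python
from urllib.parse import urlencode

BHL_BASE = "https://www.biodiversitylibrary.org"


def page_url(page_id: int) -> str:
    """Stable URL for a scanned page, following the /page/<id> pattern."""
    return f"{BHL_BASE}/page/{page_id}"


def openurl_query(**citation: str) -> str:
    """Build an OpenURL 0.1-style resolver query (endpoint path is an assumption)."""
    # Drop empty values so the resolver only receives fields that are known.
    params = {key: value for key, value in citation.items() if value}
    return f"{BHL_BASE}/openurl?{urlencode(params)}"


if __name__ == "__main__":
    print(page_url(358992))
    print(openurl_query(genre="book", title="Species plantarum",
                        volume="2", spage="971", date="1753"))
```

A direct page link is the most precise option when the BHL page ID is already known. A resolver query suits databases that store only the citation details.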
Also see these API wrappers contributed by colleagues in the open-source community:
- A Ruby gem that wraps BHL API version 2.5.x functionality, contributed by Matt Yoder et al.
- An R interface to the BHL API, contributed by Scott Chamberlain and Karthik Ram through the rOpenSci project (http://ropensci.org/)
OAI-PMH
Metadata about the books and journals in the BHL collection is published via OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting), a protocol for publishing and harvesting metadata descriptions of the records in an archive. More information about the protocol is available at http://www.openarchives.org/pmh/.
MODS (http://www.loc.gov/standards/mods/v3/mods-3-0.xsd) is the preferred format for descriptive metadata, which is also available as Dublin Core (http://www.openarchives.org/OAI/2.0/oai_dc.xsd) and OLEF. OLEF is a format defined to facilitate metadata harmonization among BHL partners (see http://www.bhle.eu/bhl-schema/v1/ to find out more about the schema, and also review this presentation).
The OAI-PMH endpoint for BHL is https://www.biodiversitylibrary.org/oai.
BHL provides five sets:
- item: individual volumes hosted by BHL. The content is viewable in BHL.
- title: the monographs and journals represented in BHL.
- part: articles, chapters, treatments, etc. hosted by BHL. The content is viewable in BHL.
- itemexternal: individual volumes not hosted by BHL. The content must be viewed on a site not maintained by BHL.
- partexternal: articles, chapters, treatments, etc. not hosted by BHL. The content must be viewed on a site not maintained by BHL.
Most aggregators of BHL content harvest either the item and part sets or the title and part sets, but not all three. Whether an aggregator chooses the item set or the title set depends on the level at which its repository catalogs records.
Aggregators that do not want to harvest external content (content not hosted within the BHL repository, e.g. https://www.biodiversitylibrary.org/bibliography/73220#/summary) should not harvest the itemexternal and partexternal sets.
Some example OAI-PMH operations are:
- https://www.biodiversitylibrary.org/oai?verb=Identify
- https://www.biodiversitylibrary.org/oai?verb=ListMetadataFormats
- https://www.biodiversitylibrary.org/oai?verb=ListSets
- https://www.biodiversitylibrary.org/oai?verb=ListIdentifiers&metadataPrefix=oai_dc&set=title&from=2009-02-01&until=2009-02-04
- https://www.biodiversitylibrary.org/oai?verb=ListRecords&metadataPrefix=oai_dc&set=title&from=2009-02-01&until=2009-02-04
- https://www.biodiversitylibrary.org/oai?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:biodiversitylibrary.org:title/2
- https://www.biodiversitylibrary.org/oai?verb=ListRecords&metadataPrefix=olef&set=part&from=2013-08-27&until=2013-08-28
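Each example request above returns a single response. For a full harvest, OAI-PMH splits long result lists into pages. Each `ListRecords` response may end with a `resumptionToken`, and the next request sends only the verb and that token, repeating until the token is empty or absent. The following is a minimal harvester sketch using only the Python standard library. The function names and the 60-second timeout are illustrative choices, and records are returned as raw XML elements so that any metadata format (`oai_dc`, `mods`, `olef`) can be handled downstream.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Iterator, Optional

OAI_ENDPOINT = "https://www.biodiversitylibrary.org/oai"
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def parse_list_records(xml_bytes: bytes) -> tuple[list[ET.Element], Optional[str]]:
    """Return the records on one ListRecords page and the next resumption token."""
    root = ET.fromstring(xml_bytes)
    error = root.find("oai:error", OAI_NS)
    if error is not None:
        # noRecordsMatch is a normal outcome, e.g. for an empty date window.
        if error.get("code") == "noRecordsMatch":
            return [], None
        raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
    records = root.findall("oai:ListRecords/oai:record", OAI_NS)
    token_el = root.find("oai:ListRecords/oai:resumptionToken", OAI_NS)
    # An empty resumptionToken element marks the last page.
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return records, token or None


def harvest(set_spec: str, metadata_prefix: str = "oai_dc",
            from_date: Optional[str] = None,
            until: Optional[str] = None) -> Iterator[ET.Element]:
    """Yield every record in a set, following resumption tokens."""
    params: dict[str, str] = {"verb": "ListRecords",
                              "metadataPrefix": metadata_prefix,
                              "set": set_spec}
    if from_date:
        params["from"] = from_date
    if until:
        params["until"] = until
    while True:
        url = f"{OAI_ENDPOINT}?{urllib.parse.urlencode(params)}"
        with urllib.request.urlopen(url, timeout=60) as resp:
            records, token = parse_list_records(resp.read())
        yield from records
        if not token:
            break
        # Follow-up requests carry only the verb and the resumption token.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

For example, `harvest("title", from_date="2009-02-01", until="2009-02-04")` walks the same window as the `ListRecords` example above. To skip external content, leave `itemexternal` and `partexternal` out of the sets you harvest.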
Code and Documentation
BHL code is available on GitHub: https://github.com/gbhl/bhl-us
R Interface to BHL API, via rOpenSci
https://github.com/ropensci/rbhl
Macaw Software
https://github.com/cajunjoel/macaw-book-metadata-tool
Uploading to Internet Archive
BHL has written instructions on how to upload scanned books to the Internet Archive.