Purposeful gaming and BHL: engaging the public in improving and enhancing access to digital texts.
Although this project ended in Nov 2015, both the Smorball and Beanstalk games will continue to be available in 2016 at http://smorballgame.org and http://beanstalkgame.org, and player input will continue to improve OCR output from BHL. Thank you for playing and helping improve access to science resources!
Smorball wins “Best Serious Game” award at Boston Festival of Indie Games!
Players of the more challenging Smorball game are asked to type the words they see as quickly and accurately as possible to help coach their team, the Eugene Melonballers, to victory to win the coveted Dalahäst Trophy in the fictional sport of Smorball. Each word typed correctly defeats an opposing smorbot and brings the Melonballers closer to the championships.
Players of the more relaxed Beanstalk game must type the words presented to them correctly in order to grow their beanstalk from a tiny tendril to a massive cloudscraper. The more words they type correctly, the faster the beanstalk grows. Players who accurately transcribe the most words will ascend to the top of the leaderboard as a result of their valuable contributions.
Both Smorball and Beanstalk were designed by Tiltfactor and are licensed as Free and Open Source Software (FOSS).
We’re not currently integrating material from other institutions in OUR build of the game, but the good news is the games and their supporting software are open source so you can fairly easily host your own.
There are a few steps to hosting your own Smorball or Beanstalk games:
1. Prepare your material. The games are OCR correction games, so they need data in the form of single words on which different OCR programs disagree. Each “difference” sent to the games must have a page image URL, a location on that page image, and two strings that represent what the two OCR programs THINK the word is. It’s from these two strings that the games estimate whether or not the player has typed the right answer.
2. Host the game(s) and the game backend. You can find the game code here: https://github.com/tiltfactor/smorball and the code for the game database and data management server here: https://github.com/tiltfactor/SmorballBeanstalk-Backend.
3. Configure the games. If you want to run Beanstalk, make sure your version of Beanstalk has its own high score database (via parse.com). If you want the Facebook and Twitter buttons in your Smorball to go to your social media accounts, generate Facebook and Twitter developer API keys, etc.
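As an illustration of step 1, here is a minimal Python sketch of what a single “difference” record could look like and how such records might be produced from two OCR outputs. The field names, the bounding-box layout, and the `find_differences` helper are all assumptions made for this example; they are not taken from the Smorball or Beanstalk backend, whose actual input format should be checked in the repositories linked above. The sketch also assumes the two OCR outputs are already aligned word by word, which real OCR output usually is not.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterator, List, Tuple

Box = Tuple[int, int, int, int]  # assumed layout: (x, y, width, height) in pixels


@dataclass
class Difference:
    """Hypothetical record for one disputed word (field names are illustrative)."""
    page_image_url: str  # URL of the scanned page image
    box: Box             # location of the word on that page image
    ocr_a: str           # what the first OCR program thinks the word is
    ocr_b: str           # what the second OCR program thinks the word is


def find_differences(
    page_image_url: str,
    words_a: List[Tuple[str, Box]],
    words_b: List[Tuple[str, Box]],
) -> Iterator[Difference]:
    """Yield one Difference per position where the two OCR outputs disagree.

    Assumes both lists are already aligned word by word; the box is taken
    from the first OCR output.
    """
    for (text_a, box), (text_b, _) in zip(words_a, words_b):
        if text_a != text_b:
            yield Difference(page_image_url, box, text_a, text_b)


# Example input: placeholder URL and made-up coordinates.
page = "https://example.org/pages/0001.jpg"
engine_a = [("Species", (10, 20, 80, 18)), ("Plantarum", (100, 20, 110, 18))]
engine_b = [("Species", (10, 20, 80, 18)), ("Plantarurn", (100, 20, 110, 18))]

diffs = list(find_differences(page, engine_a, engine_b))
print(json.dumps([asdict(d) for d in diffs], indent=2))
```

Only the second word is emitted here, because it is the only one the two outputs disagree on. Words that both OCR programs read identically never reach the games.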
This project, which has been generously funded by the Institute of Museum and Library Services (IMLS), aims to significantly improve access to digital texts by applying purposeful gaming to the data enhancement tasks needed for content in the Biodiversity Heritage Library (BHL). The project tackles a major challenge for digital libraries: full-text searching is significantly hampered by poor output from Optical Character Recognition (OCR) software. Historic literature has proven particularly problematic because its varying fonts, typesetting, and layouts make it difficult to render accurately. The European Union’s IMPACT project, a 2008-2012 effort to improve access to texts, states that OCR “does in many cases not produce satisfying results for historical documents. Recognition rates are poor or even useless. No commercial or other OCR engine is able to cope satisfactorily with the wide range of printed materials published between the start of the Gutenberg age in the 15th century and the start of the industrial production of books in the middle of the 19th century.” This state of affairs illustrates the pressing need to identify additional solutions to OCR for improving access to digital texts.
The BHL is an international consortium of the world’s leading natural history libraries, including the Missouri Botanical Garden’s Peter H. Raven Library, that have collaborated to digitize the public domain literature documenting the world’s biological diversity. This has resulted in the single largest open-licensed source of biodiversity literature, made available both through the Internet Archive and through a customized portal at http://www.biodiversitylibrary.org. BHL is a perfect testbed for investigating alternate solutions to the generation of digital outputs, both because it is a large corpus (41 million pages of scanned texts accompanied by 41 million OCR outputs) and because most of its content is historic literature (the majority of BHL content was published between the 1450s and the 1900s). OCR is also largely ineffective on handwritten texts such as field notebooks, a growing content type in the BHL.
Purposeful Gaming and BHL will demonstrate whether digital games are a successful tool for analyzing and improving digital outputs from OCR and transcription activities. Presenting the task as a game can harness large numbers of users quickly and efficiently to review and correct particularly problematic words.
The project runs from December 1, 2013 through November 30, 2015 and will be conducted by the Missouri Botanical Garden’s Center for Biodiversity Informatics (CBI) in partnership with Harvard University, Cornell University, and the New York Botanical Garden.
A sample of poor OCR output from an 18th century publication.
This page is from Linnaeus’ Species Plantarum published in 1753. An image of the original text is on the left. The OCR is on the right.
A sample of poor OCR output from a handwritten text. This page is from the Diaries of William Brewster, 1865-1919.
This project was made possible in part by the Institute of Museum and Library Services [LG-05-13-0352-13]
Missouri Botanical Garden
- Trish Rose-Sandler, Data Project Coordinator, Center for Biodiversity Informatics
- William Ulate, Senior Project Coordinator, Center for Biodiversity Informatics
- Mike Lichtenberg, Programmer, Center for Biodiversity Informatics
- Stephen Kappel, Programmer, Center for Biodiversity Informatics
- Doug Holland, Director, Peter H. Raven Library
- Mike Blomberg, Imaging Lab Coordinator, Peter H. Raven Library
- Chuck Miller, Vice President of Information Technology and Chief Information Officer
Ernst Mayr Library of the Museum of Comparative Zoology at Harvard University
- James Hanken, Director of the Museum of Comparative Zoology
- Constance Rinaldo, Librarian of the Ernst Mayr Library
- Joe deVeer, Project Manager
- Robert Young, Special Collections Librarian
- Patrick Randall, Outreach and Communications
The LuEsther T. Mertz Library, New York Botanical Garden
- Susan Fraser, Director
- Susan Lynch, Systems Librarian
- John Mignault, Systems Librarian (previous)
- Kevin Nolan, Digital Projects Manager
- Lisa Studier, Metadata Cataloger
- Yumi Choi, Catalog Librarian
- Andrew Tschinkel, Scanning Technician
- Paul Silverman, Scanning Technician
Cornell University Library
- Martin Schlabach, Librarian
- Kevin Nixon, Professor of Botany
- Holly Mistlebauer
Project Narrative: Purposeful Gaming Narrative.
Schedule of Completion: Schedule of Completion.
Workflow diagram: Workflow Diagram.
Word comparison across outputs: Word Comparison across Outputs.
- Max J. Seidman, Dr. Mary Flanagan, Trish Rose-Sandler, and Mike Lichtenberg, “Are games a viable solution to crowdsourcing improvements to faulty OCR? – The Purposeful Gaming and BHL experience”, Code4Lib Journal, Issue 33, July 2016.
- TDWG Annual Meeting Nov 2015 Nairobi, “Engaging the Citizen Scientist in content enhancement for BHL” William Ulate.
- Upstate New York Science Librarians Annual Meeting, Cornell University, Oct 23, 2015, “Purposeful Gaming: Crowdsourcing the Correction of OCRed Text in the Biodiversity Heritage Library,” Marty Schlabach.
- Council on Botanical & Horticultural Libraries newsletter, Number 138, September 2015, “An Online Game to Correct Inaccurate Optical Character Recognition (OCR) in the Biodiversity Heritage Library: A Purposeful Gaming Update”. Presented and reported by Marty Schlabach, Food & Agriculture Librarian, Mann Library, Cornell University.
- Cornell University Reunion, Mann Library, Ithaca, NY, June 2015, “Harvesting Heritage: Seed & Nursery Catalog Digitization, Discovery & Access”, Marty Schlabach.
- Council on Botanical & Horticultural Libraries annual meeting, Seed Savers Exchange, Decorah, IA, June 2015, “An Online Game to Correct Inaccurate Optical Character Recognition (OCR) in BHL: A Purposeful Gaming Update”, Marty Schlabach.
- Council on Botanical & Horticultural Libraries annual meeting, Seed Savers Exchange, Decorah, IA, June 2015, a handout of links related to seed and nursery catalogs and the Purposeful Gaming project, BHL Seed Catalog Collection CBHL 2015.06.18 links handout.pdf.
- CCLA National Meeting, Univ of Maryland, May 2015, “Engaging the public: Best Practices for Crowdsourcing across the disciplines”; Trish Rose-Sandler was part of the “Dispatches from the Field” panel.
- Heirloom Gardening Seminar, Genesee Country Village & Museum, Mumford, NY, Feb 2015, “Free Online Historical Seed & Nursery Catalog Collection”, Marty Schlabach.
- CONABIO workshop, Dec 2014, Mexico, “Digitalization de literatura de Biodiversidad” William Ulate.
- TDWG Annual Meeting Nov 2014 Sweden, “Making Links in the BHL: Primary Source Materials as a Window to a Scientist’s Methods” Connie Rinaldo.
- Council on Botanical & Horticultural Libraries meeting, Richmond VA, May 2014, “Purposeful Gaming, OCR correction and Seed & Nursery Catalog Digitization”, Marty Schlabach.
- iDigBio CITScribe Hackathon, Florida, Dec 2013, “Purposeful gaming and BHL: engaging the public in improving and enhancing access to digital texts”, William Ulate.
- Interlaken Historical Society, April 25, 2016, Not Your Grandmother’s Library!, Marty Schlabach.
- Max J. Seidman, Dr. Mary Flanagan, Trish Rose-Sandler, and Mike Lichtenberg, “Are games a viable solution to crowdsourcing improvements to faulty OCR? – The Purposeful Gaming and BHL experience”, Code4Lib Journal, Issue 33, July 2016.
- SciStarter features Smorball in newsletter “Whenever, whereever, you and citizen science are meant to be together”.
- Journal of Agricultural & Food Information “Purposeful Gaming and the Biodiversity Heritage Library”.
- Harvard Gazette “A playful turn for libraries”.
- Scientific American profile of Smorball.
- The Edwardsville Intelligencer “MoBOT receives national honors”.
- SciStarter features Purposeful Gaming in article “6 ways to be a Citizen Scientist from the comfort of your couch” and lists the project on its site at http://scistarter.com/project/1230.
- New York Botanical Garden’s Plant Talk blog post.
- IMLS blog post on the games.
- MOBOT’s press release on the game award at Boston festival.
- Tiltfactor’s blog post on game award at Boston Festival.
- Ariadne article “Purposeful Gaming: Work as Play”.
- Boston Globe article “Indie gaming fans crowd Boston festival”.
- Harvard University’s Ludics Seminar “Purposeful Gaming”.
- ActuaLitté.com “Des mini-jeux pour améliorer la numérisation en bibliothèques”.
- BHL blog post “Smorball and Beanstalk: Games that aren’t just fun to play but help science too”.
- Library Journal article “Biodiversity Heritage Library launches crowdsourcing games”.
- Library Journal article “Wisdom of the Crowd”.
- BHL blog post announcing release of the games “Smorball and Beanstalk Are Live!”.
- MOBOT’s press release “Missouri Botanical Garden Project Releases Games to Improve Access to Digital Text”.
- Dartmouth’s press release “Dartmouth’s Tiltfactor Launches Games to Improve Access to Biodiversity Heritage Library Content”.
- Smithsonian Libraries press release “Dartmouth’s Tiltfactor Launches Games to Improve Access to Biodiversity Heritage Library Content”.
- Cornell Chronicle article “Plant experts discuss new seeds and old seed catalogs”.
- Version of Cornell Chronicle piece republished by the Boyce Thompson Institute for Plant Research.
- Albert R. Mann Library article “Let the Games Begin”.
- BHL blog post on the writings of William Brewster “A Bridge to the Past: The Writings of William Brewster”.
- Harvard Gazette article about William Brewster’s ornithological writings “Crowdsourcing old journals”.
- BHL blog post on seed catalog digitization “The Stories Seeds Tell”.
- Ornithology Exchange article about crowdsourcing transcription of William Brewster’s ornithological writing “Step back into ornithological history”.
- BHL blog post on the crowdsourcing aspects of the project “Crowdsourcing and BHL”.
- BHL blog post on transcription activities “Transcribing the Field Notes of William Brewster”.
Choice of Game Designer
- MOBOT blog post announcing choice of game designer, Tiltfactor http://www.missouribotanicalgarden.org/media/news-releases/article/700/missouri-botanical-garden-project-selects-designer-for-purposeful-gaming-grant.aspx.
- BHL blog post announcing choice of Tiltfactor http://blog.biodiversitylibrary.org/2014/06/game-laboratory-tiltfactor-selected-for.html.
Initial Grant Award
- IMLS announcement http://www.imls.gov/news/2013_ols_grant_announcement.aspx#MO.
- MOBOT press release http://www.missouribotanicalgarden.org/media/news-releases/article/639/garden-aims-to-improve-access-to-digital-texts-through-online-gaming.aspx.
- Front page article in St Louis Post Dispatch http://www.stltoday.com/news/local/metro/missouri-botanical-garden-builds-a-new-kind-of-video-game/article_caa6455e-a789-58de-bf5e-9ba27f1c7856.html.
- Harvard Library http://lib.harvard.edu/blog-post-topics/ernst-mayr-mc.
- Cornell http://mannlib.cornell.edu/news/new-games-old-seed-catalogs and http://news.cornell.edu/essentials/2013/12/games-purpose.
- D-Lib magazine write-up in the In Brief column http://www.dlib.org/dlib/january14/01inbrief.html.
- Project feature by the Center for Advancement of Informal Science Education (CAISE) http://informalscience.org/projects/ic-000-000-010-626/Purposeful_gaming_and_BHL.
- We recently joined the Crowdsourcing Consortium for Libraries and Archives (CCLA). Supported by the Institute of Museum and Library Services, the goal of CCLA is to create a forum that enables all interested stakeholders to join a national conversation about the most pressing needs and challenges regarding the development and deployment of crowdsourcing technologies in the cultural heritage domain.
- Excellent summary post by Ben W. Brumfield on QC for Collaborative (Crowdsourced) Manuscript Transcription at http://manuscripttranscription.blogspot.com/2012/03/quality-control-for-crowdsourced.html.
- Discussion minutes, software developed, and presentations recorded from the Notes from Nature/iDigBio Hackathon to Further Enable Public Participation in the Online Transcription of Biodiversity Specimen Labels, held December 16-20 at the University of Florida in Gainesville: https://www.idigbio.org/wiki/index.php/Transcription_Hackathon.
For more information, please contact the project’s Principal Investigator, Trish Rose-Sandler, at 314-577-9473 x6396 or firstname.lastname@example.org.