FromThePage Blog

Crowdsourcing, transcription and indexing for libraries and archives

Progress Report: GitHub, Archive.org Integration, and General Availability

January 4, 2011 By Ben Brumfield

2010 saw big changes in FromThePage.

  • The Balboa Park Online Collaborative started using FromThePage to transcribe the field notes of herpetologist Laurence Klauber. Perian Sully, Rich Cherry, and all the other folks there have been fantastic to work with: full of enthusiasm and new ideas for the system while patient with the bugs that we've discovered. This is the first institution to install FromThePage, and their needs have driven a lot of development since October, including:
  • Internet Archive integration: As you can see on the Klauber site, FromThePage now integrates directly with books hosted on the Internet Archive. This means that FromThePage gets to use the BookReader (in modified form) with its spiffy zoom and pan capabilities while delegating the expensive work of image hosting to Archive.org. It also reduces duplication of data and may enhance findability of the transcriptions. Best of all, the tedious process of uploading, assembling, and titling page images can be skipped, as FromThePage now imports the book structure and even the OCRed page titles from Archive.org derivative files.
  • As you can see from that last link, I've transferred FromThePage over to GitHub, released it under the Affero GPL, and created some extensive documentation on the wiki. So FromThePage is officially Free software, available for immediate use.
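To give a rough sense of what importing a book's structure from an Archive.org derivative file can involve, here is a minimal Python sketch. It is not FromThePage's actual import code. The sample XML is a hypothetical, simplified stand-in for the general shape of a `scandata.xml` file, and the `import_pages` helper and the treatment of `Delete` pages are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified sample in the general shape of an Internet Archive
# scandata.xml derivative file; real files carry many more fields per page.
SAMPLE_SCANDATA = """\
<book>
  <pageData>
    <page leafNum="0"><pageType>Cover</pageType></page>
    <page leafNum="1"><pageType>Title</pageType></page>
    <page leafNum="2"><pageType>Normal</pageType></page>
    <page leafNum="3"><pageType>Delete</pageType></page>
  </pageData>
</book>
"""


def import_pages(scandata_xml):
    """Return (leaf_number, page_type) pairs for the pages to keep."""
    root = ET.fromstring(scandata_xml)
    pages = []
    for page in root.iter("page"):
        page_type = page.findtext("pageType", default="Normal")
        # Assumption: leaves marked "Delete" are excluded from the book.
        if page_type == "Delete":
            continue
        pages.append((int(page.get("leafNum")), page_type))
    return pages


pages = import_pages(SAMPLE_SCANDATA)
for leaf_number, page_type in pages:
    print(leaf_number, page_type)
```

In a real import, each retained leaf would then be paired with its hosted page image and given a title, so nobody has to upload or assemble the pages by hand.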

If you're interested in hosting a transcription project on FromThePage, drop me a line at benwbrum@gmail.com and I'll help you get started.

Filed Under: progress Tagged With: history

Comments

  1. Jason says

    February 6, 2011 at 12:03 am

    Greetings Ben, I am hoping that you can help me. I am sitting on pages upon pages of scans of obituaries. The obits are very clean (neat, orderly and easy to read) and currently combined into a PDF file. Is there some sort of software that I can use that will allow me to transcribe these obits?

  2. Ben W. Brumfield says

    February 6, 2011 at 1:40 am

    Jason,

    If your obituaries are printed–and I assume they are–your best starting point is OCR software. This will convert the text in the images into plaintext, so you'll only have to proofread and correct the automatic transcriptions. Many scanning tools do OCR, including some commercially available Adobe products. Harder to use (but free!) is the Internet Archive community texts project, in which you'd upload your PDF and let their server software do the OCR for you.

    I'm no expert on this–after all, you can't OCR handwritten material–but that's the advice I'd pass along.

    Best of luck!
