FromThePage Blog

Crowdsourcing, transcription and indexing for libraries and archives

Building a Structured Transcription Tool with FreeUKGen

October 3, 2012 By Ben Brumfield

I'm currently working with FreeUKGen, the charity behind the genealogy database FreeBMD, to build a general-purpose, open-source tool for crowdsourced transcription of structured manuscript data into a searchable database.

We're basing our system on the Scribe tool developed for the Citizen Science Alliance for What's the Score at the Bodleian, which grew out of their experience building OldWeather and other citizen science sites.

We are building the following systems:

  1. A new tool for loading image sets into the Scribe system and attaching them to data-entry templates.
  2. Modifications to the Scribe system to handle our volunteer organization's workflow, plus some usability enhancements.
  3. A publicly-accessible search-and-display website to mine the database created through data entry.
  4. A reporting, monitoring, and coordinating system for our volunteer supervisors.

We also plan to add support for geocoding during transcription and GIS support within the search-and-display system. Initial development of item 1 is mostly finished, and work is moving on to items 2 and 3.

Although this tool focuses on parish registers and census forms, we intend to create a general-purpose system for any tabular or structured data. Scribe's data-entry templates are defined in its database, and different templates can be assigned to different images or sets of images. As a result, we can use a simple template for a 1750 register of burials or a much more complex template for an 1881 census form. Since each transcribed record is linked to the section of the page image it represents, we can display the facsimile of a record alongside its transcript in a list of search results, or get fancy and pre-populate a transcriber's form with frequently repeated information like months or birthplaces.
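
To make the template idea concrete, here is a minimal Python sketch. It is not Scribe's actual schema or code: the class names (Template, Region, Record, ImageSet), the field names, and the sample values are all hypothetical, invented only to illustrate how templates might be attached to image sets, how a record might point back at its region of the page image, and how a frequently repeated value might be suggested to a transcriber.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Template:
    """A data-entry template: a name plus an ordered list of field names."""
    name: str
    fields: list


@dataclass
class Region:
    """A rectangle on a page image, in pixels (hypothetical representation)."""
    x: int
    y: int
    width: int
    height: int


@dataclass
class Record:
    """One transcribed row, linked to the image region it was read from."""
    image_id: str
    region: Region
    values: dict


@dataclass
class ImageSet:
    """A batch of page images that share one template."""
    template: Template
    image_ids: list
    records: list = field(default_factory=list)

    def add_record(self, record: Record) -> None:
        # Reject values for fields the template does not define.
        unknown = set(record.values) - set(self.template.fields)
        if unknown:
            raise ValueError(f"fields not in template: {sorted(unknown)}")
        self.records.append(record)

    def suggest(self, field_name: str) -> Optional[str]:
        """Return the most common value entered so far for a field, if any."""
        counts = Counter(
            r.values[field_name] for r in self.records if field_name in r.values
        )
        return counts.most_common(1)[0][0] if counts else None


# Invented sample data: a simple template and a more complex one.
burials_1750 = Template("burials_1750", ["date", "name"])
census_1881 = Template(
    "census_1881", ["name", "relation", "age", "occupation", "birthplace"]
)

census = ImageSet(census_1881, ["page_001.jpg"])
rows = [
    ("John Smith", "Leeds", 120),
    ("Mary Smith", "Leeds", 160),
    ("Ann Brown", "York", 200),
]
for name, birthplace, y in rows:
    census.add_record(
        Record(
            image_id="page_001.jpg",
            region=Region(x=0, y=y, width=1800, height=40),
            values={"name": name, "birthplace": birthplace},
        )
    )

# A value that could pre-populate the next row's birthplace field.
suggested_birthplace = census.suggest("birthplace")
```

Because each Record in this sketch carries its own Region, a search page could crop the page image to that rectangle and show it beside the transcript. The suggestion here is a simple most-common-value count; a real system might weight recent rows more heavily.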

Under the guidance of Ben Laurie, the trustee directing the project, we are committed to open source and open data.  We're releasing the source code under an Apache license and planning to build API access to the full set of record data.

We feel that the more the merrier in an open-source project, so we're looking for collaborators, whether they contribute code, funding, or advice.  We are especially interested in collaborators from archives, libraries, and the genealogy world.

Filed Under: brumfield labs, client projects, digital humanities, structured transcription Tagged With: indexing

Comments

  1. Justin York says

    October 3, 2012 at 4:17 pm

    BTW, I'm excited about what you're doing. Do you have a link to a working version?

  2. Ben W. Brumfield says

    October 3, 2012 at 4:26 pm

    Thanks, Justin. I should have an invitation for the rootsdev group put together this afternoon, as soon as I get my ducks in a row license-wise.

    Do you think that a demo on a rootsdev google hangout would be a good idea?

Copyright © 2023 · FromThePage.com