
FromThePage Blog

Crowdsourcing, transcription and indexing for libraries and archives


Building a Structured Transcription Tool with FreeUKGen

October 3, 2012 By Ben Brumfield

I'm currently working with FreeUKGen, the charity behind the genealogy database FreeBMD, to build a general-purpose, open-source tool for crowdsourced transcription of structured manuscript data into a searchable database.

We're basing our system on the Scribe tool developed for the Citizen Science Alliance for What's the Score at the Bodleian, which grew out of their experience building OldWeather and other citizen science sites.

We are building the following systems:

  1. A new tool for loading image sets into the Scribe system and attaching them to data-entry templates.
  2. Modifications to the Scribe system to handle our volunteer organization's workflow, plus some usability enhancements.
  3. A publicly-accessible search-and-display website to mine the database created through data entry.
  4. A reporting, monitoring, and coordinating system for our volunteer supervisors.

We also plan to add support for geocoding during transcription and GIS support within the search-and-display system. Initial development of item 1 is mostly finished, and work is now moving on to items 2 and 3.

Although this tool is focused on parish registers and census forms, we intend to create a general-purpose system for any tabular or structured data. Scribe's data-entry templates are defined in its database, and different templates can be assigned to different images or sets of images. As a result, we can use a simple template for a 1750 register of burials or a much more complex template for an 1881 census form. Since each transcribed record is linked to the section of the page image it represents, we can display the facsimile of a record alongside its transcript in a list of search results, or get fancy and pre-populate a transcriber's form with frequently repeated information like months or birthplaces.
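
To make the template idea concrete, here is a minimal Python sketch of how templates, images, and records could relate to one another. It is not Scribe's actual schema or code: every class, field, and function name below (Template, ImageSet, Region, Record, suggest_prefill) is a hypothetical stand-in chosen for illustration, and a real system would keep these in database tables rather than in memory.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


# NOTE: all names here are hypothetical; this is not Scribe's real data model.
@dataclass(frozen=True)
class Template:
    """A data-entry template: a name plus the ordered fields to transcribe."""
    name: str
    fields: tuple[str, ...]


@dataclass
class Image:
    filename: str
    template: Optional[Template] = None  # per-image override, if any


@dataclass
class ImageSet:
    title: str
    default_template: Template
    images: list[Image] = field(default_factory=list)

    def template_for(self, image: Image) -> Template:
        # A template attached to the image wins; otherwise use the set's default.
        return image.template or self.default_template


@dataclass(frozen=True)
class Region:
    """The rectangle of the page image that a record was transcribed from."""
    x: int
    y: int
    width: int
    height: int


@dataclass
class Record:
    image: Image
    region: Region            # lets search results show the facsimile snippet
    values: dict[str, str]    # field name -> transcribed text


def suggest_prefill(records: list[Record], field_name: str, top_n: int = 3) -> list[str]:
    """Return the most frequently transcribed values for one field."""
    counts = Counter(
        r.values[field_name] for r in records if r.values.get(field_name)
    )
    return [value for value, _ in counts.most_common(top_n)]


burials = Template("burials_1750", ("name", "burial_date"))
census = Template("census_1881", ("name", "age", "occupation", "birthplace"))

page = Image("register_p1.jpg")
census_page = Image("census_p1.jpg", template=census)
image_set = ImageSet("Example parish", burials, [page, census_page])

records = [
    Record(census_page, Region(0, 100, 800, 40), {"name": "A", "birthplace": "Leeds"}),
    Record(census_page, Region(0, 140, 800, 40), {"name": "B", "birthplace": "Leeds"}),
    Record(census_page, Region(0, 180, 800, 40), {"name": "C", "birthplace": "York"}),
]
```

With this layout, `image_set.template_for(page)` falls back to the simple burials template while `census_page` gets the census one, and `suggest_prefill(records, "birthplace", 1)` yields the value a form might offer as a default. Storing the region on each record is what would make the side-by-side facsimile display possible.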

Under the guidance of Ben Laurie, the trustee directing the project, we are committed to open source and open data.  We're releasing the source code under an Apache license and planning to build API access to the full set of record data.

We feel that the more the merrier in an open-source project, so we're looking for collaborators, whether they contribute code, funding, or advice.  We are especially interested in collaborators from archives, libraries, and the genealogy world.

Filed Under: brumfield labs, client projects, digital humanities, structured transcription Tagged With: indexing


Comments

  1. Justin York says

    October 3, 2012 at 4:17 pm

    BTW, I'm excited about what you're doing. Do you have a link to a working version?

  2. Ben W. Brumfield says

    October 3, 2012 at 4:26 pm

    Thanks, Justin. I should have an invitation for the rootsdev group put together this afternoon, as soon as I get my ducks in a row license-wise.

    Do you think that a demo on a rootsdev google hangout would be a good idea?


Copyright © 2022 · FromThePage.com