• Skip to main content
  • Skip to secondary menu
  • Skip to primary sidebar

FromThePage Blog

about crowdsourcing, manuscript transcription, digital humanities and digital documentary editions

  • Home
  • Project Profiles
  • Interviews with Clients
  • Collections
  • Back to FromThePage

Feature: Mechanical Turk Integration

March 24, 2009 By Ben Brumfield

At last week's Austin On Rails SXSW party, my friend and compatriot Steve Odom gave me a really neat feature idea. "Why don't you integrate with Amazon's Mechanical Turk?" he asked. This is an intriguing notion, and while it's not on my own road map, it would be pretty easy to modify FromThePage to support that. Here's what I'd do to use FromThePage on a more traditional transcription project, with an experienced documentary editor at the head and funding for transcription work:

Page Forks: I assume that the editor using Mechanical Turk would want double keyed transcriptions to maintain quality, so the application needs to present the same, untranscribed page to multiple people. In the software world, when a project splits, we call this forking, and I think that the analogy applies here. This feature needs to be able to track an entirely separate edit history for the different forks of a page. This means a new attribute on the master page record describing whether more than one fork exists, and a separate edit history for each fork of a page that's created. There's no reason to limit these transcriptions to only two forks, even if that's the most common use case, so I'd want to provide a URL that will automatically create a new fork for a new transcriber to work in. The Amazon HIT (Human Intelligence Task) would have a link to that URL, so the transcriber need never track which fork they're working in, or even be aware of the double keying.

Reconciling Page Forks: After a page has been transcribed more than one time, the application needs to allow the editor to reconcile the transcriptions. This would involve a screen displaying the most recent version of two transcriptions alongside the scanned page image. Likely there's a decent Rails plug in already for displaying code diffs, so I could leverage that to highlight differences between the two transcriptions. A fourth pane would allow the editor to paste in the reconciled transcription into the master page object.

Publishing MTurk HITs: Since each page is an independent work unit, it should be possible to automatically convert an untranscribed work into MTurk HITs, with a work item for each page. I don't know enough about how MTurk works, but I assume that the editor would need to enter their Amazon account credentials to have the application create and post the HITs. The app also needs to prevent the same user from re-transcribing the same page in multiple forks.

In all, it doesn't sound like more than a month or two worth of work, even performed part-time. This isn't a need I have for the Julia Brumfield diaries, so I don't anticipate building this any time soon. Nevertheless, it's fun to speculate. Thanks, Steve!

Filed Under: features

Primary Sidebar

What’s Trending on The FromThePage Blog

  • How to Learn to Read Shorthand
  • Project Profile: Sewanee Project on Slavery, Race…
  • Interview: Dr. Laura Morreale on Teaching and…
  • Survey on Crowdsourced Transcription Tools
  • UI and Other Fun Stuff
  • Prosopography Hackathon Project: Using Machine…

Recent Client Interviews

An Interview with Erin Wilson of Ohio University Libraries

An Interview with Susannah Ural of the Civil War & Reconstruction Governors of Mississippi Project

An Interview with Olivia Carlisle of the State Archives of North Carolina

An Interview with Paige Roberts of Phillips Academy Archives & Special Collections

An Interview with Riley Bogran of the Sandy Spring Museum

Privacy Policy | Terms & Conditions | About Us | Contact Us

Copyright © 2021 · FromThePage.com