FromThePage Blog

Crowdsourcing, transcription and indexing for libraries and archives

Feature: Transcription Versions

July 13, 2007 by Ben Brumfield

Page/Subject Article Versions
Last week I added versioning to articles and pages. The goal was to allow access to previous edits via a version page akin to the MediaWiki history tab.

Gavin Robinson suggested a system of review and approval before transcription changes go live, but I really think that this doesn't fit well with my user model. For one thing, I don't expect the same kinds of vandalism problems you see in Wikipedia to affect FromThePage works much, since the editors are specifically authorized by the work owner. For another, I can't imagine the solo-owner/scribe would tolerate having to submit and approve each one of their edits for long. Finally, since this is designed for a loosely-coupled, non-institutional user community, I simply can't assume that the work owner will check the site each day to review and approve changes. Projects must be able to keep their momentum without intervention by the work owner for months at a time.

His concerns are quite valid, however. Perhaps an alternative approach to transcription quality is to develop a few more owner tools, like a bulk review/revert feature for contributions made since a certain date or by a certain user.
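
As a rough illustration of what such an owner tool might do, here is a minimal sketch in plain Ruby. Everything in it is hypothetical: the `Edit` struct, the `edits_to_review` and `revert` helpers, and the in-memory list of edits are assumptions made for the example, not FromThePage code.

```ruby
require "date"

# Hypothetical record of one contribution (not the FromThePage schema).
Edit = Struct.new(:page_id, :user, :made_on, :text)

# Select the edits an owner might want to review: those made on or
# after `since`, by `user`, or both when both filters are given.
def edits_to_review(edits, since: nil, user: nil)
  edits.select do |e|
    (since.nil? || e.made_on >= since) && (user.nil? || e.user == user)
  end
end

# "Revert" by ignoring the rejected edits: each page falls back to the
# text of its most recent remaining edit. A page whose edits were all
# rejected simply drops out of the result.
def revert(edits, rejected)
  kept = edits - rejected
  kept.group_by(&:page_id)
      .transform_values { |page_edits| page_edits.max_by(&:made_on).text }
end
```

An owner returning after a long absence could call `edits_to_review(edits, since: last_visit)` to list what changed, then pass the unwanted subset to `revert`.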

Work Versions
Later, I'll put up a technical post on how I accomplished this with Rails after_save callbacks, but for now I'd like to talk about "versions" of a perpetually-editable work. What exactly does this mean? If a user prints out or downloads a transcription between one change and the next, how do you indicate that?

To address this, I decided to add the concept of a work's "transcription version". This is an additional attribute of the work itself, and every time an edit is made to any one of the work's pages, the work itself has its transcription version incremented. By recording the transcription version of the work in the page version record as well, I should be able to reconstruct the exact state of the digital work from a number added to an offline copy of the work.
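
The idea can be sketched in a few lines of plain Ruby. This is an illustration under stated assumptions, not the Rails implementation: the `Work` class, the `PageVersion` struct, and the `edit_page` and `state_at` methods are hypothetical names, and an in-memory array stands in for the database.

```ruby
# Hypothetical page version record; it stores the work-level
# transcription version that was current when the edit was saved.
PageVersion = Struct.new(:page_id, :text, :work_version)

class Work
  attr_reader :transcription_version, :page_versions

  def initialize
    @transcription_version = 0
    @page_versions = []
  end

  # Any edit to any page increments the work's transcription version
  # and stamps that number on the new page version.
  def edit_page(page_id, text)
    @transcription_version += 1
    @page_versions << PageVersion.new(page_id, text, @transcription_version)
    @transcription_version
  end

  # Reconstruct every page's text as it stood at a given work version:
  # for each page, take the newest page version at or before it.
  def state_at(version)
    @page_versions
      .select { |pv| pv.work_version <= version }
      .group_by(&:page_id)
      .transform_values { |versions| versions.max_by(&:work_version).text }
  end
end
```

Given the number printed on an offline copy, `state_at` returns the page texts that copy was made from, even if the pages have been edited since.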

I decided on transcription_version as an attribute name because comments and perhaps subject articles may change independently of the work's transcribed text. A printout that includes commentary needs a comment_version as well as a transcription_version. The two attributes seem orthogonal, because two transcription-only prints of the same work shouldn't appear different because a user has made an unprinted annotation.
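
A small sketch of the two independent counters, again in plain Ruby. The attribute names `transcription_version` and `comment_version` come from the paragraph above; the `AnnotatedWork` class and its `print_stamp` method are assumptions made for the example.

```ruby
# Hypothetical model with two counters that advance independently.
class AnnotatedWork
  attr_reader :transcription_version, :comment_version

  def initialize
    @transcription_version = 0
    @comment_version = 0
  end

  # A change to the transcribed text touches only transcription_version.
  def edit_transcription
    @transcription_version += 1
  end

  # A new annotation touches only comment_version.
  def add_comment
    @comment_version += 1
  end

  # Version label for a printout; the comment version appears only
  # when the printout includes commentary.
  def print_stamp(include_comments: false)
    stamp = "transcription v#{@transcription_version}"
    stamp += ", comments v#{@comment_version}" if include_comments
    stamp
  end
end
```

With this split, adding an annotation leaves the stamp on a transcription-only printout unchanged.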

Filed Under: progress Tagged With: features

Comments

  1. Gavin Robinson says

    July 13, 2007 at 1:50 pm

    You’re right that that kind of approval mechanism is unnecessary for people who are working as scribes on the project. I was thinking more about ways for readers to suggest corrections after the “finished” version is published.

  2. Ben W. Brumfield says

    July 14, 2007 at 3:05 am

    Which was the context of the conversation you posted that suggestion to, come to think of it.

    The notion of a finished version, or even of publishing, is still worthy of further investigation. My initial intent was to allow owners to publish a work for viewing once each page was completely transcribed. This didn’t seem likely to encourage publishing, however, so I toyed with the idea of automatically publishing works that had around 80% of their pages transcribed.

    Even that, however, doesn’t work for some models. A page-a-day model, such as is followed by the successful PapasDiary and PepysDiary projects, involves careful attention to a single page at a time, without regard to the completeness of the work itself. So I’m not quite sure what to do about when works should be displayed.