
FromThePage Blog

Crowdsourcing, transcription and indexing for libraries and archives


Feature: Regularization

April 2, 2007 by Ben Brumfield

One of the many editorial decisions that must be made while transcribing a manuscript is whether to preserve the document's original spelling and punctuation. Happily, TEI has a mechanism for preserving both versions while typing the transcript, so the choice of which one to display is delegated to the reader or printer. Unhappily, TEI's try-to-do-everything approach (an eierlegende Wollmilchsau, or "egg-laying wool-milk-sow") means its mechanism is pretty hokey:

  • <reg> (regularization) contains a reading which has been regularized or normalized in some sense.
  • <orig> (original form) contains a reading which is marked as following the original, rather than being normalized or corrected.
  • <choice> groups a number of alternative encodings for the same point in a text.
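
To make these three elements concrete, here is a minimal sketch of how a transcription tool might emit a regularized word, using Python's standard xml.etree.ElementTree. The helper name `regularized` and the sample spelling are hypothetical, and the TEI namespace (http://www.tei-c.org/ns/1.0) is left off for brevity.

```python
import xml.etree.ElementTree as ET

def regularized(original: str, regular: str) -> ET.Element:
    """Wrap an original spelling and its regularized form in a TEI-style <choice>.

    Hypothetical helper; real TEI output would also carry the TEI namespace.
    """
    choice = ET.Element("choice")
    ET.SubElement(choice, "orig").text = original
    ET.SubElement(choice, "reg").text = regular
    return choice

# Build a paragraph containing one word in both spellings.
p = ET.Element("p")
p.text = "the "
word = regularized("publick", "public")
word.tail = " house"  # text that follows the <choice> element
p.append(word)

print(ET.tostring(p, encoding="unicode"))
# <p>the <choice><orig>publick</orig><reg>public</reg></choice> house</p>
```

Both spellings now travel together in the transcript; deciding which to show is left to whatever renders it.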

The reason TEI makes <reg> and <orig> freestanding elements is that it wants to be able to show a word as having been corrected without providing an alternative, the same way one uses <sic>. This is perfectly reasonable, though I do not think it applies to my application. Less defensible is the decision to use <choice> to enclose orig/reg pairs. <choice> is also used elsewhere, for example to group alternative readings marked with the <unclear> tag. As a result, any XSL transform attempting to normalize (or "originalize") a TEI-encoded document is stuck peeking inside every <choice> element it encounters to search for a <reg>/<orig> pair.
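
A minimal sketch of that peeking, written in Python with xml.etree.ElementTree rather than XSL. The function name `flatten`, the sample text, and the rule of falling back to the first alternative for a non-orig/reg <choice> are all assumptions for illustration. The point is that matching on the tag name `choice` is never enough by itself; each one must be opened to see what it holds.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: one <choice> holds an orig/reg pair, another holds
# two <unclear> alternative readings of a hard-to-read word.
SAMPLE = (
    "<p>the <choice><orig>publick</orig><reg>public</reg></choice> road to "
    "<choice><unclear>Dover</unclear><unclear>Devon</unclear></choice></p>"
)

def flatten(elem: ET.Element, mode: str) -> str:
    """Return elem's text with each orig/reg pair resolved to mode ("orig" or "reg")."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "choice":
            # The tag name says nothing about what kind of choice this is,
            # so look inside for the requested <orig> or <reg> child.
            pick = child.find(mode)
            if pick is None and len(child) > 0:
                # Some other kind of <choice> (e.g. <unclear> readings):
                # assumed policy is to fall back to its first alternative.
                pick = child[0]
            if pick is not None:
                parts.append(flatten(pick, mode))
        else:
            parts.append(flatten(child, mode))
        parts.append(child.tail or "")  # text following the child element
    return "".join(parts)

root = ET.fromstring(SAMPLE)
print(flatten(root, "reg"))   # the public road to Dover
print(flatten(root, "orig"))  # the publick road to Dover
```

An XSLT stylesheet faces the same test, typically as a template whose match pattern inspects the children, such as `choice[reg]`.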

Since my transcription source will have to use a different, per-page DTD, I'll probably create an <irreg> tag to use instead of <choice> here.
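
Assuming <irreg> simply takes the place of <choice> around an <orig>/<reg> pair (the content model here is a guess, not a settled design), the tag name alone identifies a spelling variant. This hypothetical Python sketch lists every variant on a page with a single `iter()` call and never has to open a <choice>; the function name `spelling_variants` and the sample page are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: <irreg> is the proposed tag, and giving it <orig> and
# <reg> children is an assumption. The <choice> of <unclear> readings is ignored.
PAGE = (
    "<page>the <irreg><orig>publick</orig><reg>public</reg></irreg> road to "
    "<choice><unclear>Dover</unclear><unclear>Devon</unclear></choice>, "
    "<irreg><orig>musick</orig><reg>music</reg></irreg> playing</page>"
)

def spelling_variants(root: ET.Element) -> list[tuple[str, str]]:
    """Return (original, regularized) pairs for every <irreg>, in document order.

    Because the tag name marks a spelling variant, no <choice> element needs
    to be inspected. Missing children come back as empty strings.
    """
    return [
        (irreg.findtext("orig", default=""), irreg.findtext("reg", default=""))
        for irreg in root.iter("irreg")
    ]

print(spelling_variants(ET.fromstring(PAGE)))
# [('publick', 'public'), ('musick', 'music')]
```

The `list[tuple[...]]` annotation needs Python 3.9 or later.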

