The 2006 Family History Technology Workshop archives are online. One presentation ("Towards Searchable Indexes for Handwritten Documents") dealt with the difficulties of automating OCR. The conclusion: it's not impossible to pragmatically digitize manuscripts for the purpose of searching. Partial matches between search terms and recognized manuscript letters mean that so … [Read more...] about Paper: Computational Manuscript Indexing
Archives for April 2007
Planning for GA
Sara and I had a discussion last night over supper, hashing out the business plan for the project. The most important conclusion was that I need to use the app with a couple of small user communities before any sort of general release. That helps me focus my efforts on some very specific features. Get a start-to-finish set of transcription features and the basics for … [Read more...] about Planning for GA
Feature: Regularization
One of the many editorial decisions that must be made while transcribing a manuscript is whether or not to preserve the document's original spelling and punctuation. Happily, TEI has a mechanism for preserving both versions while typing the transcript, so the choice of which one to display is delegated to the reader/printer. Unhappily, the eierlegende wollmilchsau … [Read more...] about Feature: Regularization
What I'm Building
I'm working on a piece of software for collaborative manuscript transcription and annotation. That's a bit of a mouthful, but what it boils down to is this: I've got temporary access to several family documents which I am trying to transcribe and distribute. Being a software engineer by trade, it seems to me that the easiest way to do this is to write a system that allows me and … [Read more...] about What I'm Building