FromThePage Blog

about crowdsourcing, manuscript transcription, digital humanities and digital documentary editions

Progress Report: Auto-titling

April 20, 2007 By Ben Brumfield

One of the first things I discovered when I started work on this project is that it's a lot of effort just to get the page images ready to transcribe. In my case, this means rotating each image (either 90 or 270 degrees, depending on recto or verso), shrinking the rotated images by 1/4 to get to the minimum legible size, shrinking the originals again by 1/2 to get to a zoom size, and then attaching titles to them.

This titling is itself quite difficult. While it's trivial to generate consistently formatted dates to apply to carefully reviewed, consistently named page images, the results of scanning real-world manuscripts are much messier. In my case, I'm taking pictures of the odd-numbered pages in order, then following with the even-numbered pages. The resulting file lists contain duplicate images of the same page, gaps where pages are missing, and re-shot series from the times I discovered my camera was on the wrong setting and had to photograph a run of pages again. In one diary the titles in the original aren't even sequential, since a "Memoranda" page appears every two months. In another, a separate sheet has been glued in over an earlier diary entry.

The only solution I've come up with is a titling feature, which I completed this month. I upload a set of pictures I believe to be a reasonably coherent series of pages, choose the proper orientation, and point to the spot on a sample image where the page number is located. This launches a series of background jobs that shrink each image to the minimum legible size, rotate the images correctly, and crop the part of the image that contains the page number. The feature then automatically generates titles for the images based on user-entered data and presents the top of each page alongside its generated title for review. The user can delete pages, bump titles into sync with their images, or override titles for cases like my "Memoranda" pages.
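The title-generation and review steps can be sketched roughly as follows. This is a minimal illustration in Python, not the code behind the feature: the `Page` class, the `assign_titles` function, the "Page N" title format, and the start/step parameters are all hypothetical stand-ins for the "user-entered data" mentioned above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Page:
    """One uploaded image (hypothetical structure, for illustration only)."""
    filename: str
    title: str = ""
    override: Optional[str] = None  # manual title, e.g. "Memoranda"


def assign_titles(pages: List[Page], start: int, step: int = 2,
                  prefix: str = "Page ") -> List[Page]:
    """Title the pages in order, counting up from `start` by `step`.

    step=2 is an assumption that matches photographing only the
    odd-numbered (or only the even-numbered) pages in one pass.
    Pages with a manual override keep it and do not use up a number.
    """
    number = start
    for page in pages:
        if page.override is not None:
            page.title = page.override
            continue
        page.title = f"{prefix}{number}"
        number += step
    return pages


# First pass: four images believed to be pages 1, 3, 5 and 7.
pages = [Page(name) for name in ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]]
assign_titles(pages, start=1)

# Review: b.jpg turns out to be a duplicate shot, and c.jpg is an
# unnumbered "Memoranda" page. Delete one, override the other, re-run.
del pages[1]
pages[1].override = "Memoranda"
assign_titles(pages, start=1)
```

In this sketch, deleting a duplicate and regenerating is what bumps every later title back into sync with its image, so the reviewer only has to correct the spot where the series went wrong.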

The next step is a feature to collate recto and verso image sets, as well as one to fill in skipped page images. The UI for collation is going to be difficult.
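To make the collation problem concrete, here is one naive way it might look in Python. This is only a sketch under strong assumptions: both image lists have already been cleaned up and are in page order, and the `collate` function and its `placeholder` argument are hypothetical names, not part of any existing feature.

```python
from itertools import zip_longest
from typing import List, Optional


def collate(recto: List[str], verso: List[str],
            placeholder: Optional[str] = None) -> List[Optional[str]]:
    """Interleave recto and verso images into reading order.

    If one list is shorter than the other, the missing trailing
    images are filled with `placeholder`.
    """
    merged: List[Optional[str]] = []
    for r, v in zip_longest(recto, verso, fillvalue=placeholder):
        merged.extend([r, v])
    return merged


# Three odd-numbered pages were photographed, but only two even-numbered ones.
ordered = collate(["p1.jpg", "p3.jpg", "p5.jpg"], ["p2.jpg", "p4.jpg"])
```

Simple interleaving like this only notices a page missing from the end of a set. A page skipped in the middle silently shifts every later pairing, which is part of why the review step and its UI are the hard part.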

