• Skip to main content
  • Skip to secondary menu
  • Skip to primary sidebar

FromThePage Blog

Crowdsourcing, transcription and indexing for libraries and archives

  • Home
  • Interviews
  • crowdsourcing
  • how-to
  • Back to FromThePage
  • Collections

Feature: Subject Indexes

April 30, 2007 by Ben Brumfield

One of the main goals of the diary digitization project is to create printed versions of Julia Brumfield's diaries to distribute to family with little internet access. The links between manuscript pages and articles described below provide the raw material for a print index. Reading storing page-to-article links in RDBMS record makes indexing almost, but not quite trivial.

The "not quite" part comes from possible need for index sub-topics. Linking every instance of "tobacco" is not terribly useful in a diary of life on a tobacco farm. It would still be nice to group planting, stripping, or barn-building under "Tobacco" in a printed index, however.

One way to accomplish this is to categorize articles by topic. This would provide an optional overlay to the article-based indexing, so that the printed index could list all references to each article, plus references to articles for all topics. If Feb 15th's "planting tobacco" was linked to an article on "Planting", but that Planting article was categorized under "Tobacco", the algorithm to generate a printed index could list the link in both places:
Planting: Feb. 15
Tobacco: Planting: Feb 15.

Pseudocode to do this:

indexable_subjects = merge all category titles with all article titles
for each subject in indexable_subjects {
print the subject title
find any article with title == subject
for each link from a page to that article {
print the page in a reference format
}
find any category with title == subject
for each article in that category
print the article title
for each link from a page to that article {
print the page in a reference format
}
}
}

Using an N:N categorization scheme would make this very flexible, indeed. I'd probably still want to present the auto-generated index to the user for review and cleanup before printing. Since this is my only mechanism for classification, some categories ("Personal Names", "Places") could be broken out into their own separate indexes.

Filed Under: subject links Tagged With: features

Primary Sidebar

What’s Trending on The FromThePage Blog

  • How to Handle Racial or Ethnic Slurs &…
  • An Interview with David Caldwell of The New England…
  • OCR Correction vs Transcription
  • What is Meaningful Work?
  • Rails: A Short Introduction to before_filter
  • Start Reading Old Handwriting: Some Recommended Books

Recent Client Interviews

An Interview with Candice Cloud of Stephen F. Austin State University

An Interview with Shanna Raines of the Greenville County Library System

An Interview with Jodi Hoover of Digital Maryland

An Interview with Michael Lapides of the New Bedford Whaling Museum

An Interview with NC State University Libraries

Read More

ai artificial intelligence crowdsourcing features fromthepage projects handwriting history iiif indexing Indianapolis Indianapolis Children's Museum interview Jennifer Noffze machine learning metadata newsletter ocr paleography podcast racism Ryan White spreadsheet transcription transcription transcription software

Copyright © 2026 · Magazine Pro on Genesis Framework · WordPress · Log in

Want more content like this?  We publish a newsletter with interesting thought pieces on transcripion and AI for archives once a month.


By signing up, you agree to our Privacy Policy and Terms of Service. We may send you occasional newsletters and promotional emails about our products and services. You can opt-out at any time.  We never sell your information.