• Skip to main content
  • Skip to secondary menu
  • Skip to primary sidebar

FromThePage Blog

about crowdsourcing, manuscript transcription, digital humanities and digital documentary editions

  • Home
  • Project Profiles
  • Interviews with Clients
  • Collections
  • Back to FromThePage

Feature: Subject Indexes

April 30, 2007 By Ben Brumfield

One of the main goals of the diary digitization project is to create printed versions of Julia Brumfield's diaries to distribute to family with little internet access. The links between manuscript pages and articles described below provide the raw material for a print index. Reading storing page-to-article links in RDBMS record makes indexing almost, but not quite trivial.

The "not quite" part comes from possible need for index sub-topics. Linking every instance of "tobacco" is not terribly useful in a diary of life on a tobacco farm. It would still be nice to group planting, stripping, or barn-building under "Tobacco" in a printed index, however.

One way to accomplish this is to categorize articles by topic. This would provide an optional overlay to the article-based indexing, so that the printed index could list all references to each article, plus references to articles for all topics. If Feb 15th's "planting tobacco" was linked to an article on "Planting", but that Planting article was categorized under "Tobacco", the algorithm to generate a printed index could list the link in both places:
Planting: Feb. 15
Tobacco: Planting: Feb 15.

Pseudocode to do this:

indexable_subjects = merge all category titles with all article titles
for each subject in indexable_subjects {
print the subject title
find any article with title == subject
for each link from a page to that article {
print the page in a reference format
}
find any category with title == subject
for each article in that category
print the article title
for each link from a page to that article {
print the page in a reference format
}
}
}

Using an N:N categorization scheme would make this very flexible, indeed. I'd probably still want to present the auto-generated index to the user for review and cleanup before printing. Since this is my only mechanism for classification, some categories ("Personal Names", "Places") could be broken out into their own separate indexes.

Filed Under: features, subject links

Primary Sidebar

What’s Trending on The FromThePage Blog

  • How to Learn to Read Shorthand
  • Project Profile: Sewanee Project on Slavery, Race…
  • Interview: Dr. Laura Morreale on Teaching and…
  • Survey on Crowdsourced Transcription Tools
  • UI and Other Fun Stuff
  • Prosopography Hackathon Project: Using Machine…

Recent Client Interviews

An Interview with Erin Wilson of Ohio University Libraries

An Interview with Susannah Ural of the Civil War & Reconstruction Governors of Mississippi Project

An Interview with Olivia Carlisle of the State Archives of North Carolina

An Interview with Paige Roberts of Phillips Academy Archives & Special Collections

An Interview with Riley Bogran of the Sandy Spring Museum

Privacy Policy | Terms & Conditions | About Us | Contact Us

Copyright © 2021 · FromThePage.com