Feature: Subject Indexes - FromThePage Blog

One of the main goals of the diary digitization project is to create printed versions of Julia Brumfield's diaries to distribute to family members with little internet access. The links between manuscript pages and articles described below provide the raw material for a print index. Storing page-to-article links as RDBMS records makes indexing almost, but not quite, trivial.

The "not quite" part comes from the possible need for index sub-topics. Linking every instance of "tobacco" is not terribly useful in a diary of life on a tobacco farm. It would still be nice, however, to group planting, stripping, or barn-building under "Tobacco" in a printed index.

One way to accomplish this is to categorize articles by topic. This would provide an optional overlay to the article-based indexing, so that the printed index could list all references to each article, plus, under each topic, the references to every article in that topic. If the Feb. 15 entry's "planting tobacco" were linked to an article on "Planting", and that Planting article were categorized under "Tobacco", the algorithm to generate a printed index could list the link in both places:
Planting: Feb. 15
Tobacco: Planting: Feb. 15

Pseudocode to do this:
indexable_subjects = merge all category titles with all article titles
for each subject in indexable_subjects {
    print the subject title
    find any article with title == subject
    for each link from a page to that article {
        print the page in a reference format
    }
    find any category with title == subject
    for each article in that category {
        print the article title
        for each link from a page to that article {
            print the page in a reference format
        }
    }
}
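
To make the pseudocode concrete, here is a minimal Python sketch of the same idea. The function name build_index and the data shapes (a list of page-to-article links and a dict mapping each category to its article titles) are assumptions made for illustration, not the actual schema. Unlike the pseudocode, it returns finished index lines rather than printing piece by piece, and it skips subjects that have no page references.

```python
from collections import defaultdict


def build_index(links, categories):
    """Return printed-index lines.

    links: iterable of (page_reference, article_title) pairs (hypothetical shape).
    categories: dict mapping category title -> list of article titles (hypothetical shape).
    """
    # Group page references by the article they link to.
    pages_by_article = defaultdict(list)
    for page, article in links:
        pages_by_article[article].append(page)

    lines = []
    # Merge all category titles with all article titles.
    subjects = sorted(set(pages_by_article) | set(categories))
    for subject in subjects:
        # Direct references: an article whose title matches the subject.
        direct = pages_by_article.get(subject, [])
        if direct:
            lines.append(f"{subject}: {', '.join(direct)}")
        # Category overlay: every article filed under a category with this title.
        for article in sorted(categories.get(subject, [])):
            pages = pages_by_article.get(article, [])
            if pages:
                lines.append(f"{subject}: {article}: {', '.join(pages)}")
    return lines


if __name__ == "__main__":
    sample_links = [("Feb. 15", "Planting")]
    sample_categories = {"Tobacco": ["Planting"]}
    for line in build_index(sample_links, sample_categories):
        print(line)
```

Run against the Feb. 15 example, this lists the link once under "Planting" and once under "Tobacco: Planting".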

Using a many-to-many (N:N) categorization scheme would make this very flexible. I'd probably still want to present the auto-generated index to the user for review and cleanup before printing. Since categories are my only mechanism for classification, some of them ("Personal Names", "Places") could be broken out into their own separate indexes.
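
As a rough sketch of how that break-out could work, the Python below treats category membership as a plain list of (article, category) pairs, so an article can sit in any number of categories. The function name split_indexes, the "General" bucket, and the sample names are all hypothetical choices for illustration.

```python
from collections import defaultdict


def split_indexes(links, memberships, separate=("Personal Names", "Places")):
    """Route each article's entry to a separate index or to a general one.

    links: iterable of (page_reference, article_title) pairs (hypothetical shape).
    memberships: iterable of (article_title, category_title) pairs; many-to-many.
    separate: category titles that get their own index.
    """
    # An article may belong to several categories.
    cats_by_article = defaultdict(set)
    for article, category in memberships:
        cats_by_article[article].add(category)

    pages_by_article = defaultdict(list)
    for page, article in links:
        pages_by_article[article].append(page)

    indexes = defaultdict(list)
    for article in sorted(pages_by_article):
        entry = f"{article}: {', '.join(pages_by_article[article])}"
        # Articles in none of the break-out categories fall into "General".
        targets = [c for c in separate if c in cats_by_article[article]] or ["General"]
        for name in targets:
            indexes[name].append(entry)
    return dict(indexes)


if __name__ == "__main__":
    # Made-up sample data, not taken from the diaries.
    sample_links = [
        ("Feb. 15", "Planting"),
        ("Feb. 16", "Jane Doe"),
        ("Feb. 16", "Springfield"),
    ]
    sample_memberships = [
        ("Planting", "Tobacco"),
        ("Jane Doe", "Personal Names"),
        ("Springfield", "Places"),
    ]
    for name, entries in split_indexes(sample_links, sample_memberships).items():
        print(name, entries)
```

An article filed under both "Personal Names" and "Places" would appear in both indexes, which is one of the cases the manual review step would need to catch.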