
FromThePage Blog

Crowdsourcing, transcription and indexing for libraries and archives


Feature: Related Pages

December 22, 2009 by Ben Brumfield

I've been thinking a lot about page-to-subject links lately as I edit and annotate Julia Brumfield's 1921 diary. While I've been able to exploit the links data structure in editing, printing, analyzing and displaying the texts, I really haven't viewed it as a way to navigate from one manuscript page to another. In fact, the linkages I've made between pages have been pretty boring -- next/previous page links and a table of contents are the limit. I'm using the page-to-subject links to connect subjects to each other, so why not pages?

The obvious answer is that the subjects which page A would have most in common with page B are the same ones it would have in common with nearly every other page in the collection. In the corpus I'm working with, the diarist mentions her son and daughter-in-law in 95% of pages, for the simple reason that she lives with them. If I choose two pages at random, I find that March 12, 1921 and August 12, 1919 both contain Ben and Jim doing agricultural work, Josie doing domestic work, and Julia's near-daily visit to Marvin's. The two pages are connected through those four subjects (as well as the similarly disappointing "dinner"), but not in a way that is at all meaningful. So I decided that a page-to-page relatedness tool couldn't be built from the page-to-subject link data.

All that changed two weeks ago, when I was editing the 1921 diary and came across the mention of a "musick box". In trying to figure out whether or not Julia was referring to a phonograph by the term, I discovered that the string "musick box" occurred only twice: when the phonograph was ordered and the first time Julia heard it played. Each of these mentions shed so much light on the other that I was forced to re-evaluate how pages are connected through subjects. In particular, I was reminded of the "you and one other" recommendations that LibraryThing offers. This is a feature that finds other users with whom you share an obscure book. In this case, obscurity is defined as the book occurring only twice in the system: once in your library, once in the other user's.

This would be a relatively easy feature to implement in FromThePage. When displaying a page, perform this algorithm:

  • For each subject link in the page, calculate how many times it is referenced within the collection.
  • Sort those subjects by reference count, lowest first.
  • Take the 3 or 4 subject links with the lowest reference count.
  • Display the pages which link to those subjects.
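
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not FromThePage's actual code: the function name `related_pages`, the dictionary layout, and the page identifiers are all hypothetical. It also makes two choices the list does not spell out: a subject's reference count is taken to be the number of pages linking to it, and subjects that appear on no other page are skipped, since they cannot lead anywhere.

```python
from collections import defaultdict


def related_pages(page_id, page_subjects, top_n=3):
    """Map the rarest subjects on page_id to the other pages that link to them."""
    # Invert the page-to-subject links: subject -> set of pages mentioning it.
    pages_by_subject = defaultdict(set)
    for pid, subjects in page_subjects.items():
        for subject in subjects:
            pages_by_subject[subject].add(pid)

    # Assumption: a subject found only on this page cannot link to another
    # page, so it is dropped before ranking.
    candidates = [
        s for s in set(page_subjects[page_id])
        if len(pages_by_subject[s]) > 1
    ]

    # Sort by reference count (lowest first); break ties by subject name.
    candidates.sort(key=lambda s: (len(pages_by_subject[s]), s))

    # Keep the rarest few and list the other pages that share each one.
    return {
        s: sorted(pages_by_subject[s] - {page_id})
        for s in candidates[:top_n]
    }


# Toy data with hypothetical page ids; the subjects echo the examples above.
page_subjects = {
    "page_a": ["Ben", "Josie", "musick box"],
    "page_b": ["Ben", "Josie", "musick box"],
    "page_c": ["Ben", "Josie", "dinner"],
}

print(related_pages("page_a", page_subjects, top_n=1))
```

With this toy data, "Ben" and "Josie" appear on all three pages and "musick box" on only two, so the rare subject is the one that surfaces as the link from page_a to page_b.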

For a really useful experience, I'd want to display keyword-in-context, showing a few words to explain the context in which that other occurrence of "musick box" appears.
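
A keyword-in-context display could be sketched as follows. Again this is an assumption-laden illustration: the function name `keyword_in_context`, the `width` parameter, and the sample sentence are invented for the example (the sentence is not a quotation from the diary), and matching is done on whitespace-separated words with surrounding punctuation ignored.

```python
def keyword_in_context(text, keyword, width=4):
    """Return snippets showing keyword with up to `width` words on each side."""
    words = text.split()
    target = [t.lower() for t in keyword.split()]
    if not target:
        return []

    snippets = []
    for i in range(len(words) - len(target) + 1):
        # Compare case-insensitively, ignoring punctuation around each word.
        window = [w.strip('.,;:!?"\'').lower() for w in words[i:i + len(target)]]
        if window == target:
            start = max(0, i - width)
            end = min(len(words), i + len(target) + width)
            snippets.append(" ".join(words[start:end]))
    return snippets


# Invented sample text, used only to show the shape of the output.
sample = "We heard the musick box played after supper at Ben's."
print(keyword_in_context(sample, "musick box", width=2))
```

A small `width` keeps each snippet short enough to sit beside a link to the related page.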

Filed Under: Uncategorized Tagged With: features
