FromThePage Blog

Crowdsourcing, transcription and indexing for libraries and archives

All the Things: Useful Metadata from AI

May 14, 2024 by Sara Brumfield

Ben and I have been brainstorming things that could be done with AI tools once you have transcriptions – crowdsourced or HTR’d – of documents. There’s a LOT! I thought it would be fun to share that list with you, to spark ideas and possibilities. Reply and let me know what ideas you have and what we missed.

  • Traditional metadata activities
    • Language detection
    • Summarization
    • Assigning subject headings
    • Detecting sender / recipient
    • Detecting place of composition or place sent to
    • Determining the place the document is about
    • Detecting document type
  • Traditional metadata records and authorities
    • Create finding aids
    • Associate with EAC-CPF
    • Associate with SNAC records
    • Associate with Library of Congress Subject Headings
    • Linking to Wikidata and other linked open data authorities
  • Non-traditional metadata activities
    • Extracting entities (including families, companies, or other non-person entities)
    • Identifying relationships between mentioned entities
    • Detecting slavery, conflict, lists of names, etc. (This is what I think of as “Answering questions about a page or document”)
    • Emotional valences (positive or negative; strongly or weakly emotive). This would be fun to graph as a discovery interface for documents.
  • Computer Vision-y features
    • Classifying documents as printed/typed vs. handwritten
    • Detecting handwriting on printed/typed pages
    • Determining whether pages contain photos or diagrams
    • Getting rid of “Queen Victoria’s Birthday” (or other text preprinted on diary pages)
    • Identifying blank pages
    • Identifying meaningful pages
  • Long-form derivatives
    • Text optimized for screen readers (modernized punctuation and spelling, expanded abbreviations)
    • Translations
    • Modernized spelling for full-text search
    • “Explain it Like I’m Five” versions (and other “junior reader” versions)
    • Audio version of the text
    • Images or video inspired by/to go along with the text

Yes, some of these are crazy – but many of them are not. And I think they’re all within reach in the coming years.
