FromThePage Blog

Crowdsourcing, transcription and indexing for libraries and archives

Using GPT in an Archival Context: Matt Miller’s Susan B. Anthony Papers

October 5, 2023 by Sara Brumfield

Have you seen Matt Miller’s experiments using the GPT APIs on the Library of Congress’ Susan B. Anthony Papers? If you want to understand the possibilities of AI for archives, this is a great place to start. You can read Matt’s write-up and explore his interface to the collection. He’s using the GPT APIs to achieve a lot:

  • Parsing the documents to create structured metadata: date, location, and entry for her daybooks; sender, recipient, and date for her letters.
  • Extracting people and places mentioned. (This is great for finding aids, but also for mapping documents.)
  • Summarizing: Matt experiments with four levels of granularity: summaries of one, two, three, and four sentences. This works well for his user interface (or any search or browse output), and would be handy in a finding aid.
  • Semantic searching: Highlight a concept or phrase to see documents that are semantically related to it. Try it with concepts like “fear” or “scare” to see its power. This differs from keyword search in some interesting ways, and researchers will need training to learn how to interpret its results.
  • Suggesting similar documents: Using similarity to suggest other, possibly related, documents. I think this is the weakest part of the GPT APIs – I’m curious if you uncover anything useful with this.
  • Acknowledging and showing machine-generated text. Possibly the most important demonstration here is how to indicate machine-generated material in the interface. Matt does it with a pop-up and with a toggle that highlights the machine-generated text.
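
For readers curious how semantic search and similar-document suggestions can work in general, here is a minimal sketch. It is not Matt's implementation, which this post does not describe. It assumes each document and each highlighted phrase has already been turned into an embedding vector by some model; the three-number vectors and document names below are made up purely for illustration, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction, so treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vector, document_vectors, top_k=3):
    """Return the top_k (document id, score) pairs, most similar first."""
    scored = [
        (doc_id, cosine_similarity(query_vector, vector))
        for doc_id, vector in document_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (assumption): real ones come from an embedding model
# and have hundreds or thousands of dimensions.
DOCUMENT_VECTORS = {
    "daybook-entry": [0.9, 0.1, 0.0],
    "letter-a": [0.1, 0.9, 0.1],
    "letter-b": [0.7, 0.3, 0.1],
}

# Toy embedding (assumption) of a highlighted phrase.
QUERY_VECTOR = [1.0, 0.0, 0.0]

results = rank_by_similarity(QUERY_VECTOR, DOCUMENT_VECTORS, top_k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

The same ranking function covers both features in this sketch: pass a phrase's vector for semantic search, or a document's own vector to suggest similar documents. Because the ranking depends on closeness in the embedding space rather than on shared words, the results can look surprising next to a keyword search.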

