
FromThePage Blog

Crowdsourcing, transcription and indexing for libraries and archives

Can Multi-Modal LLMs Transcribe Historic Documents?

June 5, 2024 by Ben Brumfield

Recently, both OpenAI and Google released new multi-modal large language models, which were immediately touted for their ability to transcribe documents.

Last week, I also transcribed this document from the Library of Virginia's records of the Virginia Revolutionary Conventions. Let's read it together:

Whereas his Excellency John Earl of Dun-
more Lieutenant and Governor General of this colony
by withdrawing himself from the Seat of
Government and on board one of ~~his Maj~~ the King's
Ships of War, ~~under pretence~~ by declaring
that he will ~~lay~~ destroy the City of Williamsburg [illegible]
~~Emancipate our Slaves lay waste~~ Emancipate our Slaves and destroy the country
& by other wicked and hostile d[eeds]

There is this really interesting psychological distancing going on here, where the writer starts to say “his Majesty’s ship”, and changes that to “the King’s ship”.

This may ring a bell for some of you. 

One of the central controversies of the New York Times' 1619 Project involves the assertion that the Americans declared their independence in reaction to a rising abolitionist sentiment in Britain, which threatened to end slavery.

This claim was disputed by socialist and conservative historians, but the discussion revolves around documents like the one we just read: colonists reacting against Lord Dunmore's proclamation promising freedom to enslaved people and indentured servants who joined his forces.

How do historians discussing this issue find these documents?

Traditionally, they had to be transcribed by humans, which is how I stumbled on this document.

Traditional HTR (handwritten text recognition) tools like Transkribus's English Eagle transformer model – and by "traditional", I mean transformer technology from 18 months ago – produce output like this. It's not great: the strikethroughs cause some recognition problems, and the insertions really scramble the reading order of the text.

With the release of GPT-4o, we can attempt HTR via a general-purpose large language model instead of a purpose-built recognition model.

I uploaded the document and asked ChatGPT to transcribe it, using TEI to represent the strikethroughs and insertions.
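Here is roughly what that request looks like against the OpenAI API. This is a minimal sketch: the filename and the exact prompt wording are illustrative stand-ins, not what I typed verbatim.

```python
import base64
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Illustrative filename; any scan of the document works here.
with open("dunmore_declaration.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

prompt = (
    "Transcribe this handwritten document. Use TEI markup: wrap "
    "struck-through text in <del> and interlinear insertions in <add>."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)

# A faithful response would preserve the revisions, e.g. something like:
#   one of <del>his Maj</del> the King's Ships of War
print(response.choices[0].message.content)
```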

How do the results compare?

ChatGPT’s results make a lot more sense. But what are they missing?

Where is Lord Dunmore? That text was really faint in the image, so neither system transcribed it. Where is “Emancipate” or “Slaves”? If I’m honest, neither system produces the keywords I would want if I were researching this issue and looking for primary sources. 

But there are hints of the subject in the Transkribus output – “Emaneepate oellaves Endrupare ol Haves” is the kind of thing which would raise red flags if I were reviewing uncorrected text.

More problematic is that the LLM output eliminates any hint of slavery, emancipation, or Lord Dunmore. Yes, it transcribed passages that had been struck through as if they were not, and mis-read "destroy" as "devour". Yes, it swept away that psychological artifact of "his Majesty's" revised to "the King's". But more importantly, it doesn't look obviously wrong.

Which of these tools has a better Character Error Rate? I don’t know, and I don’t care! 
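For readers unfamiliar with the metric: Character Error Rate (CER) is just the edit distance between a system's output and a reference transcription, divided by the reference length. A minimal sketch, with illustrative strings, shows why it misses the point here:

```python
def levenshtein(ref: str, hyp: str) -> int:
    # Dynamic-programming edit distance over characters.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    # Character Error Rate: edits needed, normalized by reference length.
    return levenshtein(reference, hypothesis) / len(reference)

# Toy comparison: CER reduces a plausible-but-wrong reading and an
# obviously garbled one to character counts. That is its blind spot.
print(cer("Emancipate our Slaves", "Emaneepate oellaves"))
```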

The Transkribus output is obviously raw, and in need of correction. It looks tentative when you read it in isolation. The ChatGPT output looks much more plausible, and–in my opinion–that plausibility is treacherous.

LLMs are good at detecting subjects, like mentions of slavery, in texts. I asked that question here, and the result speaks for itself.
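Here is a sketch of that kind of check. The prompt is paraphrased, and `transcript.txt` is a hypothetical stand-in for whichever tool's uncorrected output you want to screen:

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Stand-in filename: the saved output of either HTR tool.
transcript = open("transcript.txt", encoding="utf-8").read()

question = (
    "Does the following transcript mention slavery, emancipation, or "
    "Lord Dunmore? Answer yes or no, and quote any relevant passage.\n\n"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": question + transcript}],
)
print(response.choices[0].message.content)
```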


