
FromThePage Blog

Crowdsourcing, transcription and indexing for libraries and archives


Uploading Transcriptions or OCR Transcripts with Page Images

FromThePage can accept additional files alongside regular image uploads, including existing transcripts and the output of Optical Character Recognition (OCR) or Handwritten Text Recognition (HTR). To link a text file to its page, upload it in the same .ZIP file as the corresponding image. Images may be uploaded without a text file, but text files must be uploaded together with their images, because text cannot be added to a page later.

FromThePage can import transcription data in several formats, including plain text (.TXT), ALTO XML, and TEI XML.
To upload your .ZIP file, along with any other folders or .ZIP files, use the “Start a Project” page on your FromThePage account dashboard. When the upload includes transcription or OCR files, check the box labeled “Import Text.” FromThePage then converts the upload into a FromThePage work and uses the contents of each .TXT or .XML file (for example, TEI XML or ALTO) as the raw transcription data for its page.
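If your images and transcripts already sit together in one folder, a short script can package them for upload. The sketch below is a minimal Python example, not a FromThePage tool: the function name `build_upload_zip` and the extension lists are illustrative assumptions, and it stores every file at the root of the archive (a flat layout chosen for simplicity) so each image and its text file sit side by side.

```python
import zipfile
from pathlib import Path

# Illustrative extension lists (assumption): adjust to the files you actually have.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}
TEXT_EXTS = {".txt", ".xml"}


def build_upload_zip(folder: str, zip_path: str) -> list[str]:
    """Pack image and text files from `folder` into one flat ZIP.

    Files with other extensions are skipped. Returns the names written to
    the archive, in sorted order.
    """
    src = Path(folder)
    members = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.iterdir()):
            if path.is_file() and path.suffix.lower() in IMAGE_EXTS | TEXT_EXTS:
                # Store at the archive root, keeping the original file name,
                # so the image and its transcript keep the same base name.
                zf.write(path, arcname=path.name)
                members.append(path.name)
    return members
```

Writing the ZIP to a different folder than the source avoids the archive picking up unrelated files; the returned list is a quick way to confirm that every page image and transcript made it in.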

Each transcription file must have the same name as its image, apart from the file extension; FromThePage uses this shared name to pair the transcription with the image. The following example shows paired image and text files in a single .ZIP file:

Document Image      Document Transcription
cover_page.jpg      cover_page.txt
page_001.jpg        page_001.txt
page_002.jpg        page_002.txt
page_003.jpg        page_003.txt
page_004.jpg        page_004.txt
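Before zipping, it can help to check that every text file has an image with the same base name. The Python sketch below is a hypothetical helper (the name `check_pairs` and the extension lists are assumptions, and it compares names case-sensitively, which may not match FromThePage's own behavior). It groups files by the name before the extension and flags text files with no matching image, since those cannot be attached to a page later.

```python
from pathlib import Path

# Illustrative extension lists (assumption), not an official FromThePage list.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}
TEXT_EXTS = {".txt", ".xml"}


def check_pairs(filenames):
    """Group file names by base name (the part before the extension).

    Returns (pairs, images_only, orphan_texts):
      pairs        -- base names that have both an image and a text file
      images_only  -- image files with no text file (allowed)
      orphan_texts -- text files with no matching image (fix before uploading)
    Matching is case-sensitive in this sketch.
    """
    images, texts = {}, {}
    for name in filenames:
        p = Path(name)
        ext = p.suffix.lower()
        if ext in IMAGE_EXTS:
            images[p.stem] = name
        elif ext in TEXT_EXTS:
            texts[p.stem] = name
    pairs = sorted(images.keys() & texts.keys())
    images_only = sorted(images[s] for s in images.keys() - texts.keys())
    orphan_texts = sorted(texts[s] for s in texts.keys() - images.keys())
    return pairs, images_only, orphan_texts


files = ["cover_page.jpg", "cover_page.txt", "page_001.jpg",
         "page_001.txt", "page_002.jpg", "page_003.txt"]
pairs, images_only, orphan_texts = check_pairs(files)
print(pairs)         # ['cover_page', 'page_001']
print(images_only)   # ['page_002.jpg']
print(orphan_texts)  # ['page_003.txt']
```

Here page_002.jpg would simply become a page without imported text, while page_003.txt needs a page_003 image before upload.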

When you upload text with your images, FromThePage treats the work as an OCR “Correction” project, which changes the “Transcribe” tab into a “Correct” tab. If the uploaded text is a manuscript transcription rather than OCR output, you can convert the work back to a regular transcription work so the labels fit: un-check the box labeled “Enable OCR Correction” in the Settings tab and click “Save Changes.”

For more information about uploading images, see “Uploading Image Files to FromThePage”.


Copyright © 2025 · Magazine Pro on Genesis Framework · WordPress