
FromThePage Blog

Crowdsourcing, transcription and indexing for libraries and archives


Uploading Transcriptions or OCR Transcripts with Page Images

FromThePage supports several additional file types alongside regular image uploads, including existing transcripts and Optical Character Recognition (OCR) or Handwritten Text Recognition (HTR) outputs. To relate a text file to its image, upload both in the same .ZIP file. Image files may be uploaded without a corresponding text file, but text files must be uploaded together with their images because they cannot be added later.
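As a rough illustration of the packaging step, the sketch below bundles the images and text files from one folder into a single .ZIP archive using Python's standard library. The function name `build_upload_zip` is hypothetical, and placing every file at the top level of the archive is an assumption made for simplicity, not a documented FromThePage requirement.

```python
import zipfile
from pathlib import Path


def build_upload_zip(folder, zip_path):
    """Write every regular file directly inside `folder` into one ZIP archive.

    Files are stored at the top level of the archive (an assumption for this
    sketch). Subfolders are skipped. Returns the archived file names.
    """
    folder = Path(folder)
    zip_path = Path(zip_path)
    archived = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.iterdir()):
            # Skip subfolders, and skip the archive itself if it lives in `folder`.
            if not path.is_file() or path.resolve() == zip_path.resolve():
                continue
            archive.write(path, arcname=path.name)
            archived.append(path.name)
    return archived
```

Calling `build_upload_zip("my_document", "my_document.zip")` (both names are placeholders) would produce an archive holding the images and their text files side by side, ready to upload.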

FromThePage can process transcription data from several text formats, including .TXT and TEI .XML.
You may upload your zipped folder, along with any other folders or zipped folders, from the “Start a Project” page on your FromThePage account dashboard. Check the box labeled “Import Text” when uploading any transcription or OCR files. FromThePage then converts each folder into a work, using the content of the .TXT or .XML files (such as TEI-XML or HTML) as the raw transcription data. Uploading ALTO or DjVu XML files is not supported.

For FromThePage to pair a transcription with its image, the transcription file name must match the corresponding image file name. The following is an example of paired image and text uploads included in a single .ZIP file:

Document Image    Document Transcription
cover_page.jpg    cover_page.txt
page_001.jpg      page_001.txt
page_002.jpg      page_002.txt
page_003.jpg      page_003.txt
page_004.jpg      page_004.txt
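Before zipping, it can help to confirm that every text file has an image with the same base name. The following is a minimal sketch of such a check. The function name `check_pairs` and the list of image extensions are assumptions for illustration, and comparing base names exactly (case-sensitively) is also an assumption about how strictly names need to match.

```python
from pathlib import Path

# Assumed extension lists for this sketch; adjust to the files you actually have.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}
TEXT_EXTENSIONS = {".txt", ".xml"}


def check_pairs(filenames):
    """Group file names by base name and report which ones are paired."""
    images = {Path(name).stem for name in filenames
              if Path(name).suffix.lower() in IMAGE_EXTENSIONS}
    texts = {Path(name).stem for name in filenames
             if Path(name).suffix.lower() in TEXT_EXTENSIONS}
    return {
        "paired": sorted(images & texts),
        # Allowed: images may be uploaded without text.
        "images_without_text": sorted(images - texts),
        # Problem: text cannot be added later, so it needs its image now.
        "text_without_image": sorted(texts - images),
    }
```

Any name listed under `text_without_image` points to a transcription that has no image to attach to, so it is worth renaming or removing that file before uploading.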

When you upload text, FromThePage treats the work as a “Correction” project, which changes the “Transcribe” tab into a “Correct” tab. If the work is better described as a manuscript transcription than as an OCR correction project, you can change this nomenclature: to revert to “Transcribe,” un-check the box labeled “Enable OCR Correction” in the Settings tab and click “Save Changes.”

For more information about uploading images, see “Uploading Image Files to FromThePage”.


Copyright © 2022 · FromThePage.com