
FromThePage Blog

Crowdsourcing, transcription and indexing for libraries and archives


Uploading existing transcriptions or OCR with Page Images

You can import existing transcripts as part of a zip file upload.
First, create a folder containing the image files. Then make sure each image that has a transcript is accompanied by a .txt or .xml file containing the transcript of that page, named the same as the image apart from the extension, as in this example:
envelope.jpg
envelope.txt
page_001.jpg
page_001.txt
page_002.jpg
page_002.txt
postmark.JPG
postmark.txt
Not every image file needs a corresponding text file, but where a transcript exists, its filename must be identical to the image's filename except for the extension.
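Before zipping, it can help to check the folder for naming mismatches. The following is a minimal sketch, not part of FromThePage itself: the function name `check_pairs` and the list of image extensions are assumptions made for illustration. It compares filenames with the extension removed and reports text files that have no matching image, as well as images that have no transcript.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your own files.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}
# Transcript extensions named in this post.
TEXT_EXTS = {".txt", ".xml"}


def check_pairs(folder):
    """Return (orphan_texts, untranscribed_images) as sorted lists of base names.

    orphan_texts: transcripts whose base name matches no image file.
    untranscribed_images: images without a transcript (allowed, listed for review).
    """
    files = [p for p in Path(folder).iterdir() if p.is_file()]
    # Extensions are compared case-insensitively, so postmark.JPG counts as an image.
    image_stems = {p.stem for p in files if p.suffix.lower() in IMAGE_EXTS}
    text_stems = {p.stem for p in files if p.suffix.lower() in TEXT_EXTS}
    orphan_texts = sorted(text_stems - image_stems)
    untranscribed_images = sorted(image_stems - text_stems)
    return orphan_texts, untranscribed_images
```

Any name in the first list points to a transcript that probably needs renaming. Names in the second list are only informational, since images without transcripts are allowed.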
Create a metadata.yml file if you wish, and place it in the same folder.
Then zip up the folder (along with other folders, if you want), and upload the zip file on the Start a Project screen. Make sure to check the "import text" box.
Each folder should be converted into a FromThePage work, with the contents of the text or xml files set as the raw OCR text. You'll probably want to switch the work from an OCR correction project to a manuscript transcription work (using the checkbox on the collection settings page) so that the nomenclature is changed appropriately.
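Any zip tool will do for this step. As one possible approach, here is a minimal Python sketch using the standard `zipfile` module; the function name `zip_folders` and the example paths are hypothetical. It keeps each folder's name as the top-level directory inside the archive, on the assumption that the folder structure should survive the upload.

```python
import zipfile
from pathlib import Path


def zip_folders(folders, archive_path):
    """Write every file under each folder into one zip archive.

    Each folder's own name is kept as the top-level directory in the archive.
    """
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for folder in folders:
            folder = Path(folder).resolve()
            for path in sorted(folder.rglob("*")):
                if path.is_file():
                    # Store entries as "<folder name>/<relative path>".
                    arcname = (Path(folder.name) / path.relative_to(folder)).as_posix()
                    zf.write(path, arcname)
    return archive_path
```

For example, `zip_folders(["letters_1862"], "upload.zip")` would produce an archive with entries such as `letters_1862/envelope.jpg` and `letters_1862/envelope.txt`. Write the archive outside the folders being zipped so that it does not include itself.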

