
FromThePage Blog

Crowdsourcing, transcription and indexing for libraries and archives


Uploading Existing Transcriptions or OCR with Page Images

You can import existing transcripts as part of a zip file upload.
First, create a folder containing your image files. Then, for each page that has a transcript, add a .txt or .xml file containing that transcript, named exactly like the image except for the extension, as in this example:
envelope.jpg
envelope.txt
page_001.jpg
page_001.txt
page_002.jpg
page_002.txt
postmark.JPG
postmark.txt
Not every image file needs a corresponding text file, but wherever you do provide a transcript, its filename must be identical to the image's filename except for the extension.
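Before zipping, it can help to check the folder for mismatched names. The Python sketch below is not part of FromThePage; `check_transcript_pairs` is a hypothetical helper that pairs files whose names match apart from the extension, then lists images without a transcript (which is allowed) and transcripts that match no image (most likely a typo). The set of image extensions is an assumption, so adjust it to the formats you are actually uploading.

```python
from pathlib import Path

# Assumption: the image formats in your folder. FromThePage's full list of
# accepted formats is not given here, so edit this set to match your files.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}
TEXT_EXTENSIONS = {".txt", ".xml"}


def check_transcript_pairs(folder):
    """Return (paired, images_without_text, orphan_texts) as sorted lists of base names.

    A transcript pairs with an image when the filenames are identical except
    for the extension, e.g. envelope.jpg and envelope.txt.
    """
    images = {}
    texts = {}
    for path in Path(folder).iterdir():
        if not path.is_file():
            continue
        ext = path.suffix.lower()  # so postmark.JPG counts as an image
        if ext in IMAGE_EXTENSIONS:
            images[path.stem] = path
        elif ext in TEXT_EXTENSIONS:
            texts[path.stem] = path  # other files (e.g. metadata.yml) are ignored
    paired = sorted(images.keys() & texts.keys())
    images_without_text = sorted(images.keys() - texts.keys())
    orphan_texts = sorted(texts.keys() - images.keys())
    return paired, images_without_text, orphan_texts
```

For example, `check_transcript_pairs("letters")` on a folder holding envelope.jpg, envelope.txt and page_002.jpg would report envelope as paired and page_002 as an image without a transcript; any name in the third list is worth renaming before you upload.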
Optionally, create a metadata.yml file and place it in the same folder.
Then zip up the folder (along with any other folders you want to upload) and upload the zip file on the Start a Project screen. Make sure to check the "import text" box.
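If you have many folders, you could also build the zip with a script instead of by hand. This is a minimal sketch using Python's standard `zipfile` module; the function name `zip_folders` is hypothetical, and it assumes each folder should stay a separate top-level directory inside the archive, since each folder becomes its own work.

```python
import zipfile
from pathlib import Path


def zip_folders(folders, zip_path):
    """Write each folder into one zip file, keeping the folder name as a top-level directory.

    Assumption: FromThePage reads one work per top-level folder in the archive.
    Keep zip_path outside the folders being archived so the zip does not include itself.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for folder in folders:
            folder = Path(folder)
            for path in sorted(folder.rglob("*")):
                if path.is_file():
                    # Stored as "<folder name>/<relative path>", e.g. "letters/envelope.jpg"
                    arcname = Path(folder.name) / path.relative_to(folder)
                    zf.write(path, arcname.as_posix())
    return zip_path
```

For example, `zip_folders(["letters", "diary"], "upload.zip")` would produce upload.zip with letters/ and diary/ at its top level, ready for the Start a Project screen.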
Each folder should be converted into a FromThePage work, with the contents of its text or XML files set as the raw OCR text of the corresponding pages. You'll probably want to switch the project from OCR correction to manuscript transcription (using the checkbox on the collection settings page) so that the nomenclature changes appropriately.


Copyright © 2023 · FromThePage.com