FromThePage supports several additional file types alongside regular image uploads, including existing transcripts and Optical Character Recognition (OCR) or Handwritten Text Recognition (HTR) output. To relate a text file to its image, upload both in the same .ZIP file. Image files may be uploaded without a corresponding text file, but text files must be uploaded together with their images because they cannot be added later.
FromThePage can process transcription data in several text formats, including plain text (.TXT), ALTO XML, and TEI-XML.
You may upload your zipped folder, as well as any other folders or zipped folders, using the “Start a Project” page on your FromThePage account dashboard. Check the box labeled “Import Text” when uploading any transcription or OCR files. This process converts the folders into a FromThePage work, with the content of the .TXT or .XML (TEI-XML or ALTO) files set as the raw transcription data.
Each transcription file must have the same name as its image file, apart from the file extension, so that FromThePage can pair the transcription with the image. The following is an example of paired image and text uploads included in a single .ZIP file:
Document Image | Document Transcription |
cover_page.jpg | cover_page.txt |
page_001.jpg | page_001.txt |
page_002.jpg | page_002.txt |
page_003.jpg | page_003.txt |
page_004.jpg | page_004.txt |
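The pairing rule shown in the table can be checked before uploading. The following is a minimal sketch in Python, not part of FromThePage itself: the function names `find_unpaired_texts` and `build_zip` are hypothetical, the list of image extensions is an assumption (only .jpg appears in the example above), and the exact-match comparison of base names is an assumption about how strictly names must agree.

```python
from pathlib import Path
import zipfile

# Assumption: these image extensions are illustrative; only .jpg appears in the example table.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}
# Text formats named in this article: plain text and XML (TEI-XML or ALTO).
TEXT_EXTS = {".txt", ".xml"}


def find_unpaired_texts(filenames):
    """Return the text file names whose base name matches no image file."""
    image_stems = {
        Path(name).stem
        for name in filenames
        if Path(name).suffix.lower() in IMAGE_EXTS
    }
    return sorted(
        name
        for name in filenames
        if Path(name).suffix.lower() in TEXT_EXTS
        and Path(name).stem not in image_stems
    )


def build_zip(folder, zip_path):
    """Zip every file in `folder` after checking that each text file has an image."""
    folder = Path(folder)
    names = sorted(p.name for p in folder.iterdir() if p.is_file())
    unpaired = find_unpaired_texts(names)
    if unpaired:
        raise ValueError(f"Text files without a matching image: {unpaired}")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in names:
            # Store files at the top level of the archive, side by side.
            archive.write(folder / name, arcname=name)
    return names
```

For example, `build_zip("my_document", "my_document.zip")` would raise an error if `my_document` contained `page_005.txt` but no `page_005.jpg`; images without a text file pass the check, since they may be uploaded on their own.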
When you upload text, FromThePage treats the work as a “Correction” project, which changes the “Transcribe” tab into a “Correct” tab. If the work is a manuscript transcription rather than an OCR correction project, you can change the nomenclature back: un-check the box labeled “Enable OCR Correction” in the Settings tab and click “Save Changes.”
For more information about uploading images, see “Uploading Image Files to FromThePage”.