
FromThePage Blog

about crowdsourcing, manuscript transcription, digital humanities and digital documentary editions


Export Formats

FromThePage provides a variety of export formats.  The following list explains what each one is.

Verbatim Plaintext

This plaintext file represents line breaks with a single newline, paragraph breaks with a double newline, and page breaks with a triple newline. It contains the verbatim text, with all formatting, emendation, and subject linking stripped out.
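The newline convention above makes the file easy to split back into pages, paragraphs, and lines. The following is a minimal Python sketch, assuming the export follows the convention exactly (no runs of four or more newlines); the function name `split_export` and the sample text are hypothetical and not part of FromThePage.

```python
def split_export(text: str) -> list[list[list[str]]]:
    """Split a plaintext export into pages, paragraphs, and lines.

    Assumes a triple newline marks a page break, a double newline a
    paragraph break, and a single newline a line break.
    """
    pages = []
    for page in text.strip("\n").split("\n\n\n"):
        # Each paragraph becomes a list of its lines.
        paragraphs = [para.split("\n") for para in page.split("\n\n") if para]
        pages.append(paragraphs)
    return pages


# Hypothetical sample: two pages, the first with two paragraphs.
sample = (
    "Dear Sir,\nI write in haste.\n\n"
    "Yours truly,\nJ. Jones\n\n\n"
    "Page two begins here."
)
pages = split_export(sample)
```

Here `pages[0]` holds the two paragraphs of the first page, each as a list of lines, and `pages[1]` holds the single line of the second page.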

TEI-XML

This may be useful to editors who plan to do further mark-up within TEI-XML editors like Oxygen.

HTML

This may be useful for preservation in other systems or as a starting point for display on another website.

Emended Plaintext

Like the verbatim plaintext, this plaintext file represents line breaks with a single newline, paragraph breaks with a double newline, and page breaks with a triple newline. It differs from the verbatim text in that every subject mention is replaced with that subject's canonical name, so that where the verbatim text reads "I greeted Mr. Jones and his wife this morning.", the emended plaintext reads "I greeted James Jones and Elizabeth Smith Jones this morning." This artificial text is useful for programmatic analysis, but is not meant to be read by humans.
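The difference between the verbatim and emended texts can be illustrated with a short Python sketch. It assumes, purely for illustration, that subject links in the transcript are written as `[[canonical name|verbatim text]]`; this page does not specify the markup, and the names `LINK`, `verbatim`, and `emended` are hypothetical.

```python
import re

# Assumed link markup: [[canonical name|verbatim text]] or [[canonical name]].
LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def verbatim(markup: str) -> str:
    """Keep the text as written on the page, dropping the link markup."""
    return LINK.sub(lambda m: m.group(2) or m.group(1), markup)


def emended(markup: str) -> str:
    """Replace each linked mention with the subject's canonical name."""
    return LINK.sub(lambda m: m.group(1), markup)


transcript = (
    "I greeted [[James Jones|Mr. Jones]] and "
    "[[Elizabeth Smith Jones|his wife]] this morning."
)
print(verbatim(transcript))
print(emended(transcript))
```

Under these assumptions the first call yields the sentence as the writer penned it, and the second yields the normalized sentence with both canonical names.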

Verbatim Translation Plaintext

Identical to the verbatim plaintext in formatting, but containing the text of the translation (rather than the original language) for works which support translation.

Emended Translation Plaintext

An emended plaintext version (see above) of the translation of the work. This export is not included for works that do not have translation enabled.

Plaintext for full-text search

A plaintext version of the work optimized for full-text search. This version contains a verbatim plaintext transcript of each page (as described above), except that words hyphenated across a line break are joined together, and a list of the canonical names mentioned within each page is appended to the end of the page. The example text would be rendered as:

I greeted Mr. Jones and his wife this morning.

James Jones
Elizabeth Smith Jones
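A rough Python sketch of how such a search-oriented page might be assembled follows. It assumes a broken word is marked by a hyphen immediately before the newline and that the names are appended after a blank line, one per line; both details are assumptions, and `search_text` is a hypothetical helper rather than FromThePage's implementation.

```python
import re


def search_text(page_text: str, canonical_names: list[str]) -> str:
    """Build a search-friendly version of one page."""
    # Join words split across lines, e.g. "morn-\ning" becomes "morning".
    joined = re.sub(r"(\w)-\n(\w)", r"\1\2", page_text)
    if not canonical_names:
        return joined
    # Append the canonical names after a blank line, one name per line.
    return joined + "\n\n" + "\n".join(canonical_names)


page = "I greeted Mr. Jones and his wife this morn-\ning."
names = ["James Jones", "Elizabeth Smith Jones"]
print(search_text(page, names))
```

Joining broken words lets a search for "morning" match the page, and the appended names let a search for "Elizabeth Smith Jones" find a page that only says "his wife".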

Subject CSV

A CSV export listing the subjects mentioned within the work.


Copyright © 2021 · FromThePage.com