FromThePage Blog

Crowdsourcing, transcription and indexing for libraries and archives

Export Formats

FromThePage provides a variety of export formats. The following list explains what each one contains.

Verbatim Plaintext

This plaintext file represents line breaks with a single newline, paragraph breaks with a double newline, and page breaks with a triple newline. It contains the verbatim text, with all formatting, emendation, and subject linking stripped out.
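The newline convention above makes the file straightforward to split back into pages, paragraphs, and lines. The following is a minimal Python sketch, assuming the separators are exactly one, two, and three newline characters as described; the function name `split_verbatim_export` and the sample text are illustrative and not part of FromThePage.

```python
def split_verbatim_export(text: str) -> list[list[list[str]]]:
    """Split exported plaintext into pages -> paragraphs -> lines.

    Assumes a triple newline separates pages, a double newline separates
    paragraphs, and a single newline separates lines.
    """
    pages = []
    for page in text.strip("\n").split("\n\n\n"):
        # Each paragraph becomes a list of its lines; empty chunks are skipped.
        paragraphs = [para.split("\n") for para in page.split("\n\n") if para]
        pages.append(paragraphs)
    return pages


# Hypothetical sample: two pages, the first with two paragraphs.
sample = (
    "Dear Sir,\nI write in haste.\n\n"
    "Yours truly,\nJ. Jones\n\n\n"
    "Page two begins here."
)
pages = split_verbatim_export(sample)
for number, page in enumerate(pages, start=1):
    print(f"page {number}: {len(page)} paragraph(s)")
```

Splitting on the longest separator first matters here: splitting on a single newline first would lose the distinction between line, paragraph, and page breaks.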

TEI-XML

This may be useful to editors who plan to do further mark-up within TEI-XML editors like Oxygen.
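For readers who want to process the TEI-XML outside an editor, here is a hedged Python sketch using the standard library. The TEI namespace URI is the standard one, but the sample document below is a hypothetical fragment written for illustration; the elements FromThePage actually emits may differ, so inspect a real export before relying on any element names.

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace; the "tei" prefix is only a local alias.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical fragment, NOT an actual FromThePage export.
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p>I greeted <rs ref="#S1">Mr. Jones</rs> and his wife this morning.</p>
  </body></text>
</TEI>"""

root = ET.fromstring(sample)

# Flatten each paragraph to plain text, collapsing runs of whitespace.
paragraphs = [
    " ".join("".join(p.itertext()).split())
    for p in root.iterfind(".//tei:p", TEI_NS)
]

# Collect the text of referring strings (assumed here to mark subject mentions).
mentions = [rs.text for rs in root.iterfind(".//tei:rs", TEI_NS)]

print(paragraphs)
print(mentions)
```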

HTML

This may be useful for preservation in other systems or as a starting point for display on another website.

Emended Plaintext

Like the verbatim plaintext, this plaintext file represents line breaks with a single newline, paragraph breaks with a double newline, and page breaks with a triple newline. It differs from the verbatim text in that normalization is applied to all subjects mentioned: where the verbatim text may read "I greeted Mr. Jones and his wife this morning.", the emended plaintext will read "I greeted James Jones and Elizabeth Smith Jones this morning." This artificial text is useful for programmatic analysis, but is not meant to be read by humans.
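The effect of this normalization can be illustrated with a short Python sketch. It is only an approximation for explanation, not how FromThePage produces the export: the real system works from subject links made by transcribers, whereas this sketch assumes a hypothetical mapping from verbatim phrases to canonical names and substitutes them in one pass.

```python
import re


def emend(text: str, subjects: dict[str, str]) -> str:
    """Replace each verbatim mention with its canonical subject name.

    `subjects` maps verbatim phrases to canonical names (a hypothetical
    stand-in for subject links). A single regex pass avoids re-replacing
    text that was already substituted.
    """
    if not subjects:
        return text
    # Longest phrases first, so a shorter phrase never pre-empts a longer one.
    phrases = sorted(subjects, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(phrase) for phrase in phrases))
    return pattern.sub(lambda match: subjects[match.group(0)], text)


verbatim_text = "I greeted Mr. Jones and his wife this morning."
links = {"Mr. Jones": "James Jones", "his wife": "Elizabeth Smith Jones"}
emended_text = emend(verbatim_text, links)
print(emended_text)
```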

Verbatim Translation Plaintext

Identical to the verbatim plaintext in formatting, but containing the text of the translation (rather than the original language) for works which support translation.

Emended Translation Plaintext

An emended plaintext version (see above) of the translation of the work. This export is not included for works that do not have translation enabled.

Plaintext for full-text search

A plaintext version of the work optimized for full-text search. This version contains a verbatim plaintext transcript of each page (as described above), except that words broken by hyphenated newlines are joined together and a list of the canonical names mentioned within each page is appended to the end of the page. Using the example from the emended plaintext above, the text would be rendered as:

I greeted Mr. Jones and his wife this morning.

James Jones
Elizabeth Smith Jones
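The two transformations described for this export, joining words split by a hyphen at a line end and appending canonical names, can be sketched in Python as follows. This is an illustrative approximation under stated assumptions: the function names are hypothetical, the hyphen rule is a simple regex that would also join genuinely hyphenated compounds broken across lines, and the blank line separating the transcript from the names is assumed.

```python
import re


def join_hyphenated_breaks(text: str) -> str:
    """Rejoin words split as 'morn-' + newline + 'ing' into 'morning'."""
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)


def build_search_text(page_text: str, canonical_names: list[str]) -> str:
    """Return the dehyphenated page with canonical names appended.

    The blank-line separator before the names is an assumption.
    """
    body = join_hyphenated_breaks(page_text)
    if not canonical_names:
        return body
    return body + "\n\n" + "\n".join(canonical_names)


# Hypothetical page in which "morning" is broken across two lines.
page = "I greeted Mr. Jones and his wife this morn-\ning."
search_text = build_search_text(page, ["James Jones", "Elizabeth Smith Jones"])
print(search_text)
```

Joining the broken words lets a search for "morning" match the page, and the appended names let a search for "Elizabeth Smith Jones" find a page that only says "his wife".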

Text PDF

A PDF file containing text transcripts.

Text DOCX

An MS-Word (.docx) file containing text transcripts.

Facing Edition PDF

A PDF file containing images and transcripts on facing pages.

Table/Field CSV

Exports a spreadsheet with field-based or tabular data.

Work Metadata CSV

A spreadsheet listing each work with page counts and metadata.

Static Site

A static Jekyll website containing the entire edition.

Subject CSV

A CSV export of the subjects mentioned within the work.

Copyright © 2025