FromThePage Blog

Crowdsourcing, transcription and indexing for libraries and archives

Export Formats

FromThePage provides a variety of export formats. The list below explains what each one contains.

Verbatim Plaintext

This plaintext file will represent line breaks with a single newline, paragraph breaks with a double newline, and page breaks with a triple newline. It will contain the verbatim text, with all formatting, emendation, and subject linking stripped out.
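
Because the newline convention above is regular, the export can be split back into pages, paragraphs, and lines with a few lines of code. The following Python snippet is a minimal sketch, not part of FromThePage itself: the function name `parse_plaintext_export` is hypothetical, and it assumes the file uses exactly the one/two/three-newline convention described above (Windows line endings are normalized first).

```python
import re


def parse_plaintext_export(text: str) -> list[list[list[str]]]:
    """Split an exported plaintext file into pages -> paragraphs -> lines.

    Assumes the convention described above: a single newline ends a line,
    a double newline ends a paragraph, a triple newline ends a page.
    """
    # Normalize Windows line endings and trim stray newlines at the edges.
    text = text.replace("\r\n", "\n").strip("\n")

    # Three (or more) consecutive newlines separate pages.
    pages = re.split(r"\n{3,}", text)

    return [
        # Within a page, a double newline separates paragraphs,
        # and a single newline separates lines.
        [paragraph.split("\n") for paragraph in page.split("\n\n") if paragraph]
        for page in pages
        if page
    ]


sample = (
    "Dear Sir,\nI write in haste.\n\n"
    "Yours truly,\nJ. Jones\n\n\n"
    "Page two begins here."
)
pages = parse_plaintext_export(sample)
print(len(pages))      # number of pages
print(pages[0][1])     # second paragraph of the first page, as a list of lines
```

The same function should apply to the emended and translation plaintext exports, since they are described as using the same line, paragraph, and page conventions.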

TEI-XML

This may be useful to editors who plan to do further mark-up within TEI-XML editors like Oxygen.

HTML

This may be useful for preservation in other systems or as a starting point for display on another website.

Emended Plaintext

Like the verbatim plaintext, this plaintext file will represent line breaks with a single newline, paragraph breaks with a double newline, and page breaks with a triple newline. It differs from the verbatim text in that normalization will be applied to all subjects mentioned: where the verbatim text may read "I greeted Mr. Jones and his wife this morning.", the emended plaintext will read "I greeted James Jones and Elizabeth Smith Jones this morning." This artificial text is useful for programmatic analysis, but is not meant to be read by humans.

Verbatim Translation Plaintext

Identical to the verbatim plaintext in formatting, but containing the text of the translation (rather than the original language) for works which support translation.

Emended Translation Plaintext

An emended plaintext version (see above) of the translation of the work. This export is not included for works that do not have translation enabled.

Plaintext for full-text search

A plaintext version of the work optimized for full-text search. This version contains a verbatim plaintext transcript of each page (as described above), except that words broken by hyphenated newlines are joined together, and a list of the canonical names mentioned within each page is appended to the end of the page. The example text would be rendered as:

I greeted Mr. Jones and his wife this morning.

James Jones
Elizabeth Smith Jones
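
To make the two transformations concrete, here is a rough Python sketch of how one page of search text could be assembled from a verbatim transcript and a list of canonical names. It is an illustration only, not FromThePage's implementation: the function name `build_search_text` is hypothetical, a "hyphenated newline" is assumed to mean a hyphen immediately followed by a line break between two word characters, and the blank line before the name list is taken from the example above.

```python
import re


def build_search_text(page_text: str, canonical_names: list[str]) -> str:
    """Assemble search-oriented text for a single page.

    Assumption: a word broken across lines looks like "morn-\\ning",
    i.e. a hyphen directly followed by a newline between word characters.
    """
    # Rejoin words that were split by a hyphen at the end of a line.
    joined = re.sub(r"(\w)-\n(\w)", r"\1\2", page_text)

    if not canonical_names:
        return joined

    # Append the canonical names, one per line, after a blank line.
    return joined + "\n\n" + "\n".join(canonical_names)


page = "I greeted Mr. Jones and his wife this morn-\ning."
names = ["James Jones", "Elizabeth Smith Jones"]
search_text = build_search_text(page, names)
print(search_text)
```

Note that this simple rule would also join words that are legitimately hyphenated across a line break (for example "well-\nknown" would become "wellknown"), so a real implementation may need a more careful approach.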

Text PDF

A PDF file containing text transcripts.

Text DOCX

An MS-Word (.docx) file containing text transcripts.

Facing Edition PDF

A PDF file containing images and transcripts on facing pages.

Table/Field CSV

Exports a spreadsheet with field-based or tabular data.

Work Metadata CSV

A spreadsheet listing each work with page counts and metadata.

Static Site

A static Jekyll website containing the entire edition.

Subject CSV

A CSV export of the subjects mentioned within the work.


Copyright © 2023 · FromThePage.com