Export Formats - FromThePage Blog

FromThePage provides a variety of export formats. The following list explains what each one is.

Verbatim Plaintext

This plaintext file represents line breaks with a single newline, paragraph breaks with a double newline, and page breaks with a triple newline. It contains the verbatim text, with all formatting, emendation, and subject linking stripped out.
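As an illustration of the newline convention above, here is a minimal Python sketch that splits a verbatim plaintext export into pages, paragraphs, and lines. The function name `parse_verbatim_plaintext` is hypothetical and not part of FromThePage; the sketch assumes the file uses Unix-style `\n` newlines and exactly one, two, or three of them at each kind of break.

```python
def parse_verbatim_plaintext(text: str) -> list[list[list[str]]]:
    """Split an export into pages -> paragraphs -> lines.

    Assumes "\n" line endings: one newline separates lines, two separate
    paragraphs, three separate pages.
    """
    pages = []
    # Split on the longest separator first so page breaks are not
    # mistaken for paragraph breaks.
    for page in text.strip("\n").split("\n\n\n"):
        paragraphs = [para.split("\n") for para in page.split("\n\n") if para]
        pages.append(paragraphs)
    return pages


sample = "Line one\nLine two\n\nSecond paragraph\n\n\nPage two, line one"
pages = parse_verbatim_plaintext(sample)
print(len(pages))        # number of pages
print(pages[0][0])       # lines of the first paragraph on the first page
```

If a file was saved with Windows line endings, normalizing `\r\n` to `\n` before parsing would be needed for this sketch to work.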

TEI-XML

This export may be useful to editors who plan to do further mark-up within TEI-XML editors like Oxygen.

HTML

This export may be useful for preservation in other systems or as a starting point for display on another website.

Emended Plaintext

Like the verbatim plaintext, this plaintext file represents line breaks with a single newline, paragraph breaks with a double newline, and page breaks with a triple newline. It differs from the verbatim text in that normalization is applied to all subjects mentioned: each mention is replaced by the canonical name of its subject. So while the verbatim text may read "I greeted Mr. Jones and his wife this morning.", the emended plaintext will read "I greeted James Jones and Elizabeth Smith Jones this morning." This artificial text is useful for programmatic analysis, but is not meant to be read by humans.
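The normalization described above can be pictured as a substitution of each linked mention with its canonical name. The Python sketch below is only an illustration of that idea, not FromThePage's implementation: the `emend` function and the `(start, end, canonical_name)` representation of subject links are assumptions made for the example.

```python
def emend(verbatim: str, links: list[tuple[int, int, str]]) -> str:
    """Replace each linked span with its canonical subject name.

    `links` holds hypothetical (start, end, canonical_name) tuples giving
    character offsets into `verbatim`. Spans are assumed not to overlap.
    """
    emended = verbatim
    # Work from the end of the text backwards so that earlier offsets
    # stay valid after each replacement.
    for start, end, canonical in sorted(links, reverse=True):
        emended = emended[:start] + canonical + emended[end:]
    return emended


text = "I greeted Mr. Jones and his wife this morning."
jones = text.index("Mr. Jones")
wife = text.index("his wife")
links = [
    (jones, jones + len("Mr. Jones"), "James Jones"),
    (wife, wife + len("his wife"), "Elizabeth Smith Jones"),
]
print(emend(text, links))
```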

Verbatim Translation Plaintext

Identical to the verbatim plaintext in formatting, but containing the text of the translation (rather than the original language) for works which support translation.

Emended Translation Plaintext

An emended plaintext version (see above) of the translation of the work. This export is not included for works that do not have translation enabled.

Plaintext for full-text search

A plaintext version of the work optimized for full-text search. This version contains a verbatim plaintext transcript of each page (as described above), except that words hyphenated across a line break are joined together, and a list of the canonical names mentioned within each page is appended to the end of the page. The example text from the emended plaintext section would be rendered as:

I greeted Mr. Jones and his wife this morning.

James Jones
Elizabeth Smith Jones
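To make the two transformations concrete, here is a minimal Python sketch that joins words hyphenated across a line break and appends the canonical names to a page. The function `build_search_text` is hypothetical, and the regular expression is an assumption about what counts as a hyphenated line break; FromThePage's own rules may differ.

```python
import re


def build_search_text(page_text: str, canonical_names: list[str]) -> str:
    """Return a search-oriented version of one page's verbatim text."""
    # Join a word split as "morn-\ning" into "morning". Only a hyphen
    # directly between a word character and a newline is treated as a
    # line-break hyphen.
    joined = re.sub(r"(\w)-\n(\w)", r"\1\2", page_text)
    if not canonical_names:
        return joined
    # Append the names after a blank line, one name per line.
    return joined.rstrip("\n") + "\n\n" + "\n".join(canonical_names)


page = "I greeted Mr. Jones and his wife this morn-\ning."
print(build_search_text(page, ["James Jones", "Elizabeth Smith Jones"]))
```

Note that this simple pattern would also join genuinely hyphenated compounds that happen to break at the end of a line, which is usually acceptable for search indexing but loses the original hyphen.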

Text PDF

A PDF file containing text transcripts.

Text DOCX

An MS-Word (.docx) file containing text transcripts.

Facing Edition PDF

A PDF file containing images and transcripts on facing pages.

Table/Field CSV

Exports a spreadsheet with field-based or tabular data.

Work Metadata CSV

A spreadsheet listing each work with page counts and metadata.

Static Site

A static Jekyll website containing the entire edition.

Subject CSV

A CSV export of the subjects mentioned within the work.