
FromThePage Blog

Crowdsourcing, transcription and indexing for libraries and archives


Table Encoding

This page describes support for encoding, displaying, and exporting tabular data that appears within documents (requested and funded by Anna Agbe-Davies and the UNC-Chapel Hill Anthropology Department). The feature allows Markdown encoding of tables appearing within documents, and enhances the semantic division of texts into sections.

Example from the bottom of Volume 2, Notebook 2, Page 45 of the Jeremiah White Graves accounts:

==Mrs Hendricks Abram==
| Date | Note | Amount |
| ----| -------| ---- |
| Nov 13th 1845 | Cr by 179lb Hay @ 2/3 | 0.65 |
| " " | 53lb fodder @ 3/- | 0.26 |
| " | 1/2 bbl corn @ $3/- | 1.50 |

A table consists of an optional section title, a table header, a separator line, and table data.  These are encoded as follows:

Section titles are surrounded by runs of equals signs (a minimum of 2, a maximum of 6), increasing in length to indicate greater importance:

==section title==
===more important title===
====even more important title====
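
As an illustration of the rule above, here is a minimal Python sketch that recognizes a section title line. The pattern and the function name `parse_section_title` are assumptions made for this example and are not taken from FromThePage's own parser.

```python
import re

# Assumed pattern: 2 to 6 equals signs, the title text, then the same run
# of equals signs again (the \1 backreference enforces matching lengths).
SECTION_RE = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")


def parse_section_title(line):
    """Return (marker_length, title) for a section title line, else None."""
    match = SECTION_RE.match(line.strip())
    if match is None:
        return None
    return len(match.group(1)), match.group(2)
```

Under these assumptions, `parse_section_title("==Mrs Hendricks Abram==")` yields a marker length of 2 and the title text, while a table row or an ordinary line of prose yields `None`.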

Table headings are words or phrases separated by a pipe ("|") sign.  Headings live on a single line, and are followed by heading separators:

| First Column | Second Column | Third Column |

Table heading separators are three or more dashes.  Good form suggests that they also use pipe signs, mirroring the table headers and cells:

| -------- | -------------- | ------------- |

The function of the table heading separators is to differentiate a heading from a data row, and to give an additional hint to the parser that it is processing a table.

Do not leave empty lines between the heading, separator, and data rows.  Doing so will result in cell sizes that do not match the rest of the table.
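
To make the separator rule concrete, here is a small Python sketch that tests whether a line is a heading separator. The function name `is_separator` is hypothetical, and treating the outer pipes as optional is an assumption based on the "good form" wording above.

```python
import re


def is_separator(line):
    """Return True if every cell on the line is three or more dashes.

    Assumption: leading and trailing pipes are optional, and spaces
    around the dashes are ignored.
    """
    cells = line.strip().strip("|").split("|")
    return all(re.fullmatch(r"\s*-{3,}\s*", cell) for cell in cells)
```

A transcript checker built on this could look for a heading line that is followed immediately by a separator line; an empty line in between fails the test, which matches the advice not to leave blank lines inside a table.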

Table data rows are separated by pipes, just as table headings are.  They differ from headings only by occurring after a separator:

| A particular date | Some stuff I sold | A subtotal |
| A later date | Some other stuff | A different subtotal |
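
The relationship between headings and data rows can be sketched in Python as follows. This is only an illustration: `split_row` and `parse_table` are hypothetical names, and the sketch assumes the lines passed in are exactly a heading, a separator, and then data rows with no empty lines between them.

```python
def split_row(line):
    """Split a pipe-delimited line into a list of trimmed cell strings."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_table(lines):
    """Turn [heading, separator, data rows...] into a list of dicts.

    Each data row becomes a dict keyed by the heading text, so a cell
    can be looked up by the name of its column.
    """
    headings = split_row(lines[0])
    rows = []
    for line in lines[2:]:  # lines[1] is the separator, so skip it
        rows.append(dict(zip(headings, split_row(line))))
    return rows
```

With this approach a cell that holds only spaces comes back as an empty string, so the row still lines up with its headings.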

If a cell has no data, still type its pipe separators with spaces between them. If the last cell in a row has no data, make sure to put a closing pipe after the placeholder spaces.

When a page transcript containing tabular encoding is saved, the sections, headings, and data cells are recorded in the database, and the internal encoding used for transcript mark-up is updated with tabular mark-up.  The page should then be displayed to researchers with tables appearing in the transcriptions as HTML tables.

The Export feature has also been modified to extract all tables from a work into a single CSV file.  This is a "sparse" table: a spreadsheet column exists for each heading within the work, and table rows that do not contain an entry for a particular heading display a blank cell.  If many different tables in a text include an "Amount" cell, and if that cell is labeled "Amount" in the table heading, all of those tables will have the appropriate value for Amount in the exported spreadsheet.
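
The idea of a sparse export can be sketched in Python. This is not FromThePage's export code; `export_sparse_csv` is a hypothetical function, and it assumes each table has already been read into a list of dicts keyed by heading.

```python
import csv
import io


def export_sparse_csv(tables):
    """Merge several tables into one CSV string with a column per heading.

    `tables` is a list of tables, each a list of {heading: cell} dicts.
    Rows that lack a heading get a blank cell in that column.
    """
    # Collect every heading in first-seen order across all tables.
    columns = []
    for table in tables:
        for row in table:
            for heading in row:
                if heading not in columns:
                    columns.append(heading)

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, restval="", lineterminator="\n"
    )
    writer.writeheader()
    for table in tables:
        writer.writerows(table)
    return buffer.getvalue()
```

Because columns are matched by heading text, two tables share an "Amount" column only when both label it exactly "Amount"; a heading spelled differently would end up in its own column in this sketch.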
