Transcriptions, Screen Readers, and ChatGPT

Usually, when we transcribe documents, we type what we see, retaining irregular spelling and punctuation. But does this practice–a scholarly standard–serve everyone? I wondered how screen readers deal with verbatim transcriptions, so I ran an experiment. I used Microsoft Narrator, the screen reader built into Windows, to read passages aloud from a 19th-century farm diary, a 17th-century beer recipe, and a part of the 15th-century Prose Alexander.

To my surprise, the 17th-century brewing recipe was the clearest to understand. Although it did pronounce the final "e" in "worke" and did not modernize "yest" to "yeast," the context made the meaning clear. At worst, the pronunciation of "u" for modern "v" in words like "uery" or "euening" was a bit distracting.

The Middle English text was also fairly reasonable. Narrator did a decent job pronouncing words like "swilke." However, some spellings that a reader might modernize were vocalized literally in ways that confused me. For example, "Whi, ame I thi son?" was pronounced "Whee, amey I thee son?" which I heard as "Whee, am I the sun?".

The worst text was the 19th-century farm diary. Narrator spelled out abbreviations like "hhds," instead of expanding them to “hogsheads”, but the terse wording of the entries was the biggest problem. The entry dates were spoken as numbers dropped into the text seemingly at random, and any passage in which the diarist did not write complete sentences sounded like a jumble of words.

This experiment showed that comprehensibility depends more on the nature of the text than on the age of the language or irregularity of the spelling. So how can we make such texts accessible?

Paraphrasing is one of the things ChatGPT is supposed to be good at, so I ran the farm diary through it to see if there was any improvement:

The paraphrase has some serious problems: ”balance” is misinterpreted, “On the 26th there was a fine rain yesterday” is awkward, and the last entry is missing “grows” (for “gross”) and replaces the offensive “negroes” with “slaves”. But does the paraphrase cause more problems for a screen reader user than the verbatim text?

After my experience using Narrator on the farm diary, I have to say that the paraphrase is an improvement despite the errors introduced by ChatGPT. No human reading “hhds” aloud would say “H. H. D. S.” unless they had never encountered the term, so doing so is an error. Stringing together passages separated by line breaks–another screen reader behavior on verbatim text–also presents a massive barrier to readers of the text.

It’s important to note that the AI paraphrase was not able to help in passages that require special historical knowledge. Without annotation by a scholarly editor, the financial transaction involving Stephen Clement remains as opaque in the AI version as in the original.

So I really think that AI might be a real help for making verbatim texts more accessible to the blind and visually impaired. It’s not as good as the normalized version a well-funded scholarly edition might present as an optional format, but it’s a lot more cost-effective. Using AI for increasing access to our texts (rather than cheating on essays or spamming people) gets me excited.