We’ve spent the last six months working with our AI Assist Development Partners, using their wide variety of archival documents to gather feedback on our AI Assist and AI Draft features and to learn which kinds of documents are good candidates for handwritten text recognition (HTR) using Transkribus’ super models. Today, we’re sharing some of our findings on that second question. This is a long one, but we think you’ll find it worthwhile.
What Worked
“Very Difficult” Handwriting
Humans can read handwriting like this, but it takes a lot of effort and experience. HTR did a great job.
Bleed Through
Ink that bleeds through the page makes the front of the page hard to read. HTR didn’t have any problem identifying the “main” text on the page and ignoring the bleed through.
What Didn’t Work
Cross Hatched Writing
Because HTR services do segmentation first, and expect text to run in straight lines, cross writing or cross hatched writing caused two problems: the crossed text wasn’t transcribed, and it interfered with the transcription of the horizontally oriented writing.
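To give a feel for why line-based segmentation struggles here, below is a toy sketch. It is not how Transkribus segments pages. It uses a horizontal projection profile, a simple classic technique we are assuming purely for illustration, on a made-up page where `#` is ink and `.` is blank paper. All function names and the sample page are hypothetical.

```python
def row_profile(page):
    """Count ink pixels ('#') in each pixel row of the page."""
    return [row.count("#") for row in page]


def find_line_bands(page, min_ink=1):
    """Return (start_row, end_row) ranges whose ink count is at least min_ink."""
    bands, start = [], None
    for y, ink in enumerate(row_profile(page)):
        if ink >= min_ink and start is None:
            start = y                      # a text band begins
        elif ink < min_ink and start is not None:
            bands.append((start, y - 1))   # a blank row ends the band
            start = None
    if start is not None:
        bands.append((start, len(page) - 1))
    return bands


def overlay_vertical_strokes(page, columns):
    """Simulate cross writing by drawing ink down the given columns."""
    return [
        "".join("#" if x in columns else ch for x, ch in enumerate(row))
        for row in page
    ]


# Hypothetical page: two horizontal text lines separated by blank rows.
plain_page = [
    "..........",
    ".########.",
    ".########.",
    "..........",
    "..........",
    ".########.",
    ".########.",
    "..........",
]
crossed_page = overlay_vertical_strokes(plain_page, columns={2, 5, 8})

plain_bands = find_line_bands(plain_page)
crossed_bands = find_line_bands(crossed_page)
print(plain_bands)
print(crossed_bands)
```

On the plain page, the blank rows separate the two lines cleanly. Once the vertical strokes are overlaid, every row contains ink, so this simple segmenter sees one tall band instead of two lines. Real HTR segmentation is far more sophisticated, but the same basic tension applies: writing that runs in a second direction fills in the gaps the segmenter relies on.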
Old, Faded, Damaged Documents
This one is in between: it half worked. We think the result is about as good as a human could do.
Text in Pencil
This was a surprise: text written in pencil produced particularly poor results, especially when there was inked text on the same page.
Text in Red Pen
Text written in red pen behaved like the pencil, but even worse.
Text Written in Between Lines
In this case multiple lines are squeezed into the end of a line, and the HTR service only picked up some of them.