Improving OCR using FromThePage
This is a response to the recently published "A Research Agenda for Historical and Multilingual Optical Character Recognition" by David A. Smith and Ryan Cordell, with the support of The Andrew W. Mellon Foundation. The report analyzes current challenges faced by humanities researchers using OCR text and outlines important avenues for research to improve OCR quality. …
Archives for February 2019
Prosopography Hackathon Project: Using Machine Learning to extract entities from Ancient Greek (and other languages)
I've just returned from a Prosopography Hackathon at the University of Vienna, a three-day digital humanities event to "hack" databases of people and biography. After a short brainstorming session, I volunteered for "information extraction" (getting information out of texts), but my three-person team had dissolved by the afternoon of the first day. I feared I'd have to …
OCR Correction vs Transcription
We found this recent comment by a volunteer on a FromThePage project fascinating: "I am sad to report I have found numerous errors, too many to even begin to fix, within these pages... It will be much easier to completely transcribe from the beginning correctly, than try and fix ALL the typos. Would you like me to do this for the Library?" OCR correction is arguably …