This is a response to the recently published "A Research Agenda for Historical and Multilingual Optical Character Recognition" by David A. Smith and Ryan Cordell, with the support of The Andrew W. Mellon Foundation. The report analyzes current challenges faced by humanities researchers using OCR text and outlines important avenues for research to improve OCR quality. … [Read more...] about Improving OCR using FromThePage
ocr
Crowdsourcing the Alabama World War I Service Records
On August 2, 2018, Meredith McDonough of the Alabama Department of Archives and History and Ben Brumfield of Brumfield Labs presented "Crowdsourcing the Alabama World War I Service Records" at the CONTENTdm User Meeting. Our fellow panelists were Phil Sager and Kristen Newby of the Ohio History Connection, who presented on the crowdsourced transcription system they had built … [Read more...] about Crowdsourcing the Alabama World War I Service Records
Detecting Handwriting in OCR Text
This is my fourth and final post about the iDigBio Augmenting OCR Hackathon. Prior posts covered the hackathon itself, my presentation on preliminary results, and my results improving the OCR on entomology specimens. The other participants are slowly adding their results to the hackathon wiki, which I recommend checking back with (their efforts were much more … [Read more...] about Detecting Handwriting in OCR Text
Results of the "Ocrocrop" Approach to Improving OCR
This project attempted to improve the quality of OCR applied to difficult entomology images[*] by cropping labels from the images to run through OCR separately. In order to identify labels on the image to crop, an initial, 'naive' pass of OCR was made over the whole image, generating both A) a set of rectangles on the image defined as word bounding boxes by the OCR engine, … [Read more...] about Results of the "Ocrocrop" Approach to Improving OCR
iDigBio Augmenting OCR Hackathon
I spent the last three days at the iDigBio Augmenting OCR Hackathon working alongside mycologists, botanists, entomologists, herbarium managers, and bioinformaticians to explore ways to improve parsing of digitized specimen labels. While I'm pleased with the results of my own contribution, I'd like to take a minute to talk about the hackathon process itself before I post … [Read more...] about iDigBio Augmenting OCR Hackathon