Detecting Handwriting in OCR Text
This is my fourth and final post about the iDigBio Augmenting OCR Hackathon. Prior posts covered the hackathon itself, my presentation on preliminary results, and my results improving the OCR on entomology specimens. The other participants are slowly adding their results to the hackathon wiki, which I recommend checking back with (their efforts were much more …
Results of the "Ocrocrop" Approach to Improving OCR
This project attempted to improve the quality of OCR applied to difficult entomology images[*] by cropping labels from the images and running each crop through OCR separately. To identify the labels to crop, an initial, 'naive' OCR pass was made over the whole image, generating both A) a set of rectangles on the image defined as word bounding boxes by the OCR engine, …
iDigBio Augmenting OCR Hackathon
I spent the last three days at the iDigBio Augmenting OCR Hackathon working alongside mycologists, botanists, entomologists, herbarium managers, and bioinformaticians to explore ways to improve parsing of digitized specimen labels. While I'm pleased with the results of my own contribution, I'd like to take a minute to talk about the hackathon process itself before I post …
Improving OCR Inputs from OCR Outputs?
This is a transcript of my talk at the iDigBio Augmenting OCR Hackathon, presenting preliminary results of my efforts before the event. For my preliminary work, I tried to improve the inputs to our OCR process by looking at the outputs of a naive OCR pass. One of the first things we can do to improve the quality of our inputs to OCR is to not feed them handwriting. To …
What does it mean to "support TEI" for manuscript transcription?
This is a transcript of my talk at the 2012 TEI meeting at Texas A&M University, "What does it mean to 'support TEI' for manuscript transcription: a tool-maker's perspective." You can download an MP3 recording of the talk here. Let's get started with a couple of definitions. All the tools and the sites that I'm reviewing are cloud-based, which means that I'm ruling …