In 2019, Anne McLaughlin from the Parker Library gave a talk at the International Image Interoperability Framework Conference about crowdsourcing using FromThePage. You can view her presentation and read the full transcript below.
I'm Anne McLaughlin. I'm from the Parker Library. I'm not so much talking about all of Cambridge's crowdsourcing initiatives, but really just one that we did. I was not trained as a computer scientist or in any sort of AV. Instead, I was trained as a historian and as a librarian. Let me start by telling you a story.
Our story starts a year ago at the Library of Congress, at an IIIF conference very like this one. I was there to present Parker On the Web, or Parker II, as we call it, our new iteration. Finally removing the paywall, opening that up into the world, and exposing our IIIF through an embedded Mirador viewer. The question was left in all of our minds, what's next?
I happened to be in a session with a colleague from Stanford. He was talking about a program to transcribe some of their archives, letters written by Stanford students, either home or to their friends. It got me thinking. If people like transcribing this, what if they liked transcribing this, as well?
Why would we want to do this? Lots of our manuscripts are famous. Lots already have critical editions. But, indeed, there's more to it than that. In opening up our collections, we can support teaching and learning. Paleography is a skill. Learning how to read these is essential for historians, English students, people in medieval and modern languages.
There's a component of access by promoting these collections and showing everybody that they are able to engage with them, no matter their level of training, and support public engagement. The launch happened to coincide with the opening of the British Library's Anglo-Saxon Kingdoms exhibition. We wanted to highlight the 11 loans that we sent to them. It was also complementary to the relaunch of Parker 2.0. We hoped to promote the platform.
Finally, because while OCR is definitely getting better, it's still got a ways to go when it comes to medieval manuscripts, especially if the layout's not exactly standard. These are two of our manuscripts. The one furthest from me has musical notation in it. The one closest has interlinear glosses. If you put that closest one, a copy of Prudentius from the 11th century, into Google's OCR generator, that is what you get. Not so helpful.
Finally, the dream. What if you could search within a manuscript, just like you search within a PDF? How would that change the way that students interact with this material? Would it make it easier? Would it mean that students are engaging more with the primary sources than they are with the secondary?
We teamed up with FromThePage. As they say, they're simply the finest crowdsourcing manuscript transcription software on the planet, and we're happy to support that. From our perspective, it's been a very good collaboration.
We decided to run a bit of an experiment. We would crowdsource in three ways. First, open access, a traditional crowdsourcing platform, putting it out for the world to try to transcribe or translate. Second, library-based. This would take the form of transcribathons that we would host in the Parker Library itself, literally opening our doors to students, scholars, or members of the general public. Finally, a classroom-based transcription protocol, but I'm not going to be talking about that one today.
To start with our open access stream, you've got to start with the process. You need a dataset. In this case, because we were teamed up with the British Library's Anglo-Saxon exhibition, that was an easy choice. We chose these five and these six. You need to create a transcription protocol. This was ours. Believe it or not, that was done by a high-school senior who came in for a work experience day. We needed something that would be easily legible.
You need to promote it. We all love Twitter. A bit more Twitter. Then you need to hope that people will do it. Within the first three months, it looked like this. 204 pages transcribed by 25 collaborators. That really doesn't seem like much, but when you think about what transcribing a single one of those pages involves, we were actually quite pleased.
To go to the library-based stream, then, the steps are similar. From the dataset we had online, we needed to choose which ones we would go for. We went for this one. It's gorgeous. A ninth-century copy of Bede's Life of Saint Cuthbert, in both prose and verse, because why write something one time when you could write it again? We used the same transcription protocols. We used Twitter once again to promote it. We also made good old-fashioned flyers that we hung up in the university library and scattered around Cambridge.
In a single day, we had 37 collaborators, and 68% of that manuscript got transcribed. That's a huge gain on anything else that we had done previously. Here is that day in the Parker Library. It was phenomenal. We thought no one was going to come. We promoted it about a week ahead of time because we lost track of which week it was. We were full. Almost every chair, every laptop that we had gotten was full. We only have 16 chairs in the Parker. 37 collaborators, and that's only the ones that signed up, is pretty good.
Here's what they made. That's one page and its transcription. What I think was even more powerful about what we were doing was we were bringing together different groups of people. People who are senior lecturers from the Department of Anglo-Saxon, Norse, and Celtic, and first-year English students, classicists, and medievalists, people who maybe wouldn't interact together in a normal Cambridge environment. It became a way to teach and to learn.
One of the functions that FromThePage has is a function about variations. In this one, user "KCMC" is a first-year undergraduate in the English department. She took Latin in high school, but she has never worked with a medieval manuscript in any way, shape, or form before.
You can see where someone's gone in and deleted certain things and added certain things. That someone is Ros Love. She's the senior professor of paleography at the University of Cambridge. She would never meet a first-year English student, but in this environment, for the first time, they are communicating. That feedback is getting fed to the student via the FromThePage platform, and also Ros, for some reason, finds this comforting. To the extent that, for the next month, I woke up to corrections from Ros where she had been doing this in the evening as a way to de-stress. Some people do mindful coloring. Other people correct medieval Latin. What can I say?
Overall, the project looks like this now: 72 collaborators, 409 pages transcribed. That doesn't sound like a lot, just like our first stats didn't sound like a lot. But, I had a look at how long that has taken. That's about 600 hours of contributor time, time that they have donated to this project. That's fairly substantial. That's also 600 hours in which people have engaged with Parker materials, 600 hours where the doors to our library have been open.
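As a rough sense of scale, the totals quoted above can be turned into averages. This is only a back-of-the-envelope sketch: the 600-hour figure is stated as approximate, and the talk gives no per-page or per-person breakdown, so the results are averages rather than measurements.

```python
# Totals quoted in the talk ("about 600 hours" is approximate).
contributor_hours = 600
pages_transcribed = 409
collaborators = 72

# Average time per transcribed page, in minutes.
minutes_per_page = contributor_hours * 60 / pages_transcribed

# Average time donated per collaborator, in hours.
hours_per_collaborator = contributor_hours / collaborators

print(f"about {minutes_per_page:.0f} minutes per page")
print(f"about {hours_per_collaborator:.1f} hours per collaborator")
```

On those totals, a page averages roughly an hour and a half of contributor time, which fits the earlier point about how much work a single page involves.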
Where do we go from here? What's next? The question is the same one. We're continuing to run the project. Scribal abbreviations, things that some of our users couldn't expand, will be expanded. We'd like to pull these transcriptions into our manifests as annotations so that people can look at something and say, "I helped with that." Finally, we'd like to implement that search API so that we can full-text search a manuscript just like we would a PDF.
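To make those last two goals concrete, here is a minimal sketch of what a page transcription might look like as an annotation, and of a naive text search over such annotations. It assumes the W3C Web Annotation shape used by IIIF Presentation 3; the talk does not say which IIIF version or annotation format the Parker Library would use. The function names, the example.org URLs, and the placeholder text are all hypothetical, and the in-memory search merely stands in for a real search service.

```python
import json


def transcription_annotation(anno_id, canvas_id, text, language="la"):
    """Wrap one page's transcription as a Web Annotation targeting a canvas."""
    return {
        "id": anno_id,
        "type": "Annotation",
        "motivation": "supplementing",  # transcription of what the image shows
        "body": {
            "type": "TextualBody",
            "value": text,
            "format": "text/plain",
            "language": language,
        },
        "target": canvas_id,
    }


def annotation_page(page_id, annotations):
    """Group annotations into an AnnotationPage that a manifest could reference."""
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": page_id,
        "type": "AnnotationPage",
        "items": list(annotations),
    }


def search(page, query):
    """Return the canvases whose transcription contains the query (case-insensitive)."""
    needle = query.casefold()
    return [
        anno["target"]
        for anno in page["items"]
        if needle in anno["body"]["value"].casefold()
    ]


# Hypothetical identifiers and placeholder text, not real Parker Library data.
base = "https://example.org/iiif/manuscript-1"
annos = [
    transcription_annotation(f"{base}/anno/1", f"{base}/canvas/1", "Lorem ipsum dolor"),
    transcription_annotation(f"{base}/anno/2", f"{base}/canvas/2", "sit amet consectetur"),
]
page = annotation_page(f"{base}/annotations/transcription", annos)

print(json.dumps(page["items"][0], indent=2))
print(search(page, "IPSUM"))
```

Keeping the transcription as a separate annotation page, rather than editing the manifest's images, means the crowd's text can be corrected and re-published without touching the underlying image data.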