
FromThePage Blog

Crowdsourcing, transcription and indexing for libraries and archives


Can a Scholarly Edition be Produced without a Human?

March 8, 2022 by FromThePage

Recently on Twitter, FromThePage’s Ben Brumfield discussed the potential connection between scholarly editions and artificial intelligence/machine learning. What would happen if scholarly editions, a type of text that has historically depended on human commentary and annotation, were produced without human intervention? Can this be a positive use of technology, or is something objectively lost? Ben started the conversation by describing how artificial intelligence and machine learning transcription technologies are not yet refined, but can already be used to produce rudimentary indexes to books without any human intervention.

Read the full conversation on Twitter.

More than just mechanical processes and technologies, scholarly editions are complex publications that include scholarly contributions, like annotations and commentary. In the Twitter conversation, Mike Cosgrave replied that a “scholarly edition implies more than just indexing and automatic named entity recognition” and isn’t just the mechanics of indexing entities, but something more.

Beyond the loss of human contributions, the very idea of an “edition” implies human editing, a task where AI technology is limited. Hugh Cayless replied to Ben’s thread with a focus on the genre of the “scholarly edition,” asking, “Can it be an edition if it hasn’t been edited? I think it’s possible to get to the point of good digital transcriptions, but absent strong AI (or altering the definition of ‘edition’), I don’t see how you get to a real edition.” AI technology is limited in its ability to recognize things that are new. AI and machine-learning processes are trained on sets of data, and as such have trouble recognizing anything they haven’t seen in training. Hugh pointed out that humans have the unique ability to recognize new-to-them things, while algorithms don’t know how to handle anything they haven’t seen before, so outliers are often mishandled. AI projects can also replicate the bias of their training data, most infamously in criminal bail algorithms that suggested higher bail based on the defendant’s race. With scholarly editions, this can cause problems and information loss:


“[W]hat would be lost is anything that’s an outlier. Humans can recognize stuff they’ve never seen before. Algorithms do weird random shit with stuff they’ve never seen before. Or ignore it entirely. The results, as we well know, can be oppressive.”

Hugh added that outliers could be as simple as people who don’t capitalize their names, areas where a person’s name is not distinct from a place name, multilingual texts, and dialects that aren’t “standard.” This is one of the key problems with AI technologies: the way machines are trained produces a (so-called) “standard,” and as Hugh points out, anything that deviates from that standard, that is not a “norm,” may not be processed the right way, and as such may be handled incorrectly or eliminated from the product altogether.
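To make the outlier problem concrete, here is a minimal sketch in Python. It is a toy, not how real named entity recognition tools work: the `find_candidate_names` function, the capitalization rule, and the sample sentence are all assumptions made up for illustration. The rule treats any run of capitalized words as a name, which is one very crude version of a “standard.”

```python
import re

# Toy heuristic (an assumption for illustration, not a real NER system):
# treat any run of capitalized words as a candidate name.
CAPITALIZED_RUN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def find_candidate_names(text):
    """Return every run of capitalized words found in the text."""
    return CAPITALIZED_RUN.findall(text)


# Hypothetical sample sentence. The author bell hooks styled her name
# in lowercase, so the capitalization rule cannot see it.
sample = "Letters from bell hooks and Bill Cody arrived in Paris."
names = find_candidate_names(sample)
print(names)  # ['Letters', 'Bill Cody', 'Paris']
```

The lowercase name is silently dropped, the first word of the sentence is wrongly flagged, and the rule has no way to tell whether “Paris” is a person or a place. A human reader would catch all three.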

In addition to the potential loss in the algorithm’s output that Hugh pointed out, the process of a human completing the work of transcription is an important learning interaction. It is a vital part of community and public education, and it leads to important research questions that inform scholarly editions. Ben shared that “Transcription does produce transcripts, but it also makes a real impact on the person doing the work. We learn about the text, the language, the writer, and the subject as we transcribe. Interacting with the handwriting or the paper tells us the education level of the author (or how many drinks of whisky Bill Cody had had when he wrote a particular letter).” This is particularly true for public crowdsourcing projects where the “non-transcript by-product can be an important part of public education (as with the many projects grappling with institutional entanglement with slavery), and for scholarly projects the process can raise new research questions.”

Plus, the process of transcribing is not only educational but also enjoyable for the people who take part. In this way, optimizing technology to produce AI scholarly editions takes away volunteer transcribers’ opportunity to participate in something enjoyable and impactful. Ben added, “There's another reason I regard the machine-created edition with some dread -- I really enjoy transcribing, as do our volunteers.”

What do you think? What's the proper role for AI in scholarly editing?

Have documents that could benefit from transcription? Reserve a call with Ben and Sara.

Filed Under: Uncategorized Tagged With: artificial intelligence, crowdsourcing, machine learning, newsletter
