March 2025
We were talking recently to Paige Roberts, the lone archivist at Phillips Academy, and she said something interesting:
"I just acquired a new collection. I'm kind of weird, I don't do processing, I just digitize it and throw it up on FromThePage, and boom, people transcribe it. If it's from 1790 you don't really know what you have until you transcribe it."
Whoa.
But it did remind me of the last case we always use in our "10 Ways AI Will Change Archives" talk.

This isn't absolutely new. For years, the Library of Virginia has made decisions about what to digitize based on what their volunteers would be interested in transcribing. They call it "feeding the beast" because the pull of their volunteers for more material to transcribe, much of it material they were interested in as genealogists, drove their decisions.
I was also talking to Dominique Luster, an archives consultant specializing in Black stories in collections. She described a project with fewer than 1,000 individually rehoused manuscript items that she thought would make sense to digitize, then transcribe (or run through HTR), and only then describe using those transcriptions.
If large language models can create summaries or apply subject headings based on the text of a document, as in our work on the NEH Subject Spotter grant, does it even make sense for a person to do that work? Is it ethical to slow down how long it takes to make material available to researchers in order to have a human describe it?
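If you wanted to experiment with this yourself, a minimal sketch might look like the following. It assumes the OpenAI Python client, but any LLM API would do; the prompt wording, the model name, and the five-heading limit are my own illustrative choices, not details from the Subject Spotter project.

```python
# A minimal sketch of LLM-assisted description: feed a transcription to a
# model and ask for a draft scope-and-content note plus candidate headings.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def describe(transcription: str) -> str:
    """Return a draft summary and suggested subject headings for one item."""
    prompt = (
        "You are an archivist. Given the transcription below, write a "
        "two-sentence scope-and-content note and suggest up to five "
        "Library of Congress Subject Headings.\n\n" + transcription
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any capable model works here
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# The draft still goes to a human reviewer before anything is published.
print(describe(open("letter_1790.txt").read()))
```

The point isn't the particular model or prompt; it's that the expensive human step moves from writing description to reviewing it.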
I know this is controversial. But I also think it's the future.
Will it be perfect? No.
Will it need human review and occasional intervention? Yes.
But will it vastly increase the amount of your backlog you can get through and make available to researchers? Absolutely.