I wanted to write this month about ChatGPT, and how archives are about as anti-ChatGPT as you can get. Archives can provide learning experiences that ChatGPT can’t fake.
First, a simplification that’s useful in thinking about what ChatGPT (and its ilk) can – and can’t – do. ChatGPT is, in technical terms, a “large language model”. That means its creators fed it massive amounts of text: books, people arguing on the internet, your favorite travel writer, all of the code on GitHub. That large language model can generate a sequence of words – “language” – that, based on statistics, has a pretty good chance of being language that makes sense. Sense-making is what humans have always done, but I think the term is an exaggeration in this case: ChatGPT can’t really make sense of the world, it can just answer questions (or prompts that you construct with the specificity of a wizard casting a spell). A friend refers to it as a really advanced autocomplete.
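If “advanced autocomplete” feels abstract, here’s a toy sketch of the statistical idea in Python. It just counts which word most often follows which in a tiny, made-up corpus – nothing like how ChatGPT is actually built, which uses neural networks trained on vastly more text – but it shows what it means to generate “likely” language without understanding any of it.

```python
from collections import Counter, defaultdict

# A tiny, invented "training corpus".
corpus = (
    "the archivist opened the box and the archivist read the letter "
    "and the letter described the storm and the storm destroyed the barn"
).split()

# Count which words follow which word.
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def autocomplete(word, length=8):
    """Extend `word` by repeatedly picking the most common next word."""
    words = [word]
    for _ in range(length):
        if word not in following:
            break
        word = following[word].most_common(1)[0][0]
        words.append(word)
    return " ".join(words)

# Produces plausible-looking word salad: statistically likely, but nobody "meant" it.
print(autocomplete("the"))
```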
That's not to say it isn't impressive! My teenager dropped one of last year’s AP English prompts in, and ChatGPT produced a very passable essay on the significance of the green light in The Great Gatsby. We were also blown away when Deb Paul turned the OCR of a specimen card into Darwin Core structured data. ChatGPT is good at structure.
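For readers who haven’t met Darwin Core: it’s a standard vocabulary for describing biodiversity specimens. The sketch below is purely illustrative – the label text is invented and the record is written by hand, not by ChatGPT – but the field names are real Darwin Core terms, and this is roughly the kind of structured output you can ask the model to pull out of messy OCR.

```python
import json

# Invented OCR of a herbarium label, for illustration only.
ocr_text = (
    "Aster novae-angliae   coll. J. Smith   12 Sept 1931\n"
    "old pasture near Ithaca, Tompkins Co., N.Y.   No. 4521"
)

# The kind of record you might ask the model to produce from that text.
# Field names are genuine Darwin Core terms; the values are made up to
# match the invented label above.
darwin_core_record = {
    "catalogNumber": "4521",
    "scientificName": "Aster novae-angliae",
    "recordedBy": "J. Smith",
    "eventDate": "1931-09-12",
    "locality": "old pasture near Ithaca",
    "county": "Tompkins",
    "stateProvince": "New York",
    "country": "United States",
}

print(ocr_text)
print(json.dumps(darwin_core_record, indent=2))
```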
Because it’s good at structure – because it’s basically making very sophisticated predictions from a huge corpus of training text and statistics – it’s boring. It reinforces the status quo of its training data. Ben, to the amusement of our family, asked ChatGPT multiple times, and in multiple ways, to write a story about an astronaut and a dinosaur. Every single time the astronaut was named Tom. The stories were almost caricatures of 1950s sci-fi, with a predictable and boring storyline (singular). ChatGPT is not replacing creative writing anytime soon.
Anything ChatGPT tells you is based on the known. Not just the known, but the English-language material that’s been recorded, digitized, analyzed, and fed in as training data. It’s derivative – it has to be. The antidote is the unknown – the novel, the unexpected, the counter-intuitive and the surprising. Archives are full of material that hasn’t been read in 20, 200, 2,000 years. ChatGPT doesn’t know about it, so any interpretive or analytical work done with archival material will have to contain a decent amount of original work and thought.
Here’s a handful of ideas that – for now, at least – are ChatGPT-proof.
- Transcribe. Puzzling out handwriting might not be a core skill for students, but it forces a deep reading of a text. Asking for observations on the process and the material makes students reflect on the task – something we’ve learned large language models are particularly bad at.
- Put material in historical context. I think it still requires a human to make the connection between the spoken (“No butter at the store”) and the unspoken (rationing). Asking for examples that speak to the historical context of the documents forces students to read deeply and look for clues. Historical detective work is fun, too.
- Compare the use and choice of language in the document. How does it compare to the student’s own use and choice of language? How is it similar in tone, if not in content?
- Explore the materiality. What was something written on? Why would that be the material? How did the writer use that material? Did they reuse material (anything from a palimpsest to a daybook carried over from a previous year)? What does that say about the technology and supply of the time? Did they write different material starting from the back of a volume than from the front? It’s hard for any digital format to capture all of these nuances.
- Look for commonalities. What events, emotions, or expenses in the documents remind students of their own experiences? What resonates? What is so far outside of the student’s experience as to feel foreign?