Tracing the Genealogies of Ideas with LLM Embeddings (December 12, 2024)
Identifying intellectual influences in unstructured text is a crucial challenge across many academic disciplines, including intellectual history, social science, and bibliometrics. Researchers in computational social science and digital humanities have explored various approaches to this problem, using techniques like dictionaries, word embeddings, and language models.
At this summer's Digital Humanities conference, Ben was impressed when Lucian Li introduced a new method that leverages sentence embeddings to efficiently search large historical text corpora for similar ideas. This approach remains effective even when the source texts contain high levels of optical character recognition (OCR) errors, which can disrupt earlier techniques. Li's method can also capture indirect influences and paraphrased ideas.
Li evaluated this sentence embedding-based approach on a corpus of 250,000 19th-century nonfiction works and found the detected influences to be well aligned with existing scholarship in the history of science. By expanding the scope of influence detection beyond canonical texts and prominent figures, this type of method can provide a more nuanced understanding of how ideas spread, including among historically marginalized groups.
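For readers curious what embedding-based search looks like in practice, here is a minimal sketch of the general idea: turn each sentence into a vector, then rank corpus sentences by cosine similarity to a query. This is not Li's implementation. The character-trigram "embedding" is a toy stand-in chosen so the example runs with only the Python standard library, and the function names and sample sentences are hypothetical. A trigram vector tolerates some OCR noise, but it will not catch paraphrases the way a trained sentence embedding model would.

```python
import math
from collections import Counter


def embed(sentence, n=3):
    """Toy stand-in for a sentence embedding: counts of character trigrams.

    Assumption: a real pipeline would call a trained sentence embedding model here.
    """
    text = " ".join(sentence.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, corpus, top_k=2):
    """Return the top_k (score, sentence) pairs most similar to the query."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(sentence)), sentence) for sentence in corpus]
    return sorted(scored, reverse=True)[:top_k]


# Hypothetical corpus; the first sentence simulates OCR errors.
corpus = [
    "Specics chauge slowly through natura1 selection.",
    "The railway timetable was revised in the spring.",
    "Wheat prices rose sharply after the poor harvest.",
]

if __name__ == "__main__":
    for score, sentence in search("species change through natural selection", corpus):
        print(f"{score:.2f}  {sentence}")
```

Swapping `embed` for a call to a real embedding model would leave the ranking logic unchanged, which is the part of the workflow this sketch is meant to illustrate.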
The Speaker:
Lucian Li is a Doctoral Candidate at the School of Information Science, University of Illinois Urbana-Champaign. His research falls between digital humanities and computational social science. He leverages the unique affordances of large language models and natural language processing to analyze large-scale collections of historical and cultural documents. He is particularly interested in discovering instances and patterns of intellectual influence from unstructured text corpora.
The webinar is on December 12, 2024 at 12:00 PM EST, 11:00 AM CST, and 9:00 AM PST. Signing up will send you an invitation with the details and a follow-up with the recording.
Your First Crowdsourcing Project (January 9, 2025)
Join Ben and Sara Brumfield of FromThePage as they step you through your first crowdsourcing project. The session covers selecting material, finding volunteers, developing transcription conventions, keeping volunteers engaged, and what to do with your transcriptions once you're done.
The webinar is on January 9, 2025 at 12:00 PM EST, 11:00 AM CST, and 9:00 AM PST. Signing up will send you an invitation with the details and a follow-up with the recording.