FromThePage Blog

Crowdsourcing, transcription and indexing for libraries and archives

We Got Clobbered by Bots

July 10, 2025 by Josie Brumfield

A behind-the-scenes look at how an invisible attack nearly brought down FromThePage, and what we did next.

A Four-Hour Outage, Bots, and a Hard Lesson Learned

On Thursday, June 26th, FromThePage experienced more than a dozen outages totaling four hours. In the days leading up to this, we also saw a noticeable slowdown in performance. This wasn’t just frustrating; it was perplexing. Today, we’d like to walk you through what happened, how we solved it, and what it means going forward.

The Long Game of Playing Whack-a-Bot

We’ve been under pressure from bots for a while. Over the past year, we doubled our server capacity, blocked misbehaving crawlers with middleware, and fine-tuned heuristics to detect scraping patterns. We even wrote a newsletter about AI scrapers and their impact on cultural heritage websites.
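For readers who haven't met the idea, "blocking crawlers with middleware" means putting a small filter in front of the application that turns away unwanted requests before the real code runs. The sketch below is a generic Python WSGI illustration, not FromThePage's actual middleware, which this post doesn't describe. The class name, the blocked user-agent substrings, and the demo app are all hypothetical.

```python
from typing import Callable, Iterable

# Hypothetical substrings for illustration; a real site keeps its own list.
BLOCKED_AGENT_SUBSTRINGS = ("badbot", "examplescraper")


class BlockCrawlersMiddleware:
    """WSGI middleware that answers 403 to requests from listed user agents."""

    def __init__(self, app: Callable,
                 blocked: Iterable[str] = BLOCKED_AGENT_SUBSTRINGS):
        self.app = app
        self.blocked = tuple(s.lower() for s in blocked)

    def __call__(self, environ: dict, start_response: Callable):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(s in agent for s in self.blocked):
            # Reject early: the wrapped application is never invoked.
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return self.app(environ, start_response)


def demo_app(environ, start_response):
    """Stand-in for the real application."""
    body = b"Hello, transcriber\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call(app, user_agent: str):
    """Invoke a WSGI app once and return (status, body) for inspection."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    environ = {
        "REQUEST_METHOD": "GET",
        "PATH_INFO": "/",
        "HTTP_USER_AGENT": user_agent,
    }
    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

A filter like this only helps against bots that identify themselves honestly and whose requests actually reach the application, which is part of why this kind of defense wasn't enough on its own.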

Last Monday, we had new defenses queued up: restricting access to the “Versions” tab for anonymous users and deploying a honeypot trap to catch bad actors. But then something changed.

Browser requests started timing out. From our perspective, the server was behaving… oddly. Activity would slow, then disappear completely, as if no one was visiting the site. But it wasn’t a normal traffic dip: traffic dropped to 5% of peak rather than the 75% we see on weekends, and none of our usual diagnostic tools were turning up any red flags. We checked memory, inodes, disk space, and file handles. Nothing. We even began suspecting our own recent code changes or data imports.
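The comparison between 5% and 75% of peak can be turned into a simple check. The following is a minimal sketch, assuming request counts are available per time window. The 75% weekend figure comes from the numbers above, while the function names and the "half of the expected quiet level" alert threshold are assumptions made for illustration.

```python
WEEKEND_RATIO = 0.75   # quiet-period traffic as a share of peak (from the post)
ALERT_FRACTION = 0.5   # assumption: alert below half the expected quiet level


def traffic_ratio(current_requests: float, peak_requests: float) -> float:
    """Return current traffic as a fraction of peak traffic."""
    if peak_requests <= 0:
        raise ValueError("peak_requests must be positive")
    return current_requests / peak_requests


def looks_abnormal(current_requests: float,
                   peak_requests: float,
                   expected_quiet_ratio: float = WEEKEND_RATIO,
                   alert_fraction: float = ALERT_FRACTION) -> bool:
    """Flag traffic that falls well below even a normal quiet period."""
    threshold = expected_quiet_ratio * alert_fraction
    return traffic_ratio(current_requests, peak_requests) < threshold


if __name__ == "__main__":
    # Illustrative counts only: 50 requests against a peak of 1000 is 5%.
    print(looks_abnormal(50, 1000))    # far below a weekend lull
    print(looks_abnormal(750, 1000))   # an ordinary weekend level
```

A check like this watches for too little traffic rather than too much, which is the pattern that showed up here: the server looked idle because requests were being lost before they arrived.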

Ben reached out to folks on the Code4Lib #bots channel. Still nothing definitive. Finally, we called an old friend—a fellow Rice IT alum—just to walk him through the weirdness. His immediate suggestion was a wake-up call: “This could be a hardware or network issue. Have you asked your hosting provider to check?”

We hadn’t. So we opened a ticket, made a phone call, and talked to a kind tech at Linode. His answer? “Your server is being hammered by traffic from China.”

Bots. Again. And this time, the attack was so aggressive it didn’t even reach our server — the network interface was overwhelmed before our software ever got a chance to see the traffic.

How We Got Back Online

Once we knew what was really happening — a DDoS-style bot attack, not our own internal issue — the solution became clear. Several people had recommended Cloudflare in the past, and this felt like the time to act.

Sara moved fast, signing up for Cloudflare and redirecting our DNS to their network. Cloudflare now acts as our shield: intercepting all requests, analyzing them for patterns, and filtering out malicious traffic before it reaches us.

What’s Ahead

We’ve been watching our core metric — Pages Transcribed Per Hour (PTPH) — and we’re pleased to say that activity has returned to normal.
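This post doesn't spell out how PTPH is calculated, so the following is only a rough sketch of one way to compute a pages-per-hour series. It assumes each transcribed page has a save timestamp; the function name and the sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Iterable


def pages_per_hour(timestamps: Iterable[datetime]) -> Dict[datetime, int]:
    """Count pages per clock hour, including hours with zero pages."""
    # Truncate every timestamp to the start of its hour.
    hours = [ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps]
    if not hours:
        return {}
    counts = Counter(hours)
    result = {}
    hour, last = min(hours), max(hours)
    while hour <= last:
        # Hours with no saves are reported as 0 so that gaps stay visible.
        result[hour] = counts.get(hour, 0)
        hour += timedelta(hours=1)
    return result


if __name__ == "__main__":
    # Hypothetical save times, for illustration only.
    saves = [
        datetime(2025, 6, 26, 9, 5),
        datetime(2025, 6, 26, 9, 40),
        datetime(2025, 6, 26, 11, 15),
    ]
    for hour, count in pages_per_hour(saves).items():
        print(hour.isoformat(), count)
```

Reporting empty hours as zero matters for a metric like this, because an outage shows up as a run of zeros rather than as missing rows.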

That said, migrating DNS records quickly meant a few hiccups. If you missed an overnight email or struggled to upload files that day, you weren’t alone. Large uploads were down for about 24 hours, and some users had trouble connecting until they cleared cookies or flushed DNS caches.

We’re not naïve enough to think we’ve seen the last of this. We’re a small team, and our resources are limited. The bot-makers — whether LLM trainers or something else entirely — have more resources than we do. It’s an arms race, and we’re playing defense.

But we’re not alone. The GLAM community has been incredibly generous with advice and support. The Code4Lib Slack, again, proved to be a vital space for troubleshooting and camaraderie. And new ideas — like Cloudflare’s permission-based scraping scheme — suggest there might be a way to coexist with the forces that want our data, rather than simply block them.

We’re Ready for What’s Next (Mostly)

We don’t know what the next threat will look like. But we know it’s coming. And we know now how to listen better — to our systems, to our peers, to our instincts.

We’re grateful for your patience. We’re doing our best to build a resilient, respectful platform — and to keep it running even under pressure.

If you have questions, or if you’ve been dealing with similar attacks, we’d love to hear from you. The more we share, the stronger we all get.
