Blocking Bots
Bots got you down? Us too!
Over the last year or more, various institutions have found themselves bombarded by bots that dodge our normal mitigation practices. They don't have consistent user agents (and fake real ones), they're geographically distributed, they use a huge variety of IPs (sometimes one IP per request), and they hit you as hard as they can until you crash, completely ignoring robots.txt.
Below are a few resources this channel has found to be effective. The list is not exhaustive, but these are battle-tested options with real stories of success.
Resources
Cloudflare Turnstile
Cloudflare Turnstile is a free CAPTCHA you can enable on your site for all users (or only at certain paths). It normally forwards users on immediately or, at worst, requires them to check a box.
Implementation Options:
- https://github.com/samvera-labs/bot_challenge_page Rails Engine (thanks @jrochkind). Details in a blog post here: https://bibwild.wordpress.com/2025/01/16/using-cloudflare-turnstile-to-protect-certain-pages-on-a-rails-app/
- https://github.com/libops/captcha-protect Traefik Load Balancer plugin (thanks @Joe Corall). Example implementation using Ansible + Docker, with some production configs: https://github.com/pulibrary/princeton_ansible/tree/main/nomad/traefik-wall
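If neither option fits your stack, the server-side half of Turnstile is small enough to sketch. The widget adds a token to the submitted form (in the `cf-turnstile-response` field), and your server posts that token with your secret key to Cloudflare's siteverify endpoint. The following is a minimal Python sketch of that step, not how either project above implements it; the function names are made up for illustration, and the secret and token values are placeholders you would supply.

```python
import json
import urllib.parse
import urllib.request

# Cloudflare's server-side validation endpoint for Turnstile tokens.
SITEVERIFY_URL = "https://challenges.cloudflare.com/turnstile/v0/siteverify"


def build_siteverify_request(secret, token, remote_ip=None):
    """Build the POST request that asks Cloudflare to validate a token.

    Function name is hypothetical; `secret` is your Turnstile secret key and
    `token` is the value of the form's cf-turnstile-response field.
    """
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip  # optional: the visitor's IP address
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(SITEVERIFY_URL, data=body, method="POST")


def challenge_passed(result):
    """Return True only when the siteverify JSON explicitly reports success."""
    return result.get("success") is True


def verify_token(secret, token, remote_ip=None, timeout=5):
    """Send the request and interpret the reply (this makes a network call)."""
    request = build_siteverify_request(secret, token, remote_ip)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return challenge_passed(json.load(reply))
```

In a request handler you would call `verify_token(...)` before serving the protected path and show the challenge page again when it returns False. Treating a missing `success` field as a failure keeps the check closed when the response is malformed.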
F5 Web Application Firewall
https://www.f5.com/products/big-ip-services/advanced-waf
Imperva Web Application Firewall
@EmersonV reports that this has worked for them. https://www.imperva.com/products/web-application-firewall-waf/
FAQ
Can't I use fail2ban?
There has been some success using fail2ban, especially to block requests from bots hitting very deep facets in Blacklight applications. Unfortunately, the culprits have since increased their IP diversity further. fail2ban can only ban an IP after it has misbehaved, so when it's one IP per request, every request arrives from an address that hasn't been banned yet, and your service still goes down.
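To see why per-IP banning stops working, here is a simplified model of a per-IP threshold rule. It is not fail2ban's actual configuration or matching logic, just the counting idea behind it; the function name, the threshold, and the example addresses are made up for illustration.

```python
from collections import Counter


def banned_ips(request_ips, max_requests):
    """Return the IPs a per-IP rule would ban: those seen more than max_requests times."""
    counts = Counter(request_ips)
    return {ip for ip, seen in counts.items() if seen > max_requests}


# One noisy client: 50 requests from a single address.
single_source = ["198.51.100.9"] * 50

# Distributed bot: 50 requests, each from a different address.
distributed = [f"203.0.113.{i}" for i in range(50)]

print(banned_ips(single_source, max_requests=10))  # the noisy address is banned
print(banned_ips(distributed, max_requests=10))    # nothing crosses the threshold
```

Both traffic patterns put the same load on the application, but only the first ever trips the rule. Lowering the threshold does not help: with one request per address, the request has already been served by the time any ban could apply.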
Can I block some user agents?
Some people have had success blocking user agents that claim to be REALLY OLD versions of browsers. These bots do not have a consistent user agent, so you'll have to check the major browser versions reported in your logs and filter out the outdated ones.
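As a rough illustration of that filter, the sketch below pulls the major version out of a User-Agent string and flags it when it falls below a cut-off. The cut-off values and the function name are assumptions for the example, not numbers anyone on the channel has recommended; pick yours from what your own logs show.

```python
import re

# Hypothetical cut-offs: major versions below these count as "really old".
MIN_MAJOR_VERSION = {"Chrome": 100, "Firefox": 100}

# Matches tokens such as "Chrome/41" or "Firefox/128" and captures the major version.
_BROWSER_VERSION = re.compile(r"\b(Chrome|Firefox)/(\d+)")


def claims_outdated_browser(user_agent, minimums=MIN_MAJOR_VERSION):
    """Return True when the User-Agent claims a browser older than the cut-off."""
    match = _BROWSER_VERSION.search(user_agent)
    if match is None:
        return False  # unrecognized agent: leave the decision to other rules
    browser, major = match.group(1), int(match.group(2))
    return major < minimums.get(browser, 0)


old_chrome = (
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"
)
recent_firefox = "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"

print(claims_outdated_browser(old_chrome))
print(claims_outdated_browser(recent_firefox))
```

The same check can usually be expressed as a rule in your web server or WAF; the Python version is only meant to make the logic concrete. Agents the pattern does not recognize are deliberately let through, so this filter only ever narrows traffic that is plainly claiming an old browser.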
I have something new!
Great! Add it here, or post in the #bots channel on Slack to share it.