Blocking Bots

From Code4Lib
Revision as of 12:08, 26 March 2025 by Escowles (Talk | contribs) (Copying Trey's post from Slack to a more permanent place)

Bots got you down? Us too!

For the past year or more, various institutions have been bombarded by bots that dodge our usual mitigation practices. They don't have consistent user agents (and fake real ones), they're geographically distributed, they come from a huge variety of IPs (sometimes one IP per request), and they ignore robots.txt entirely, hitting you as hard as they can until you crash.

Below are a few resources this channel has found effective. The list is certainly not exhaustive, but these are battle-tested options with real success stories.

Resources

Cloudflare Turnstile

Cloudflare Turnstile is a free CAPTCHA you can enable on your site for all users (or only on certain paths). It normally forwards users on immediately or, at worst, requires them to check a box.
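If you wire Turnstile in yourself, the server has to confirm each token the widget produces before letting the request through. Below is a minimal sketch of that check in Python, assuming you call Cloudflare's siteverify endpoint directly with the standard library. The function names, the `opener` parameter, and the choice to fail closed are illustrative assumptions, not something this channel has standardized on.

```python
import json
import urllib.parse
import urllib.request

# Cloudflare's server-side validation endpoint for Turnstile tokens.
SITEVERIFY_URL = "https://challenges.cloudflare.com/turnstile/v0/siteverify"


def build_siteverify_body(secret, token, remote_ip=None):
    """Form-encode the fields sent to siteverify (hypothetical helper)."""
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip  # optional field
    return urllib.parse.urlencode(fields).encode("ascii")


def token_is_valid(secret, token, remote_ip=None, opener=urllib.request.urlopen):
    """Return True only if Cloudflare reports the token as valid.

    `opener` is injectable so the function can be exercised without network access.
    """
    if not token:
        return False  # the widget never ran, or the field was stripped
    request = urllib.request.Request(
        SITEVERIFY_URL,
        data=build_siteverify_body(secret, token, remote_ip),
        method="POST",
    )
    try:
        with opener(request, timeout=5) as reply:
            result = json.load(reply)
    except (OSError, ValueError):
        # Assumption: fail closed when Cloudflare is unreachable or replies
        # with something that is not JSON. Failing open is the other option.
        return False
    return result.get("success") is True
```

The token normally arrives in the `cf-turnstile-response` form field that the widget adds to the page. Whether to fail closed or open during an outage is a local decision: failing closed keeps bots out but also locks out real users while the endpoint is unreachable.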

Implementation Options:

F5 Web Application Firewall

https://www.f5.com/products/big-ip-services/advanced-waf

Imperva Web Application Firewall

@EmersonV has reported this has worked for them. https://www.imperva.com/products/web-application-firewall-waf/

FAQ

Can't I use fail2ban?

There's been some success using fail2ban, especially to block requests from bots hitting very deep facets in Blacklight applications. Unfortunately, the culprits have since increased their IP diversity further. When it's one IP per request, every request arrives from an IP that hasn't been seen before, so it gets in before any ban can apply, and your service still goes down.
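To see why per-IP banning stops helping, here is a toy model in Python. It is not fail2ban's actual implementation (fail2ban matches log lines against filters within a time window); it only imitates the core idea of banning an IP after a fixed number of hits. The function name, the threshold, and the addresses are made up for illustration.

```python
from collections import Counter


def requests_served(request_ips, max_per_ip):
    """Count the requests that get through a simple per-IP ban.

    An IP is treated as banned once it has made `max_per_ip` requests;
    anything it sends afterwards is rejected.
    """
    seen = Counter()
    served = 0
    for ip in request_ips:
        if seen[ip] >= max_per_ip:
            continue  # this IP is already banned
        seen[ip] += 1
        served += 1
    return served


# One noisy client: the ban kicks in almost immediately.
single_source = ["203.0.113.7"] * 1000

# One IP per request: no address ever reaches the threshold.
distributed = [f"10.{i // 256}.{i % 256}.1" for i in range(1000)]

print(requests_served(single_source, max_per_ip=5))  # 5
print(requests_served(distributed, max_per_ip=5))    # 1000
```

With the same threshold, the single-source flood is cut off after 5 requests, while all 1000 distributed requests reach the application.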

Can I block some user agents?

Some people have had success blocking user agents that claim to be REALLY OLD versions of browsers. These bots do not have a consistent user agent, so you'll have to check the major browser versions reported and filter out the outdated ones.
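As a rough sketch of the major-version check, the Python below pulls the Chrome or Firefox major version out of a user agent string and flags it when it falls below a cutoff. The cutoffs in `MIN_MAJOR` are hypothetical placeholders, not recommendations; pick yours by looking at what your real visitors send, and expect to raise them over time.

```python
import re

# Hypothetical cutoffs: tune against current browser releases and your own logs.
MIN_MAJOR = {"Chrome": 100, "Firefox": 100}

# Captures the browser name and the major version, e.g. "Chrome/41" -> ("Chrome", "41").
_VERSION = re.compile(r"(Chrome|Firefox)/(\d+)")


def claims_outdated_browser(user_agent, min_major=MIN_MAJOR):
    """Return True if the user agent claims a browser older than the cutoff."""
    match = _VERSION.search(user_agent or "")
    if match is None:
        return False  # not a browser we check; leave it to other rules
    browser, major = match.group(1), int(match.group(2))
    return major < min_major[browser]


old_chrome = (
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"
)
print(claims_outdated_browser(old_chrome))  # True
```

A check like this only catches bots that advertise stale versions. A bot that fakes a current user agent passes straight through, so treat this as one filter among several.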

I have something new!

Great! Add it here, or post in the #bots channel on Slack to share it.