Bot Management in the AI Era: Scoring Tiers, WebAuthn Step-Up, and Vendor Selection
The Problem
The CAPTCHA era is ending. reCAPTCHA v2 and v3, Cloudflare Turnstile, and hCaptcha were all designed when solving visual puzzles reliably distinguished humans from computers. In 2024–2025, AI vision models solve image CAPTCHAs with >99% accuracy in milliseconds. Audio CAPTCHAs are solved by speech recognition in under one second. Behaviour-based CAPTCHA scores (reCAPTCHA v3, Turnstile) are defeated by agents using real browser engines with LLM-generated humanlike interaction patterns — mouse curves, scroll velocity, and click timing that sit within the statistical bounds of human behaviour.
The problem is architectural, not parametric. CAPTCHA is a binary gate (pass/fail) applied at one point in the request lifecycle. It has two failure modes that are structurally unavoidable:
- False positive: Legitimate users who score below the threshold are blocked. Users with accessibility needs, users in privacy-protective browsers with reduced fingerprint surface, users on proxied corporate networks — all appear suspicious to behaviour-based scoring.
- False negative: Sophisticated bots above the threshold are allowed. In 2026, the false-negative rate for AI-driven bots has reached the point where CAPTCHA provides near-zero security value against any motivated, funded attacker.
Each previous generation of bot defence has followed the same arc: effective on introduction, defeated within 18 months as attackers reverse-engineer the signal, then retained as compliance theatre while actual protection collapses.
- IP blocklists: defeated by residential-proxy farms. Blocklisting a datacenter ASN has no effect when the bot traffic originates from compromised home routers with residential IPs in every city.
- Signature-based WAF rules: defeated by user-agent rotation and request mutation. A WAF rule matching curl/7.x or python-requests is trivially bypassed by setting User-Agent: Mozilla/5.0. The signal was never strong.
- TLS/JA4 fingerprinting: defeated by impersonator libraries (curl-impersonate, tls-client) that replicate the TLS handshake of Chrome, Firefox, or Safari down to extension ordering and cipher suite selection. A JA4 fingerprint matching Chrome 120 on Windows is no longer evidence of a Chrome browser.
- JavaScript fingerprinting and behavioural scoring (reCAPTCHA v3, Cloudflare Bot Management, DataDome): defeated by headless browsers with LLM-driven interaction. Playwright running GPT-4o to generate mouse paths and timing now produces behaviour scores in the legitimate range for several major products. Detection still works probabilistically, because sophisticated attacks are expensive, but not categorically.
The correct response is not to search for a better single-signal detector. It is to change the architecture.
The correct architecture:
- Tier 1 (Allow): Clear signals of legitimate automated traffic: verified crawler bots with rDNS plus UA plus rate matching, WAF-allowlisted partner API consumers with API keys, known search engine IP ranges. These are not scored. They bypass the decision plane via explicit allowlist.
- Tier 2 (Challenge): Ambiguous signals. Medium bot score. Apply a non-CAPTCHA challenge that requires something an agent cannot do: WebAuthn step-up authentication proves possession of a device-bound credential; proof-of-work imposes CPU cost at scale; a silent JS challenge verifies browser implementation details that impersonators cannot replicate without shipping an entire browser engine.
- Tier 3 (Block): Clear bot signals: datacenter ASN without an API key, known malicious fingerprint, impossible request sequence, score above threshold after a challenge.
- Explicit allowlist: Legitimate automation (Googlebot, monitoring services, CI/CD pipeline, partner API consumers) must be explicitly allowlisted with verifiable identity. This is managed separately from scoring and is the correct answer to the “false positive” problem that every binary gate creates.
The shift is from “binary gate at entry” to “tiered scoring across the entire session.” A session that starts as ambiguous and then requests a high-value action gets challenged at that action, not at entry. A session that has cleared a WebAuthn assertion is cleared for the duration. A session that fails a challenge degrades rather than being blocked immediately, because the false-positive cost of wrongly blocking a real user at a high-value endpoint is significant.
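The per-request evaluation described above can be sketched in a few lines. This is a minimal illustration, not a vendor API: the `Session` fields, the `decide` function, and both threshold parameters are hypothetical names, and the score is assumed to run from 0 (human) to 1 (bot).

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    CHALLENGE = "challenge"
    BLOCK = "block"


@dataclass
class Session:
    bot_score: float           # 0.0 = clearly human, 1.0 = clearly bot
    allowlisted: bool = False  # verified crawler, partner API key, etc.
    cleared: bool = False      # passed a step-up challenge earlier in this session


def decide(session: Session, high_value_action: bool,
           challenge_from: float, block_above: float) -> Decision:
    """Evaluate one request. Thresholds are supplied by the caller's policy."""
    if session.allowlisted:
        return Decision.ALLOW      # explicit allowlist bypasses scoring
    if session.bot_score > block_above:
        return Decision.BLOCK      # clear bot signal
    if session.cleared:
        return Decision.ALLOW      # clearance holds for the session
    if high_value_action and session.bot_score >= challenge_from:
        return Decision.CHALLENGE  # challenge at the action, not at entry
    return Decision.ALLOW
```

The same ambiguous session is allowed to browse but challenged when it reaches a high-value action, which is the difference from a gate applied once at entry.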
Threat Model
Primary adversary: AI bot defeating individual detection techniques
An attacker using Playwright with LLM-generated interaction patterns, a residential proxy, and a JA4-impersonating TLS stack defeats TLS fingerprinting, behaviour scoring, and CAPTCHA individually. The same attacker is detectable when signals are combined: residential proxy plus high request velocity plus JA4 mismatch on a minor browser version plus behavioural deviation on a specific micro-interaction. No single signal is sufficient; the combination is.
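One way to express "no single signal is sufficient; the combination is" is a noisy-OR combination, sketched below. This is an assumption for illustration, not a description of how any vendor scores traffic: it treats signals as independent, and the signal names and weights are hypothetical.

```python
from math import prod


def combine_signals(signals: dict[str, float], weights: dict[str, float]) -> float:
    """Noisy-OR: each signal is a suspicion level in [0, 1], scaled by a weight.

    The combined score rises as independent weak signals accumulate, while any
    single weak signal on its own stays well below a blocking threshold.
    """
    return 1.0 - prod(
        1.0 - weights.get(name, 0.0) * min(1.0, max(0.0, value))
        for name, value in signals.items()
    )


# Hypothetical weights: no single signal can push the score above 0.5.
WEIGHTS = {
    "residential_proxy": 0.5,
    "request_velocity": 0.5,
    "ja4_mismatch": 0.5,
    "behaviour_deviation": 0.5,
}

one_signal = combine_signals({"residential_proxy": 0.6}, WEIGHTS)
all_signals = combine_signals({name: 0.6 for name in WEIGHTS}, WEIGHTS)
```

With these numbers a single signal yields 0.30 and all four together yield roughly 0.76, so the attacker has to defeat every signal at once to stay in the ambiguous range.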
Secondary adversary: legitimate automation incorrectly blocked
Googlebot, UptimeRobot, your own CI/CD pipeline, and your partner API consumers will all score as bots under any aggressive detection regime — because they are bots. Incorrectly blocking them has direct operational cost: SEO index degradation, missed monitoring alerts, broken integrations, partner SLA breaches. The allowlist is not a convenience; it is a correctness requirement.
Tertiary adversary: vendor lock-in and detection decay
A single bot-detection vendor whose efficacy drops as attackers adapt leaves the organisation with no defence and a procurement cycle measured in months. The architecture must allow vendor swapping at the signal layer without rebuilding the decision logic.
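Vendor swapping at the signal layer is easier if the decision logic only ever sees a narrow interface. The sketch below assumes a hypothetical `SignalSource` protocol; `StaticSource` is a stand-in for a real vendor adapter, which would call the vendor's API instead of returning a fixed value.

```python
from typing import Protocol


class SignalSource(Protocol):
    """The only surface the decision plane depends on (hypothetical interface)."""
    name: str

    def score(self, request: dict) -> float: ...


class StaticSource:
    """Stand-in adapter for illustration; returns a fixed score."""

    def __init__(self, name: str, value: float) -> None:
        self.name = name
        self._value = value

    def score(self, request: dict) -> float:
        return self._value


def collect_scores(sources: list[SignalSource], request: dict) -> dict[str, float]:
    """Gather one clamped score per source; a failing source is skipped."""
    scores: dict[str, float] = {}
    for source in sources:
        try:
            scores[source.name] = min(1.0, max(0.0, source.score(request)))
        except Exception:
            continue  # one vendor outage must not take down the decision plane
    return scores
```

Replacing a vendor then means writing one new adapter; the thresholds and tier logic downstream do not change.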
Hardening Configuration
1. Three-Tier Bot Scoring Policy as Code
The policy belongs in version control. Every change requires a code review by the security team. Edge configuration that exists only in vendor consoles is not auditable and cannot be reviewed in incident retrospectives.
# bot-management-policy.yaml
# Changes to this file must be reviewed by security-team before merge.
# Deployed to edge via CI/CD; console edits are rejected by policy.
tiers:
  allow:
    # Requests matching allow-tier conditions skip scoring entirely.
    conditions:
      - name: verified_search_crawler
        criteria:
          rdns_matches_ua: true
          ip_in_verified_range: ["googlebot", "bingbot", "duckduckbot", "yandexbot"]
          rate_within_robots_txt: true
        # Verified via forward/reverse DNS match. See allowlist_manager.py.
      - name: allowlisted_api_consumer
        criteria:
          has_valid_api_key: true
          api_key_not_revoked: true
          asn_not_residential_proxy: true
          rate_within_contract: true
        # API keys issued per partner. Tracked in secrets manager with
        # expiry, scope, and owning team. Auto-revoked when partner
        # offboarded.
      - name: internal_monitoring
        criteria:
          source_ip_in_cidr: ["10.0.100.0/24", "10.0.101.0/24"]
          user_agent_prefix_in:
            - "synthetics/"
            - "uptime-kuma/"
            - "datadog-synthetics/"
            - "pingdom/"
        # Monitoring sources are IP-restricted. UA match is secondary
        # signal, not primary; monitoring IPs must be in CIDR range.
      - name: owned_ci_cd
        criteria:
          source_ip_in_cidr: ["10.0.200.0/24"]
          has_internal_request_header: "X-Internal-Pipeline: true"
          header_hmac_valid: true
        # CI runners issue HMAC-signed headers. Validates against
        # shared secret rotated monthly.
  challenge:
    # Default challenge type for ambiguous traffic.
    # WebAuthn is preferred over CAPTCHA for all authenticated contexts.
    default_type: webauthn_step_up
    fallback_type: proof_of_work
    conditions:
      # Challenge any request with a score in the ambiguous range...
      - bot_score_range: [0.30, 0.75]
        action_risk_tier: [high, critical]
      # ...or with a lower threshold on the most sensitive endpoints.
      - bot_score_range: [0.20, 0.75]
        endpoint_in: ["/login", "/signup", "/checkout", "/password-reset"]
    # A cleared challenge issues a session-scoped clearance token.
    # Subsequent requests in the same session skip challenge.
    clearance_token:
      ttl_minutes: 30
      scope: session_id
      algorithm: HS256
  block:
    conditions:
      # Starts at the top of the challenge range so no score falls between tiers.
      - bot_score_above: 0.75
      - ip_in_threat_intel_blocklist: true
      - impossible_request_sequence: true
        # Examples: POST /checkout without prior GET /cart;
        # POST /login at >500 rps from single IP;
        # API call sequence inconsistent with any documented client flow.
      - challenge_failed: true
        # After failing a step-up challenge, escalate to block
        # for the remainder of the session.
# Scoring thresholds are reviewed quarterly against false-positive data.
# Current values set 2026-05-08. Next review: 2026-08-01.
threshold_review:
  last_updated: "2026-05-08"
  next_review: "2026-08-01"
  false_positive_target: "<0.1% of human sessions challenged"
  false_negative_target: "<5% of identified bot campaigns pass undetected"
The policy file is the source of truth. A CI job runs a diff between the deployed edge configuration and this file every 15 minutes and fires an alert on any divergence. Console edits that bypass the CI pipeline are caught within one monitoring cycle.
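The drift check itself is a recursive comparison of two parsed documents. The sketch below assumes both the repository policy and the edge dump have already been loaded into plain dictionaries; `find_drift` is a hypothetical helper name, and the alerting hook is left out.

```python
def find_drift(expected, deployed, path: str = "") -> list[str]:
    """Return one human-readable entry per path where deployed differs from repo."""
    if isinstance(expected, dict) and isinstance(deployed, dict):
        drift: list[str] = []
        for key in sorted(set(expected) | set(deployed)):
            child = f"{path}.{key}" if path else str(key)
            if key not in expected:
                drift.append(f"{child}: only in deployed")
            elif key not in deployed:
                drift.append(f"{child}: missing from deployed")
            else:
                drift.extend(find_drift(expected[key], deployed[key], child))
        return drift
    return [] if expected == deployed else [f"{path}: {expected!r} != {deployed!r}"]


repo = {"tiers": {"challenge": {"clearance_token": {"ttl_minutes": 30}}}}
edge = {"tiers": {"challenge": {"clearance_token": {"ttl_minutes": 120}}}}
report = find_drift(repo, edge)
```

Here `report` contains a single entry naming `tiers.challenge.clearance_token.ttl_minutes`, which is the level of detail an incident retrospective needs: which key changed, and from what to what.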
2. WebAuthn Step-Up as CAPTCHA Replacement
WebAuthn step-up replaces CAPTCHA for authenticated contexts. The critical difference is what it proves: CAPTCHA asks “can you solve a puzzle that humans are better at than machines” — a bar that AI has now cleared. WebAuthn asks “do you have access to a device-bound private key that was registered by a human during account setup.” No vision model, no LLM, no headless browser can generate a valid WebAuthn assertion without access to the physical authenticator.
// webauthn-step-up.js
// Triggered when the decision plane returns STEP_UP for the current session.
// Assumes the user already has a registered passkey (onboarded during account creation).
const STEP_UP_OPTIONS_ENDPOINT = '/api/auth/stepup/options';
const STEP_UP_VERIFY_ENDPOINT = '/api/auth/stepup/verify';
/**
 * Initiates a WebAuthn step-up challenge for a high-risk action.
 *
 * @param {string} action - The action being protected (e.g. 'checkout', 'password_change').
 * @returns {Promise<boolean>} - true if the user passed, false if they failed or have no passkey.
 */
async function stepUpChallenge(action) {
  // 1. Request a challenge from the server.
  //    The server generates a cryptographic nonce tied to the current session.
  //    It also specifies which credential IDs are acceptable (those registered
  //    for this account), preventing cross-account replay.
  const optionsResponse = await fetch(STEP_UP_OPTIONS_ENDPOINT, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      action,
      session_id: getSessionId(), // opaque session token from cookie/storage
    }),
  });
  if (!optionsResponse.ok) {
    console.error('Step-up options request failed:', optionsResponse.status);
    return fallbackChallenge(action);
  }
  const options = await optionsResponse.json();
  // 2. Decode base64url fields the browser API expects as ArrayBuffers.
  options.challenge = base64urlToBuffer(options.challenge);
  options.allowCredentials = (options.allowCredentials || []).map(c => ({
    ...c,
    id: base64urlToBuffer(c.id),
  }));
  try {
    // 3. Invoke the browser's credential API.
    //    This causes the OS to prompt the user for their platform authenticator:
    //    Touch ID, Face ID, Windows Hello, or a FIDO2 hardware key.
    //    userVerification: 'required' means the authenticator must perform local
    //    user verification (biometric or PIN); presence alone is not sufficient.
    //
    //    AI bots cannot satisfy this call. There is no physical authenticator,
    //    no TPM, no Secure Enclave. The WebAuthn API is not exposed in server-side
    //    JS runtimes. A headless browser that intercepts the call and returns a
    //    fabricated response cannot produce a valid signature over the server-issued
    //    challenge, because the private key never leaves the secure hardware.
    const assertion = await navigator.credentials.get({
      publicKey: {
        ...options,
        userVerification: 'required',
        timeout: 60000, // 60s for the user to respond
      },
    });
    // 4. Encode the assertion for transport.
    const assertionPayload = {
      id: assertion.id,
      rawId: bufferToBase64url(assertion.rawId),
      type: assertion.type,
      response: {
        authenticatorData: bufferToBase64url(assertion.response.authenticatorData),
        clientDataJSON: bufferToBase64url(assertion.response.clientDataJSON),
        signature: bufferToBase64url(assertion.response.signature),
        userHandle: assertion.response.userHandle
          ? bufferToBase64url(assertion.response.userHandle)
          : null,
      },
    };
    // 5. Send to server for verification.
    //    Server verifies: challenge matches the one issued for this session,
    //    signature verifies against the stored public key for this credential,
    //    rpIdHash matches this origin, authenticatorData flags include UV bit.
    const verifyResponse = await fetch(STEP_UP_VERIFY_ENDPOINT, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ action, assertion: assertionPayload }),
    });
    if (!verifyResponse.ok) {
      console.warn('Step-up verification failed:', verifyResponse.status);
      return false;
    }
    const { clearance_token } = await verifyResponse.json();
    // 6. Store the clearance token. The decision plane checks this token on
    //    subsequent requests and skips step-up for the rest of the session.
    sessionStorage.setItem('bot_clearance_token', clearance_token);
    return true;
  } catch (err) {
    if (err.name === 'NotAllowedError') {
      // User cancelled or timed out. Do not retry automatically.
      return false;
    }
    if (err.name === 'NotSupportedError' || err.name === 'SecurityError') {
      // No platform authenticator available, or context not secure.
      // Fall back to TOTP or device-bound OTP, not image CAPTCHA.
      return fallbackChallenge(action);
    }
    console.error('WebAuthn step-up error:', err);
    return fallbackChallenge(action);
  }
}
/**
 * Fallback for users without a registered passkey.
 * Offers TOTP or a one-time code via registered email/phone.
 * Explicitly does NOT fall back to image CAPTCHA; the fallback path
 * must remain unsolvable by AI.
 */
async function fallbackChallenge(action) {
  // Present TOTP entry UI, then POST the code for server-side validation.
  // Implementation depends on your auth stack (TOTP via OATH, SMS OTP, etc.)
  return initiateOtpChallenge(action);
}
// Utility: ArrayBuffer to base64url string.
function bufferToBase64url(buffer) {
  return btoa(String.fromCharCode(...new Uint8Array(buffer)))
    .replace(/\+/g, '-').replace(/\//g, '_').replace(/=/g, '');
}
// Utility: base64url string to ArrayBuffer.
function base64urlToBuffer(b64url) {
  const b64 = b64url.replace(/-/g, '+').replace(/_/g, '/');
  const bin = atob(b64);
  return Uint8Array.from(bin, c => c.charCodeAt(0)).buffer;
}
function getSessionId() {
  return document.cookie.match(/session_id=([^;]+)/)?.[1] ?? '';
}
WebAuthn step-up requires users to have registered a passkey at account setup. This is the adoption prerequisite — the step-up challenge is useless for unenrolled users. The migration path is: (1) offer passkey registration at account creation and in account settings, (2) nudge existing users to enrol during login with a non-blocking prompt, (3) begin requiring passkey for step-up challenges only after the enrolment rate for the affected user segment exceeds 80%. During the transition, the fallback is TOTP, not CAPTCHA.
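The server-side checks named in step 5 of the client code can be sketched as follows. This covers only the field checks (type, challenge, origin, rpIdHash, and the UP/UV flag bits from the WebAuthn authenticator data layout); signature verification against the stored public key is deliberately omitted, and a maintained WebAuthn server library is the better choice in production. The function and error names are hypothetical.

```python
import base64
import hashlib
import hmac
import json


def b64url_decode(value: str) -> bytes:
    """Decode base64url with or without padding."""
    return base64.urlsafe_b64decode(value + "=" * (-len(value) % 4))


def check_assertion_fields(client_data_json: bytes, authenticator_data: bytes,
                           expected_challenge: bytes, expected_origin: str,
                           rp_id: str) -> list[str]:
    """Return a list of failures; an empty list means these checks passed.

    Signature verification is NOT performed here.
    """
    errors: list[str] = []
    client_data = json.loads(client_data_json)
    if client_data.get("type") != "webauthn.get":
        errors.append("wrong_type")
    received = b64url_decode(str(client_data.get("challenge", "")))
    if not hmac.compare_digest(received, expected_challenge):
        errors.append("challenge_mismatch")
    if client_data.get("origin") != expected_origin:
        errors.append("origin_mismatch")
    # Authenticator data: 32-byte rpIdHash, 1 flags byte, 4-byte signature counter.
    if len(authenticator_data) < 37:
        return errors + ["authenticator_data_too_short"]
    if authenticator_data[:32] != hashlib.sha256(rp_id.encode()).digest():
        errors.append("rp_id_hash_mismatch")
    flags = authenticator_data[32]
    if not flags & 0x01:  # UP: user present
        errors.append("user_not_present")
    if not flags & 0x04:  # UV: user verified (biometric or PIN)
        errors.append("user_not_verified")
    return errors
```

Checking the UV bit on the server matters because `userVerification: 'required'` on the client is only a request; the server is the party that has to enforce it.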
3. Verified Bot Allowlist Management
The allowlist is operationally critical and operationally fragile. It solves the false-positive problem for known legitimate bots, but it accumulates entries over time and each entry is a potential bypass. Management must be automated and audited.
# allowlist_manager.py
# Manages the allowlist for verified crawler bots.
# Run as a scheduled job: daily for range refresh, per-request for rDNS verification.
import asyncio
import ipaddress
import socket
import logging
from dataclasses import dataclass, field
from typing import Optional
import httpx
logger = logging.getLogger(__name__)
# Published IP range manifests for major verified crawlers.
# These URLs are authoritative; do not substitute unofficial mirrors.
CRAWLER_RANGE_URLS: dict[str, str] = {
    "googlebot": "https://developers.google.com/static/search/apis/ipranges/googlebot.json",
    "google-special-crawlers":
        "https://developers.google.com/static/search/apis/ipranges/special-crawlers.json",
    "google-user-triggered":
        "https://developers.google.com/static/search/apis/ipranges/user-triggered-fetchers.json",
    # Bingbot and DuckDuckBot ranges are not loaded here; those crawlers
    # are verified via rDNS only.
}
# rDNS suffix that each crawler's IPs must reverse-resolve to.
CRAWLER_RDNS_SUFFIXES: dict[str, list[str]] = {
    "googlebot": [".googlebot.com", ".google.com"],
    "bingbot": [".search.msn.com"],
    "duckduckbot": [".duckduckgo.com"],
    "yandexbot": [".yandex.com", ".yandex.net", ".yandex.ru"],
}
# User-agent substrings each verified crawler must present.
CRAWLER_UA_SUBSTRINGS: dict[str, str] = {
    "googlebot": "Googlebot",
    "bingbot": "bingbot",
    "duckduckbot": "DuckDuckBot",
    "yandexbot": "YandexBot",
}
@dataclass
class CrawlerRanges:
    """Cached IP ranges for a single verified crawler, refreshed daily."""
    name: str
    ipv4_networks: list[ipaddress.IPv4Network] = field(default_factory=list)
    ipv6_networks: list[ipaddress.IPv6Network] = field(default_factory=list)
_cache: dict[str, CrawlerRanges] = {}
async def refresh_ranges() -> None:
    """
    Fetch current IP ranges from each crawler's published manifest.
    Call this once per day from a scheduled job.
    Failures are logged but do not clear existing cache; stale ranges
    are preferable to empty ranges, which would block all crawlers.
    """
    async with httpx.AsyncClient(timeout=30) as client:
        for name, url in CRAWLER_RANGE_URLS.items():
            try:
                resp = await client.get(url)
                resp.raise_for_status()
                data = resp.json()
                ranges = CrawlerRanges(name=name)
                for prefix in data.get("prefixes", []):
                    if "ipv4Prefix" in prefix:
                        ranges.ipv4_networks.append(
                            ipaddress.IPv4Network(prefix["ipv4Prefix"])
                        )
                    if "ipv6Prefix" in prefix:
                        ranges.ipv6_networks.append(
                            ipaddress.IPv6Network(prefix["ipv6Prefix"])
                        )
                _cache[name] = ranges
                logger.info(
                    "Refreshed %s ranges: %d IPv4, %d IPv6",
                    name,
                    len(ranges.ipv4_networks),
                    len(ranges.ipv6_networks),
                )
            except Exception as exc:
                logger.error("Failed to refresh ranges for %s: %s", name, exc)
def _ip_in_published_ranges(ip: str, crawler: str) -> bool:
    """Check whether an IP falls within the crawler's published ranges."""
    if crawler not in _cache:
        return False  # Ranges not yet loaded; fail safe by requiring rDNS.
    ip_obj: ipaddress.IPv4Address | ipaddress.IPv6Address
    try:
        ip_obj = ipaddress.ip_address(ip)
    except ValueError:
        return False
    ranges = _cache[crawler]
    if isinstance(ip_obj, ipaddress.IPv4Address):
        return any(ip_obj in net for net in ranges.ipv4_networks)
    return any(ip_obj in net for net in ranges.ipv6_networks)
def _verify_rdns(ip: str, crawler: str) -> bool:
    """
    Perform forward-confirmed reverse DNS (FCrDNS) verification.
    Algorithm:
    1. Reverse-resolve the IP to a hostname.
    2. Check that the hostname ends with one of the crawler's known suffixes.
    3. Forward-resolve the hostname back to an IP.
    4. Confirm the forward-resolved IP matches the original IP.
    Step 4 is essential. The PTR record for an IP is controlled by whoever
    operates that IP's reverse zone, so an attacker can make their own IP
    reverse-resolve to any name, including one under googlebot.com, and
    pass step 2. The forward lookup is answered by the crawler operator's
    DNS, so only a genuine crawler hostname maps back to the original IP.
    """
    suffixes = CRAWLER_RDNS_SUFFIXES.get(crawler, [])
    if not suffixes:
        return False
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
    except (socket.herror, socket.gaierror):
        return False
    if not any(hostname.endswith(suffix) for suffix in suffixes):
        return False  # rDNS hostname does not belong to expected domain.
    try:
        forward_ips = {
            result[4][0] for result in socket.getaddrinfo(hostname, None)
        }
    except socket.gaierror:
        return False
    return ip in forward_ips
def is_verified_crawler(
    ip: str,
    user_agent: str,
    path: str,
    robots_txt_rate_compliant: bool = True,
) -> Optional[str]:
    """
    Returns the crawler name if the request is a verified legitimate crawler,
    or None if it cannot be verified.
    Verification requires all three conditions:
    - IP in published range OR passes FCrDNS check (belt and suspenders).
    - User-Agent contains the expected substring.
    - (Optional) Request rate is within robots.txt crawl-delay limits.
    A Googlebot user-agent from an IP not in Google's published ranges and
    failing FCrDNS is a bot impersonating Googlebot, which is more suspicious
    than an anonymous bot, not less. Treat as Tier 3.
    """
    for crawler, ua_substring in CRAWLER_UA_SUBSTRINGS.items():
        if ua_substring.lower() not in user_agent.lower():
            continue
        in_published_range = _ip_in_published_ranges(ip, crawler)
        rdns_valid = _verify_rdns(ip, crawler)
        if not (in_published_range or rdns_valid):
            logger.warning(
                "Crawler impersonation attempt: UA=%s, IP=%s, "
                "in_range=%s, rdns=%s",
                user_agent, ip, in_published_range, rdns_valid,
            )
            return None  # Impersonation: escalate, do not allow.
        if not robots_txt_rate_compliant:
            logger.info(
                "Verified crawler %s exceeding crawl-delay: IP=%s",
                crawler, ip,
            )
            return None  # Rate violation: throttle, do not allow unconditionally.
        return crawler
    return None  # Not a known crawler UA.
Key invariant: a crawler that presents Googlebot’s User-Agent but fails both range and FCrDNS checks is treated as a higher-priority threat, not a lower one. Impersonation of verified crawlers is a common technique to bypass bot defences. Log it, score it as Tier 3, and alert on volume.
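Because `is_verified_crawler` returns `None` both for an unknown User-Agent and for a failed verification, a caller that wants to enforce this invariant has to tell the two apart. A minimal sketch, assuming the IP verification result (published range or FCrDNS) has already been computed; the enum and function names are hypothetical.

```python
from enum import Enum
from typing import Iterable


class CrawlerVerdict(Enum):
    VERIFIED = "verified"            # allow tier
    IMPERSONATOR = "impersonator"    # Tier 3: block, log, alert on volume
    NOT_A_CRAWLER = "not_a_crawler"  # continue to normal scoring


def classify_crawler_claim(user_agent: str, known_ua_substrings: Iterable[str],
                           ip_verified: bool) -> CrawlerVerdict:
    """Separate 'claims to be a crawler and is lying' from 'makes no claim'."""
    ua = user_agent.lower()
    claims_crawler = any(s.lower() in ua for s in known_ua_substrings)
    if not claims_crawler:
        return CrawlerVerdict.NOT_A_CRAWLER
    return CrawlerVerdict.VERIFIED if ip_verified else CrawlerVerdict.IMPERSONATOR
```

A request with a Googlebot User-Agent and `ip_verified=False` yields `IMPERSONATOR`, which the decision plane can map straight to a block and an alert counter rather than to ordinary scoring.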
4. Proof-of-Work Challenge for Unauthenticated Endpoints
For endpoints where the user is not authenticated, and therefore cannot perform a WebAuthn assertion against a registered credential, proof-of-work provides a non-CAPTCHA challenge that is economically hostile to high-volume bot traffic. A human who waits tens of milliseconds for a single challenge is unaffected. A bot farm running 10,000 concurrent sessions pays that CPU cost on every session and every challenge round, and each one-step increase in difficulty multiplies the cost by 16.
# proof_of_work.py
# SHA-256 proof-of-work challenge for unauthenticated endpoints.
# Difficulty 4 requires ~65,536 iterations on average.
# At difficulty 5: ~1,048,576 iterations (~10ms at 100M SHA-256/s, ~210ms at 5M SHA-256/s).
# Use difficulty 4 for account creation; difficulty 5 for repeated failures.
import hashlib
import secrets
import time
from typing import Optional
def generate_challenge(
    difficulty: int = 4,
    ttl_seconds: int = 300,
) -> dict:
    """
    Generate a proof-of-work challenge.
    Returns a dict the server stores (keyed by nonce) and sends to the client.
    The client must find a `solution` string such that:
        sha256(nonce + solution).hexdigest().startswith("0" * difficulty)
    """
    if not 1 <= difficulty <= 7:
        raise ValueError(f"Difficulty must be between 1 and 7, got {difficulty}")
    nonce = secrets.token_hex(16)
    issued_at = int(time.time())
    return {
        "nonce": nonce,
        "difficulty": difficulty,
        "algorithm": "sha256",
        "issued_at": issued_at,
        "expires_at": issued_at + ttl_seconds,
    }
def verify_solution(
    nonce: str,
    solution: str,
    difficulty: int,
    issued_at: int,
    expires_at: int,
    already_used: bool = False,
) -> tuple[bool, str]:
    """
    Verify a proof-of-work solution.
    Returns (valid: bool, reason: str).
    The caller must check `already_used` before calling; PoW solutions
    are single-use. Store verified nonces in Redis with a TTL that lasts
    until expires_at to prevent replay attacks. difficulty, issued_at, and
    expires_at must come from the server-side record, not from the client.
    """
    now = int(time.time())
    if now > expires_at:
        return False, "challenge_expired"
    if already_used:
        return False, "challenge_already_used"
    # Validate inputs to reject malformed values and bound the hashing cost.
    if not isinstance(nonce, str) or len(nonce) != 32:
        return False, "invalid_nonce_format"
    if not isinstance(solution, str) or len(solution) > 64:
        return False, "invalid_solution_format"
    if not solution.isascii():
        return False, "non_ascii_solution"
    target_prefix = "0" * difficulty
    candidate = hashlib.sha256(f"{nonce}{solution}".encode()).hexdigest()
    if not candidate.startswith(target_prefix):
        return False, "incorrect_solution"
    return True, "ok"
def expected_iterations(difficulty: int) -> int:
    """
    Return the expected number of hash iterations for a given difficulty.
    Each hex character requires matching 1/16 probability.
    Expected iterations = 16^difficulty.
    """
    return 16 ** difficulty
# Expected compute cost at difficulty=4:
#   expected_iterations(4) = 65,536
#   On a 2024 laptop at ~100M SHA-256/s: ~0.65ms average
#   On a low-end mobile at ~5M SHA-256/s: ~13ms average
#   On a server running 1,000 parallel bot sessions: 1,000 * 0.65ms = 650ms
#   total CPU per challenge round; still cheap but scales linearly.
#
# At difficulty=5:
#   expected_iterations(5) = 1,048,576
#   On a laptop: ~10ms (acceptable friction)
#   On a low-end mobile: ~210ms (borderline; use only after a first failure)
#   On 1,000 server sessions: ~10s total CPU (meaningful cost)
The JavaScript client side is a tight loop:
// pow-client.js
// Runs in a Web Worker to avoid blocking the main thread.
async function solveChallenge({ nonce, difficulty }) {
  const target = '0'.repeat(difficulty);
  let counter = 0;
  const encoder = new TextEncoder();
  while (true) {
    const candidate = `${nonce}${counter}`;
    const hashBuffer = await crypto.subtle.digest(
      'SHA-256',
      encoder.encode(candidate)
    );
    const hashHex = Array.from(new Uint8Array(hashBuffer))
      .map(b => b.toString(16).padStart(2, '0'))
      .join('');
    if (hashHex.startsWith(target)) {
      return { solution: String(counter), hash: hashHex };
    }
    counter++;
  }
}
// Usage: post message to Web Worker with challenge, receive solution.
// Time the solve; if it consistently completes in <1ms, the client is likely
// running a native solver rather than this browser loop (the nonce is random
// per challenge, so a precomputed table is not possible). Flag and escalate.
One operational note: proof-of-work penalises low-powered devices. A difficulty-5 challenge taking 200ms on a budget Android phone is acceptable; difficulty-6 (>3 seconds) is not. Calibrate difficulty against the P95 solve time on your lowest-common-denominator user device, not a developer laptop.
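The calibration can be done on paper before any device testing. The sketch below assumes you have measured a hash rate (hashes per second) for the slowest device you intend to support; the function names are hypothetical. It uses the fact that the number of attempts is geometrically distributed, so the 95th-percentile attempt count is approximately ln(20), about 3, times the mean of 16^difficulty.

```python
import math


def expected_solve_ms(difficulty: int, hashes_per_second: float) -> float:
    """Mean solve time: 16^difficulty attempts at the measured hash rate."""
    return (16 ** difficulty) / hashes_per_second * 1000.0


def p95_solve_ms(difficulty: int, hashes_per_second: float) -> float:
    """Approximate P95 solve time using the geometric-distribution tail."""
    p95_attempts = math.log(20) * (16 ** difficulty)
    return p95_attempts / hashes_per_second * 1000.0


def max_difficulty_within(budget_ms: float, hashes_per_second: float,
                          ceiling: int = 7) -> int:
    """Highest difficulty whose P95 solve time fits the budget (0 if none does)."""
    best = 0
    for difficulty in range(1, ceiling + 1):
        if p95_solve_ms(difficulty, hashes_per_second) <= budget_ms:
            best = difficulty
    return best
```

At 5M hashes per second, a 250 ms P95 budget allows difficulty 4 but not difficulty 5, even though the mean solve time at difficulty 5 is about 210 ms. Budgeting on the mean understates what the unluckiest users experience.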
5. Vendor Evaluation: 30-Day PoC with Controlled Bot Traffic
Vendor selection on detection rate alone is a mistake. Detection rate is measured against the vendor’s own test suite, which reflects the attacks they have already seen. The correct evaluation criterion is signal diversity — how many independent signal classes does the vendor collect, and how many of them can an attacker defeat simultaneously?
## Bot Detection Vendor Evaluation Matrix
Score each dimension 1–5. Weight by your threat model.
### Signal Diversity (weight: 30%)
| Signal Class | Vendor A | Vendor B | Vendor C |
|---------------------------------------------|----------|----------|----------|
| TLS/JA4+ fingerprinting | | | |
| HTTP/2 stream and HPACK fingerprinting | | | |
| Client-side JavaScript behavioural analysis | | | |
| Network-level: ASN, proxy, datacenter | | | |
| Server-side session coherence analysis | | | |
| Threat intelligence feed coverage | | | |
### AI Bot Resistance (weight: 40%)
| Scenario | Vendor A | Vendor B | Vendor C |
|-------------------------------------------------------------|----------|----------|----------|
| Playwright + LLM-generated mouse/timing | | | |
| curl-impersonate (Chrome JA4 from non-Chrome binary) | | | |
| Real Chrome + residential proxy + low request rate | | | |
| Session coherence: scripted flows matching human navigation | | | |
| False positive rate on legitimate automation (your own) | | | |
### Operational Fit (weight: 30%)
| Criterion | Vendor A | Vendor B | Vendor C |
|----------------------------------------------|----------|----------|----------|
| Custom rule capability (not just threshold) | | | |
| SIEM/log export (CEF, JSON, S3) | | | |
| False-positive appeal SLA | | | |
| Signal update SLA for zero-day bot patterns | | | |
| API for programmatic policy management | | | |
| Contractual data retention and deletion | | | |
Run the PoC as follows:
- Baseline week (days 1–7): Deploy vendor in observe-only mode. Record score distributions for traffic you know to be legitimate (from your monitoring IPs, your own team’s browsing) and traffic you believe to be bots (from rate-anomaly alerts, your existing blocklist).
- Controlled attack weeks (days 8–21): Use a dedicated test environment, not production. Run three bot scenarios against the vendor: (a) basic requests/curl, (b) Playwright headless, (c) Playwright with LLM-generated interaction patterns via a residential proxy. Record detection rates. Most vendors score near 100% on scenario (a) and 60–80% on scenario (c). Treat scenario (c) as the relevant benchmark.
- False positive week (days 22–28): Replay your monitoring synthetic traffic through the vendor. Replay your CI/CD pipeline’s API calls. Replay traffic from your corporate office IP ranges. Measure false positive rates. A vendor with 80% detection on scenario (c) and a 2% false positive rate on your monitoring traffic is worse than a vendor with 70% detection and 0.1% false positive rate.
- Decision day 30: Score the matrix. Weight signal diversity heavily, because it predicts durability as bots evolve. Operational fit is the tie-breaker.
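Scoring the matrix is a weighted mean of per-section averages. The sketch below uses the section weights from the matrix (30%, 40%, 30%); the dictionary keys and the example scores are made up for illustration and do not describe any real vendor.

```python
WEIGHTS = {
    "signal_diversity": 0.30,
    "ai_bot_resistance": 0.40,
    "operational_fit": 0.30,
}


def weighted_score(scores: dict[str, list[int]]) -> float:
    """Average the 1-5 row scores within each section, then apply section weights."""
    total = 0.0
    for section, weight in WEIGHTS.items():
        rows = scores[section]
        total += weight * (sum(rows) / len(rows))
    return total


# Illustrative scores only: one entry per row of each table.
example_vendor = {
    "signal_diversity": [4, 4, 3, 5, 2, 3],    # mean 3.5
    "ai_bot_resistance": [3, 4, 2, 3, 5],      # mean 3.4
    "operational_fit": [4, 4, 3, 3, 5, 5],     # mean 4.0
}
result = weighted_score(example_vendor)
```

Averaging within a section first keeps a section with more rows from dominating the total simply because it has more rows.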
6. Bot Management Metrics Dashboard
# grafana-bot-dashboard.yaml
# Import via Grafana provisioning API or dashboard-as-code tooling.
dashboard:
  title: "Bot Management Programme"
  refresh: "1m"
  time: { from: "now-24h", to: "now" }
  panels:
    - title: "Bot Score Distribution (24h)"
      type: histogram
      description: >
        Score 0 = clearly human, 1 = clearly bot. A healthy programme shows
        a bimodal distribution: mass at 0–0.2 (legitimate traffic) and a
        smaller peak at 0.8–1.0 (blocked bots). A large mass at 0.3–0.7
        indicates ambiguous traffic that step-up is handling, or threshold
        miscalibration.
      # Per-bucket counts show the distribution; histogram_quantile would
      # collapse it to a single percentile value.
      query: "sum by (le) (increase(bot_score_bucket[24h]))"
      thresholds:
        - { value: 0.30, color: yellow }  # Challenge zone starts
        - { value: 0.75, color: red }     # Block zone starts
    - title: "Step-Up Challenge Success Rate"
      type: stat
      description: >
        Fraction of issued WebAuthn/PoW challenges that were successfully
        completed by the user. Below 60%: passkey enrolment may be too low;
        review fallback path. Above 95%: threshold may be too aggressive
        (challenging clearly human traffic).
      query: >
        sum(rate(step_up_challenge_passed_total[1h]))
        /
        sum(rate(step_up_challenge_issued_total[1h]))
      thresholds:
        - { value: 0.60, color: red }
        - { value: 0.80, color: yellow }
        - { value: 0.90, color: green }
    - title: "Allowlist Coverage"
      type: gauge
      description: >
        Fraction of total requests that are explicitly allowlisted (skip scoring).
        A healthy value depends on your traffic mix. If >50% of your traffic is
        from verified crawlers and partner APIs, a high value is normal.
        A sudden increase indicates allowlist bloat or a new high-volume consumer
        that should have been scored.
      query: >
        sum(rate(requests_allowlisted_total[1h]))
        /
        sum(rate(requests_total[1h]))
    - title: "Blocked Requests by Tier"
      type: piechart
      description: >
        Breakdown of blocks by decision tier. A healthy programme blocks mostly
        at Tier 3 (clear bot signals). High Tier 2 blocks (challenge-then-block)
        indicate either aggressive thresholds or an ongoing attack campaign where
        bots are failing step-up challenges.
      query: "sum by (tier) (rate(requests_blocked_total[1h]))"
    - title: "False Positive Complaints (7d rolling)"
      type: timeseries
      description: >
        Count of user-reported false positives: legitimate users who were blocked
        or repeatedly challenged. This is the human cost of bot management.
        Any upward trend requires threshold review before the next daily standup.
      query: "sum(increase(bot_fp_complaint_total[7d]))"
    - title: "Vendor Signal Freshness"
      type: table
      description: >
        Age of the most recent threat-intel update from each vendor.
        Stale signals (>4h) indicate vendor-side issues or connectivity problems.
        Alert threshold: 6h without update from any vendor.
      query: "time() - bot_vendor_last_update_timestamp"
      columns: ["vendor", "age_seconds", "status"]
Expected Behaviour After Hardening
After the policy is deployed and the allowlist manager is running: a Googlebot request to /sitemap.xml arrives, passes FCrDNS verification and IP range check, is tagged as verified_search_crawler, and is served without entering the scoring pipeline. An Uptime Kuma synthetic check from a known monitoring CIDR presents uptime-kuma/1.x and is allowlisted via internal_monitoring. Neither request ever touches the decision plane.
A Playwright bot with LLM-generated interaction patterns arrives at /login. Its JA4 fingerprint matches Chrome, its behaviour score is 0.45 (ambiguous), and it is on a residential proxy IP. The decision plane sees score 0.45 on a Tier 0 endpoint (/login maps to tier: 0), which triggers step_up at the 0.40 threshold. The WebAuthn challenge is issued. The bot has no registered credential and no physical authenticator. The challenge times out. The session is scored as challenge_failed, the score is raised to 1.0, and subsequent requests from the session are blocked.
A real user logging in from a corporate network arrives with lower-than-average fingerprint entropy (proxied, stripped headers). Their score is 0.42, just above the 0.40 step-up threshold, so the step-up challenge fires. They authenticate with Touch ID in under two seconds. A clearance token is issued. For the remainder of the 30-minute session, the decision plane reads the clearance token and skips the challenge on all subsequent requests. The user experiences one biometric prompt during the session, not per-request friction.
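The three walkthroughs above reduce to a small decision function. The following is a minimal sketch, not the decision plane's actual code: the `Request` type and `decide` function are hypothetical, the 0.80 block threshold is an assumption (the text only states the 0.40 step-up threshold for Tier 0), and a boolean stands in for a real signed clearance token.

```python
from dataclasses import dataclass

STEP_UP_THRESHOLD_TIER0 = 0.40   # from the walkthrough above
BLOCK_THRESHOLD = 0.80           # assumption: not stated in the text


@dataclass
class Request:
    score: float                    # combined bot score, 0.0 (human) to 1.0 (bot)
    allowlisted: bool = False       # verified crawler or internal monitoring
    has_clearance: bool = False     # unexpired clearance token from a passed step-up
    challenge_failed: bool = False  # an earlier step-up in this session failed or timed out


def decide(req: Request) -> str:
    """Return 'allow', 'step_up', or 'block' for a Tier 0 endpoint."""
    if req.allowlisted:
        return "allow"              # never enters the scoring pipeline
    if req.challenge_failed:
        return "block"              # session score is treated as 1.0
    if req.score >= BLOCK_THRESHOLD:
        return "block"
    if req.has_clearance:
        return "allow"              # one prompt per session, not per request
    if req.score >= STEP_UP_THRESHOLD_TIER0:
        return "step_up"
    return "allow"
```

With these assumptions, the Playwright bot (`score=0.45`) gets `step_up` on its first request and `block` once `challenge_failed` is set, while a user holding a clearance token is allowed without a second prompt.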
Verify that the policy file is the source of truth and vendor console drift is zero:
# Confirm edge policy matches repo. Should produce no output.
diff \
<(curl -sf https://edge.internal/policy/dump | jq -S .) \
<(yq -o=json bot-management-policy.yaml | jq -S .)
# Confirm allowlist manager is refreshing Googlebot ranges daily.
redis-cli GET "allowlist:googlebot:last_refresh"
# Expect: timestamp within last 25 hours.
# Confirm decision plane is receiving signals from >= 3 vendors.
curl -sf https://decision-plane.internal/health | jq '.signal_sources | length'
# Expect: >= 3
# Confirm WebAuthn is the configured first-choice challenge type.
curl -sf https://challenge.internal/policy | jq '.default_type'
# Expect: "webauthn_step_up"
Trade-offs
WebAuthn step-up requires passkey enrolment. The challenge is only meaningful for users who have registered a passkey. Until enrolment reaches a target level (80% is reasonable for a consumer product, 95%+ for enterprise), you need a fallback path such as TOTP or a one-time code. The fallback path must not itself be CAPTCHA; deliver the one-time code via an already-verified channel (registered email, registered phone). The migration is a multi-quarter project, not a configuration change.
Proof-of-work penalises low-powered devices. A difficulty-5 challenge taking 10ms on a developer laptop takes 200–400ms on a budget Android phone. Calibrate difficulty against the P90 solve time on the lowest-powered device segment in your user base. Difficulty 4 is safe for all modern devices. Difficulty 5 is acceptable for retry scenarios. Difficulty 6+ should never be applied to a first attempt.
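The calibration rule above can be sketched as a percentile check over measured solve times. This is an illustration only: the function names, the shape of the input data, and the latency budget are assumptions, and the sample timings would have to come from your own device telemetry.

```python
from __future__ import annotations

from statistics import quantiles


def p90(samples_ms: list[float]) -> float:
    """90th percentile of solve times, in milliseconds."""
    # quantiles(n=10) returns 9 cut points; the last one is the 90th percentile.
    return quantiles(samples_ms, n=10, method="inclusive")[-1]


def max_safe_difficulty(
    solve_times_ms: dict[int, list[float]], budget_ms: float
) -> int | None:
    """Highest difficulty whose P90 solve time stays within budget_ms.

    solve_times_ms maps difficulty -> solve times measured on the
    lowest-powered device segment. Returns None if no difficulty fits.
    """
    safe = [d for d, samples in solve_times_ms.items() if p90(samples) <= budget_ms]
    return max(safe) if safe else None
```

Feeding it only samples from the slowest device segment, rather than the whole population, is what keeps a fast developer laptop from masking the budget Android case.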
Allowlist management has operational overhead. Every entry in the allowlist is a bypass of your entire detection stack. Allowlists grow over time as teams add monitoring services, partners request access, and CI pipelines multiply. Without an audit process, allowlists become the attack surface. Quarterly reviews of allowlist entries — verifying that each entry has a named owner, an active business justification, and a non-expired IP range — are not optional.
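The quarterly review criteria above (named owner, active justification, non-expired range) are easy to check mechanically. A minimal sketch, assuming a hypothetical `AllowlistEntry` record; the field names are illustrative and not taken from the allowlist manager:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AllowlistEntry:
    cidr: str
    owner: str            # named owner; empty string means unowned
    justification: str    # active business justification
    expires: date         # date after which the IP range must be re-verified


def audit(entries, today):
    """Return (cidr, problem) pairs for entries that fail the review."""
    findings = []
    for e in entries:
        if not e.owner.strip():
            findings.append((e.cidr, "no named owner"))
        if not e.justification.strip():
            findings.append((e.cidr, "no business justification"))
        if e.expires < today:
            findings.append((e.cidr, "IP range expired"))
    return findings
```

Running this in CI against the allowlist file turns the quarterly review into a standing check, with the human review reserved for judging whether a justification is still valid.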
Multi-vendor portfolio increases integration complexity. A decision plane consuming signals from three vendors means three integrations to maintain, three contracts to renew, three vendor outages to handle gracefully. The architectural answer is to standardise the signal schema (vendor-agnostic score + reasons + metadata) and design the decision plane to degrade gracefully on missing signals, not to fail open. A missing signal from Vendor B means the decision falls back to Vendors A and C; it does not mean no decision.
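A vendor-agnostic signal record and a combiner that tolerates missing signals might look like the sketch below. The `Signal` type, the `combine` function, and the `min_sources` parameter are assumptions for illustration; taking the maximum across vendors is one possible combination rule, not necessarily the one your decision plane uses.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Signal:
    vendor: str
    score: Optional[float]                        # None means the vendor did not answer
    reasons: list = field(default_factory=list)   # vendor-supplied reason codes
    metadata: dict = field(default_factory=dict)  # anything vendor-specific


def combine(signals, min_sources=1):
    """Combine vendor scores, ignoring missing ones.

    Returns (score, degraded). With fewer than min_sources live signals the
    result is (None, True), so the caller can challenge instead of allowing.
    """
    live = [s.score for s in signals if s.score is not None]
    if len(live) < min_sources:
        return None, True
    degraded = len(live) < len(signals)
    return max(live), degraded
```

Returning the `degraded` flag alongside the score lets the caller log or tighten policy while a vendor is down, rather than silently treating a two-vendor decision as equivalent to a three-vendor one.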
Tiered endpoint policy requires ongoing maintenance. New endpoints are added regularly. Without a process to assign a tier to each new endpoint at PR review time, new endpoints default to Tier 2. That is the correct safe default, but a new high-risk endpoint (OAuth callback, API key rotation endpoint) that should be Tier 0 will be under-protected until someone notices. Automate tier assignment review as part of the API change review process.
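One way to automate that review is a CI check that flags endpoints which have no explicit tier but look high-risk. A minimal sketch: the pattern list and function names are assumptions and would need tuning to your own route naming.

```python
import re

DEFAULT_TIER = 2   # from the text: unassigned endpoints default to Tier 2

# Assumption: path fragments that usually indicate a high-risk endpoint.
HIGH_RISK_PATTERNS = [r"/login", r"/oauth", r"/api-keys?", r"/password"]


def effective_tier(tier):
    """Tier actually enforced for an endpoint; None means unassigned."""
    return DEFAULT_TIER if tier is None else tier


def review_tiers(endpoints: dict) -> list:
    """endpoints maps path -> tier, or None when no tier was assigned.

    Returns the paths that would silently fall back to DEFAULT_TIER
    although they match a high-risk pattern.
    """
    flagged = []
    for path, tier in endpoints.items():
        if tier is None and any(re.search(p, path) for p in HIGH_RISK_PATTERNS):
            flagged.append(path)
    return sorted(flagged)
```

A non-empty result would fail the pull request until a reviewer assigns a tier explicitly, which moves the decision to the moment the endpoint is introduced.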
Failure Modes
Allowlist becoming too broad over time. An allowlist entry added for a partner API during an integration three years ago is still present after the partnership ended. The former partner’s IP range was subsequently reassigned to a residential proxy provider. The allowlist now allows residential-proxy traffic to bypass scoring entirely. Detection: run monthly diffs between the allowlist and current IP range ownership data (via whois or an IP intelligence API). Alert on any allowlisted range whose ownership has changed.
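The monthly ownership diff can be sketched as a comparison of two mappings. How the current owner is fetched (whois or an IP intelligence API) is deliberately left out; the function name and data shapes here are hypothetical.

```python
def ownership_changes(recorded: dict, current: dict) -> list:
    """Compare the owner recorded at allowlisting time with today's owner.

    Both arguments map CIDR -> organisation name. Returns
    (cidr, recorded_owner, current_owner) for every mismatch; a range that
    no longer resolves to any owner is reported with current_owner None.
    """
    changes = []
    for cidr, old_owner in sorted(recorded.items()):
        new_owner = current.get(cidr)
        if new_owner is None or new_owner.strip().lower() != old_owner.strip().lower():
            changes.append((cidr, old_owner, new_owner))
    return changes
```

Every tuple returned is a candidate for removal from the allowlist, or at minimum for re-approval by the entry's named owner.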
WebAuthn fallback reintroduces CAPTCHA. The step-up flow’s fallback for unenrolled users is supposed to be TOTP. A developer unfamiliar with the requirement adds a CAPTCHA widget to the fallback path as a “quick fix” during an incident. AI bots immediately begin exploiting the fallback path, bypassing WebAuthn entirely. Detection: automated testing of the challenge flow, including the fallback path, as part of the CI/CD pipeline. The test asserts that the fallback challenge type is not image_captcha or audio_captcha.
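The CI assertion described above can be as small as the following sketch. The `fallback_type` key is an assumption about how the challenge policy exposes its fallback; the forbidden type names are the ones given in the text.

```python
FORBIDDEN_FALLBACKS = {"image_captcha", "audio_captcha"}  # names from the text


def fallback_violations(policy: dict) -> list:
    """Return human-readable problems with the step-up fallback configuration.

    Assumption: the policy exposes the fallback under a `fallback_type` key.
    """
    problems = []
    fallback = policy.get("fallback_type")
    if fallback is None:
        problems.append("no fallback configured for unenrolled users")
    elif fallback in FORBIDDEN_FALLBACKS:
        problems.append(f"CAPTCHA fallback configured: {fallback}")
    return problems
```

A pipeline step would fetch the live challenge policy, pass it to this function, and fail the build on any non-empty result.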
Bot scoring threshold miscalibration after vendor model update. The detection vendor releases a model update that shifts score distributions. Traffic that previously scored 0.25 (allow) now scores 0.45 (challenge), causing a spike in step-up challenges for legitimate users. False-positive complaints rise 400% overnight. Detection: monitor the step_up_challenge_issued_total metric for sudden rate increases. Alert threshold: >2x baseline in a 30-minute window. Resolution: raise the challenge threshold temporarily so that legitimate traffic falls back below it, open a vendor support ticket to characterise the distribution shift, and re-calibrate thresholds against the new score distribution.
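Both the alert rule and the re-calibration step can be sketched in a few lines. The function names are hypothetical, and the re-calibration rule shown (keep the challenged fraction of traffic the same as before the vendor update) is only one possible approach; it assumes the traffic mix itself has not changed, which does not hold during an active attack.

```python
def challenge_spike(current_30m: int, baseline_30m: float, factor: float = 2.0) -> bool:
    """True when challenges issued in the last 30 minutes exceed factor x baseline."""
    if baseline_30m <= 0:
        return current_30m > 0
    return current_30m > factor * baseline_30m


def recalibrate(old_scores, new_scores, old_threshold):
    """Pick a threshold that challenges the same fraction of traffic as before.

    old_scores and new_scores are non-empty score samples for comparable
    traffic before and after the vendor update. Returns None when nothing
    was challenged under the old threshold.
    """
    challenged = sum(1 for s in old_scores if s >= old_threshold)
    ranked = sorted(new_scores, reverse=True)
    k = round(challenged * len(ranked) / len(old_scores))
    if k == 0:
        return None
    return ranked[k - 1]   # challenge when score >= this value
```

The returned value is a starting point for review, not something to deploy automatically: a vendor update that genuinely improves detection should be allowed to change the challenged fraction.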
Proof-of-work replay attack. A PoW solution is valid for 300 seconds but the application fails to mark it as used in Redis after verification. An attacker records a valid (nonce, solution) pair from a real browser and replays it 10,000 times within the TTL window. Detection: the already_used check in verify_solution requires that the nonce be recorded in Redis on first use with a TTL matching expires_at. If this Redis write is omitted (e.g., due to a silent Redis connection failure), replay is possible. Instrument Redis write failures as a security alert, not just an operational one.
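The single-use guarantee depends on an atomic "set if absent" write. A minimal sketch, assuming a redis-py style client whose `set(name, value, nx=True, ex=...)` returns `True` on a successful write and `None` when the key already exists; the `claim_nonce` helper, the `pow:used:` key prefix, and the in-memory stand-in are hypothetical and are not the document's `verify_solution`.

```python
class InMemoryStore:
    """Stand-in for a Redis client, for demonstration only (no TTL expiry)."""

    def __init__(self):
        self._data = {}

    def set(self, name, value, nx=False, ex=None):
        # Mirrors redis-py: True on write, None when nx=True and the key exists.
        if nx and name in self._data:
            return None
        self._data[name] = value
        return True


def claim_nonce(store, nonce: str, ttl_seconds: int) -> bool:
    """Atomically mark a nonce as used. True only for the first caller.

    Any store error propagates to the caller, so a failed write rejects the
    solution instead of silently allowing replay.
    """
    return store.set(f"pow:used:{nonce}", "1", nx=True, ex=ttl_seconds) is True
```

Because check and write happen in one command, two concurrent replays cannot both succeed, and an exception from the store is a natural place to emit the security alert described above.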
Vendor signal goes stale during an attack campaign. The primary bot-detection vendor experiences an outage during a large credential-stuffing campaign. Their signal returns score 0.0 for all requests. The decision plane, configured to take the maximum score across vendors, effectively falls back to Vendor B’s signals only. Vendor B’s signal coverage for this attack class is weaker, and the campaign partially succeeds. Detection: the bot_vendor_last_update_timestamp metric triggers an alert when any vendor’s signal is stale for >6 hours. The decision plane’s fallback behaviour (take the highest score among vendors that are still reporting, and treat a stale vendor as missing rather than as a 0.0 score) is documented and tested in synthetic scenarios quarterly.
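Treating a stale vendor as missing can be sketched as a freshness filter applied before scores are combined. The function name and data shapes are assumptions; the 4-hour staleness cut-off is the one used in the dashboard description, and `last_update` is assumed to hold the same Unix timestamps as bot_vendor_last_update_timestamp.

```python
STALE_AFTER_SECONDS = 4 * 3600   # dashboard description: signals older than 4h are stale


def highest_available(scores: dict, last_update: dict, now: float):
    """Return (score, stale_vendors).

    scores maps vendor -> latest score; last_update maps vendor -> Unix
    timestamp of that vendor's last update. Vendors with an old or unknown
    timestamp are excluded, so a stale 0.0 is not read as a clean verdict.
    score is None when no vendor is fresh.
    """
    fresh = {
        vendor: score
        for vendor, score in scores.items()
        if vendor in last_update and now - last_update[vendor] <= STALE_AFTER_SECONDS
    }
    stale = sorted(set(scores) - set(fresh))
    score = max(fresh.values()) if fresh else None
    return score, stale
```

A non-empty `stale_vendors` list marks the decision as degraded, which gives the caller a hook to tighten thresholds or page someone well before the 6-hour alert fires.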