Detecting Abuse of LLM API Keys and Inference Endpoints
Problem
LLM API credentials are a distinct credential class that most organisations’ secret management and abuse detection programmes were not designed for. Traditional API key abuse manifests as unauthorised access to data or computation. LLM API key abuse manifests as all of that, plus several patterns unique to language model inference:
Cost-generating inference. A stolen LLM API key can generate thousands of dollars in API costs per hour. Unlike a compromised database credential — which exposes existing data — a compromised LLM key actively generates cost. An attacker who steals an Anthropic, OpenAI, or Gemini API key can sell inference-as-a-service, use it for their own applications, or run compute-intensive batch jobs. The cost appears on the victim’s invoice, not immediately in any security alert.
Data exfiltration via prompt content. Applications that pass sensitive data into LLM prompts (customer PII, internal documents, database query results, user communications) create a vector where a compromised API proxy or a man-in-the-middle on the API call path can capture prompt content. Network traffic is at least encrypted in transit by TLS; LLM proxy logs, by contrast, routinely contain the full prompt text in the clear. A misconfigured LiteLLM proxy, a compromised logging pipeline, or a malicious middleware layer can silently capture all prompt content.
Competitive intelligence extraction. An attacker with access to an organisation’s LLM API key and prompt templates can reconstruct the organisation’s internal workflows, proprietary prompts, system instructions, and business logic. These are encoded in system prompts that the application sends with every API call.
Prompt scanning for injection payloads. Attackers who target AI applications probe them by sending crafted API requests — injection attempts, jailbreak payloads, boundary tests — to discover exploitable patterns. A burst of unusual prompt content from an unexpected source IP is a reconnaissance indicator.
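To put the cost-generating pattern in numbers, the following back-of-the-envelope sketch estimates hourly spend from a request rate and per-request token counts. The request volume, token counts, and per-million-token prices are illustrative assumptions, not measurements or current list prices.

```python
def hourly_cost_usd(
    requests_per_hour: int,
    input_tokens: int,
    output_tokens: int,
    input_usd_per_mtok: float,
    output_usd_per_mtok: float,
) -> float:
    """Estimate hourly spend for a steady stream of identical requests."""
    per_request = (
        input_tokens * input_usd_per_mtok + output_tokens * output_usd_per_mtok
    ) / 1_000_000
    return requests_per_hour * per_request


# Hypothetical resale workload: 20,000 requests/hour, 2,000 input and
# 1,000 output tokens each, at assumed prices of $3 / $15 per million tokens.
estimate = hourly_cost_usd(20_000, 2_000, 1_000, 3.0, 15.0)
print(f"${estimate:,.2f} per hour")
```

Under these assumptions the estimate comes to roughly $420 per hour, or about $10,000 per day, which is why detection needs to work on a scale of minutes rather than billing cycles.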
The monitoring gap is that most organisations treat LLM API keys the same way they treat any other API key: rotate them periodically, store them in a secrets manager, and alert if they appear in source code. None of these controls address the real-time abuse detection problem: how do you know within minutes (not months) that your LLM API key is being misused?
Existing LLM API usage dashboards (Anthropic Console, OpenAI Usage) provide aggregate spend and call counts, but they typically lack per-key breakdown by source IP, prompt content anomaly detection, real-time alerting on windows shorter than the billing cycle, and integration with your SIEM.
Target systems: any organisation with Anthropic, OpenAI, Gemini, or Cohere API keys; organisations running LiteLLM or similar LLM proxies; any application that passes sensitive data in LLM prompts; ML engineering teams responsible for model serving and API key management.
Threat Model
Adversary 1 — Stolen key sold for inference resale. A developer commits an Anthropic API key to a public GitHub repository. A bot discovers it within seconds. The key is sold on underground forums. Multiple actors begin using it simultaneously for their own applications. Usage spikes 100× in an hour; the organisation receives a $10,000+ invoice.
Adversary 2 — Prompt content capture via misconfigured proxy. An organisation routes all LLM API calls through LiteLLM. LiteLLM’s request logging is enabled and writes to a shared logging infrastructure with insufficient access controls. An attacker with read access to the logging pipeline reads all prompts, including those containing customer PII, internal documents, and proprietary business logic.
Adversary 3 — System prompt reconstruction. An attacker who discovers the organisation’s LLM API base URL sends API calls mimicking a legitimate client. They vary the prompt content while keeping the system prompt constant (the application sends the same system prompt with every call). By analysing responses, they reconstruct the system prompt content and the application’s intended behaviour.
Adversary 4 — Credential stuffing for LLM access. An attacker obtains a list of leaked API keys from previous breaches and tests them against LLM provider endpoints. Valid keys are used for inference until they are revoked. Most organisations don’t notice until billing.
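For Adversary 2, one defensive layer is to scrub prompt-bearing fields before log records leave the process. The sketch below is a standard-library `logging.Filter` for an application's own Python logging; it is not LiteLLM configuration, and the field names in `SENSITIVE_KEYS` are assumptions about what your log payloads contain.

```python
import logging

# Assumed field names that may carry prompt or completion text.
SENSITIVE_KEYS = frozenset({"messages", "system", "prompt", "response", "content"})


class PromptRedactionFilter(logging.Filter):
    """Replace prompt-bearing fields in dict log messages before they are emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        if isinstance(record.msg, dict):
            record.msg = {
                key: "[REDACTED]" if key in SENSITIVE_KEYS else value
                for key, value in record.msg.items()
            }
        return True  # never drop the record, only scrub it


# Attach the filter to the handler that ships logs to shared infrastructure.
handler = logging.StreamHandler()
handler.addFilter(PromptRedactionFilter())
logging.getLogger("llm_monitor").addHandler(handler)
```

This only covers top-level keys of dict-style messages; nested payloads or pre-formatted strings would need their own handling.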
Configuration / Implementation
Step 1 — Deploy a proxy that captures usage metadata for monitoring
Route all LLM API calls through a proxy or client wrapper that logs usage metadata without capturing prompt content by default. The example below is a thin wrapper around the Anthropic Python client:
# llm_proxy_monitor.py
# Minimal monitoring wrapper for Anthropic API calls
# Logs metadata WITHOUT storing prompt content
import hashlib
import json
import logging
import time
from dataclasses import dataclass
from typing import Optional

import anthropic

logger = logging.getLogger("llm_monitor")


@dataclass
class CallMetadata:
    timestamp: float
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    source_service: str
    source_ip: Optional[str]
    prompt_hash: str  # SHA-256 of prompt, for anomaly detection without storing content
    prompt_length: int
    has_system_prompt: bool
    # Deliberately NOT: prompt_content, response_content


# Approximate USD cost per token (verify against current provider pricing)
COST_PER_TOKEN = {
    "claude-sonnet-4-6": {"input": 0.000003, "output": 0.000015},
    "claude-haiku-4-5-20251001": {"input": 0.000001, "output": 0.000005},
}


class MonitoredAnthropicClient:
    """Anthropic client wrapper that logs call metadata for abuse detection."""

    def __init__(self, api_key: str, service_name: str):
        self._client = anthropic.Anthropic(api_key=api_key)
        self.service_name = service_name
        # In-memory log; cap or flush this in long-running processes
        self._call_log: list[CallMetadata] = []

    def messages_create(
        self, *, source_ip: Optional[str] = None, **kwargs
    ) -> anthropic.types.Message:
        start = time.time()

        # Hash prompt for anomaly detection (not storage)
        messages_str = str(kwargs.get("messages", []))
        system_str = str(kwargs.get("system", ""))
        prompt_hash = hashlib.sha256(f"{system_str}{messages_str}".encode()).hexdigest()
        has_system = bool(kwargs.get("system"))
        prompt_len = len(messages_str) + len(system_str)

        response = self._client.messages.create(**kwargs)

        model = kwargs.get("model", "unknown")
        # Unknown models are costed at 0; add them to COST_PER_TOKEN to track spend
        costs = COST_PER_TOKEN.get(model, {"input": 0, "output": 0})
        cost = (response.usage.input_tokens * costs["input"] +
                response.usage.output_tokens * costs["output"])

        metadata = CallMetadata(
            timestamp=start,
            model=model,
            input_tokens=response.usage.input_tokens,
            output_tokens=response.usage.output_tokens,
            cost_usd=cost,
            source_service=self.service_name,
            source_ip=source_ip,  # Passed in by the calling context if available
            prompt_hash=prompt_hash,
            prompt_length=prompt_len,
            has_system_prompt=has_system,
        )
        self._call_log.append(metadata)
        self._emit_metrics(metadata)
        return response

    def _emit_metrics(self, m: CallMetadata) -> None:
        """Emit a structured JSON log line for SIEM ingestion (no prompt content)."""
        logger.info(json.dumps({
            "event": "llm_api_call",
            "timestamp": m.timestamp,
            "service": m.source_service,
            "source_ip": m.source_ip,
            "model": m.model,
            "input_tokens": m.input_tokens,
            "output_tokens": m.output_tokens,
            "cost_usd": m.cost_usd,
            "prompt_hash": m.prompt_hash,
            "prompt_length": m.prompt_length,
            "has_system_prompt": m.has_system_prompt,
        }))
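A wrapper like the one above only sees calls made through it; a stolen key used directly against the provider never reaches this log. One way to surface that gap, sketched below, is to reconcile locally logged token totals against the usage the provider reports for the same key and time window. The `provider_tokens` figure is assumed to come from whatever usage export or dashboard your provider offers; the function name and the 5% tolerance are hypothetical.

```python
def out_of_band_tokens(
    local_tokens: int, provider_tokens: int, tolerance: float = 0.05
) -> int:
    """Return tokens billed by the provider but unseen locally, or 0 if within tolerance."""
    gap = provider_tokens - local_tokens
    allowed = provider_tokens * tolerance  # slack for clock skew and retries
    return gap if gap > allowed else 0


# Hypothetical hourly totals: the wrapper logged 1,000 tokens,
# but the provider reports 5,000 for the same key.
unexplained = out_of_band_tokens(local_tokens=1_000, provider_tokens=5_000)
if unexplained:
    print(f"{unexplained} tokens were not issued through the monitored wrapper")
```

A persistent non-zero gap suggests the key is being used from somewhere the wrapper does not cover, whether an unmonitored internal service or an attacker.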
Step 2 — Implement cost spike alerting
# cost_monitor.py
# Real-time cost alerting that fires before the monthly bill
from datetime import datetime, timedelta


class LLMCostMonitor:
    """Monitor LLM API costs and alert on anomalies."""

    def __init__(
        self,
        hourly_alert_threshold_usd: float = 10.0,
        daily_alert_threshold_usd: float = 50.0,
        spike_multiplier: float = 5.0,  # Alert if current hour > 5× baseline
    ):
        self.hourly_threshold = hourly_alert_threshold_usd
        self.daily_threshold = daily_alert_threshold_usd
        self.spike_multiplier = spike_multiplier
        self._hourly_costs: list[tuple[datetime, float]] = []

    def record_call(self, cost_usd: float, timestamp: datetime) -> list[str]:
        """Record a call and return any triggered alerts."""
        self._hourly_costs.append((timestamp, cost_usd))
        # Keep 7 days plus 1 hour so the oldest baseline window stays complete
        cutoff = timestamp - timedelta(days=7, hours=1)
        self._hourly_costs = [(t, c) for t, c in self._hourly_costs if t > cutoff]
        return self._check_alerts(timestamp)

    def _check_alerts(self, now: datetime) -> list[str]:
        alerts = []

        # Cost over the trailing hour
        hour_ago = now - timedelta(hours=1)
        hour_cost = sum(c for t, c in self._hourly_costs if t > hour_ago)
        if hour_cost > self.hourly_threshold:
            alerts.append(
                f"LLM cost spike: ${hour_cost:.2f} in last hour "
                f"(threshold: ${self.hourly_threshold:.2f})"
            )

        # Cost over the trailing 24 hours
        day_ago = now - timedelta(days=1)
        day_cost = sum(c for t, c in self._hourly_costs if t > day_ago)
        if day_cost > self.daily_threshold:
            alerts.append(
                f"LLM daily cost: ${day_cost:.2f} in last 24 hours "
                f"(threshold: ${self.daily_threshold:.2f})"
            )

        # Check for spike vs baseline (same hour on each of the last 7 days)
        baseline_hours = []
        for day_offset in range(1, 8):
            window_start = now - timedelta(days=day_offset, hours=1)
            window_end = now - timedelta(days=day_offset)
            window_cost = sum(
                c for t, c in self._hourly_costs
                if window_start < t <= window_end
            )
            if window_cost > 0:
                baseline_hours.append(window_cost)

        if baseline_hours:
            baseline_avg = sum(baseline_hours) / len(baseline_hours)
            if baseline_avg > 0 and hour_cost > baseline_avg * self.spike_multiplier:
                alerts.append(
                    f"LLM cost anomaly: ${hour_cost:.2f} this hour vs "
                    f"${baseline_avg:.2f} baseline ({hour_cost/baseline_avg:.1f}× normal)"
                )
        return alerts
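Because `record_call` evaluates thresholds on every call, a sustained spike returns the same alert repeatedly. A minimal sketch of a cooldown that suppresses repeats of an alert key for a fixed interval follows; the `AlertCooldown` class and its 15-minute default are assumptions for illustration, not part of the monitor above.

```python
from datetime import datetime, timedelta


class AlertCooldown:
    """Allow each alert key to fire at most once per cooldown interval."""

    def __init__(self, cooldown: timedelta = timedelta(minutes=15)):
        self.cooldown = cooldown
        self._last_fired: dict[str, datetime] = {}

    def should_fire(self, key: str, now: datetime) -> bool:
        last = self._last_fired.get(key)
        if last is not None and now - last < self.cooldown:
            return False  # still inside the quiet period for this key
        self._last_fired[key] = now
        return True
```

A caller would key each alert by type and service (for example `"hourly:billing-api"`, a hypothetical name) and forward it to the pager only when `should_fire` returns `True`.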
Step 3 — Detect prompt content anomalies without storing content
Monitor prompt entropy and structure without logging sensitive content:
import math
import re
from collections import Counter


def analyse_prompt_safely(prompt_text: str) -> dict:
    """Extract security-relevant statistics from a prompt without storing it."""
    # Shannon entropy in bits per character; high entropy may indicate
    # encoded/obfuscated content. Empty prompts are reported as 0.0.
    chars = Counter(prompt_text)
    length = len(prompt_text)
    entropy = (
        -sum((c / length) * math.log2(c / length) for c in chars.values())
        if length else 0.0
    )

    # Structural indicators
    has_base64 = bool(re.search(r'[A-Za-z0-9+/]{50,}={0,2}', prompt_text))
    has_url = bool(re.search(r'https?://', prompt_text))
    # "DAN" is matched case-sensitively as a whole word so that ordinary
    # words such as "abundant" or "danger" do not trigger the pattern.
    has_injection_pattern = bool(re.search(
        r'ignore previous|system prompt|jailbreak|(?-i:\bDAN\b)|you are now|override',
        prompt_text, re.IGNORECASE
    ))
    line_count = prompt_text.count('\n')

    # Token count estimate (rough: about 1.3 tokens per whitespace-separated word)
    estimated_tokens = int(len(prompt_text.split()) * 1.3)

    return {
        "length": length,
        "entropy": entropy,
        "estimated_tokens": estimated_tokens,
        "has_base64_blob": has_base64,
        "has_external_url": has_url,
        "has_injection_pattern": has_injection_pattern,
        "line_count": line_count,
        # Deliberately NOT: prompt_text itself
    }
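The function above reports entropy but leaves the cut-off for "high" undefined. Character-level Shannon entropy is 0 for a single repeated character and log2(k) for k equally frequent characters, so random base64 approaches 6 bits per character while ordinary English prose usually sits lower. The sketch below uses an assumed threshold of 5.0 bits per character and an assumed minimum length of 64 characters; both need tuning on your own traffic.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def is_high_entropy(text: str, threshold: float = 5.0, min_length: int = 64) -> bool:
    """Flag text above the assumed threshold; short strings are skipped as too noisy."""
    return len(text) >= min_length and shannon_entropy(text) > threshold
```

The minimum length matters because entropy estimates on very short strings are unstable and would otherwise flag things like short identifiers.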
Step 4 — Alert rules for your SIEM
# Prometheus / alertmanager rules for LLM API abuse detection
groups:
  - name: llm_api_abuse
    rules:
      # Sudden cost spike
      - alert: LLMAPIKeyCostSpike
        expr: |
          sum(rate(llm_api_call_cost_usd_total[1h])) * 3600 > 10
        labels:
          severity: warning
        annotations:
          summary: "LLM API cost exceeding $10/hour"
          description: "Current hourly cost: ${{ $value | printf \"%.2f\" }}"

      # New model being called (unauthorized model use):
      # calls in the last 5 minutes for a model that had no calls recorded a day ago
      - alert: LLMUnexpectedModelUsed
        expr: |
          (sum by (model) (increase(llm_api_calls_total[5m])) > 0)
          unless on (model)
          (sum by (model) (llm_api_calls_total offset 1d) > 0)
        labels:
          severity: warning
        annotations:
          summary: "New LLM model in use: {{ $labels.model }}"

      # Injection pattern detected in prompts
      - alert: LLMPromptInjectionAttempt
        expr: |
          sum(increase(llm_prompt_injection_detected_total[5m])) > 0
        labels:
          severity: critical
        annotations:
          summary: "Prompt injection pattern detected in LLM API calls"

      # Very high entropy prompts (possible encoded payload)
      - alert: LLMHighEntropyPrompt
        expr: |
          sum(increase(llm_high_entropy_prompts_total[5m])) > 5
        labels:
          severity: warning
        annotations:
          summary: "Multiple high-entropy prompts detected; possible obfuscated content"
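The rules above assume counters named `llm_prompt_injection_detected_total` and `llm_high_entropy_prompts_total` exist. The sketch below shows one way the Step 3 statistics could be turned into those increments, using a plain `Counter` as a stand-in for a real metrics client; the function name and the 5.0 entropy cut-off are assumptions.

```python
from collections import Counter

# Stand-in for a metrics client; swap for your exporter's counter objects.
metrics: Counter = Counter()


def record_prompt_signals(stats: dict, entropy_threshold: float = 5.0) -> None:
    """Increment abuse counters from a prompt-statistics dict (no prompt text needed)."""
    if stats.get("has_injection_pattern"):
        metrics["llm_prompt_injection_detected_total"] += 1
    if stats.get("entropy", 0.0) > entropy_threshold:
        metrics["llm_high_entropy_prompts_total"] += 1


# Example with a hand-written statistics dict shaped like the Step 3 output.
record_prompt_signals({"has_injection_pattern": True, "entropy": 5.8})
```

Only the derived booleans and numbers cross into the metrics layer, so the counters stay free of prompt content.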
Step 5 — Rotate keys on anomaly detection
#!/bin/bash
# rotate-llm-key.sh: emergency key rotation on abuse detection
# When abuse is detected, rotate the compromised key immediately.
# This script integrates with your secrets manager.
set -euo pipefail

SERVICE="${1:?Usage: rotate-llm-key.sh <service-name>}"
OLD_KEY_SECRET_NAME="llm-api-key-${SERVICE}"
echo "Rotating LLM API key for service: $SERVICE"

# 1. Generate new key from provider (provider-specific)
# For Anthropic: done via https://console.anthropic.com/account/keys
# Read the new key without echoing it to the terminal
read -rsp "Enter new API key: " NEW_KEY
echo

# 2. Update in secrets manager
aws secretsmanager update-secret \
  --secret-id "$OLD_KEY_SECRET_NAME" \
  --secret-string "$NEW_KEY"

# 3. Trigger rolling restart of affected services to pick up new key,
#    and wait for the rollout to finish before revoking anything
kubectl rollout restart deployment/"$SERVICE" -n production
kubectl rollout status deployment/"$SERVICE" -n production

# 4. Revoke old key at provider (manual step, provider dashboard required)
echo "MANUAL STEP: Revoke the old key at the provider dashboard"
echo "Anthropic: https://console.anthropic.com/account/keys"

# 5. Confirm the rotation was recorded in CloudTrail
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventName,AttributeValue=UpdateSecret \
  --max-results 5 \
  || echo "CloudTrail lookup failed; verify the UpdateSecret event manually"
Expected Behaviour
| Abuse scenario | Without detection | With detection |
|---|---|---|
| Stolen key used for resale | Discovered at month-end invoice | Cost spike alert fires within 1 hour |
| Prompt injection attempt | No alert | Injection pattern counter increments; alert fires |
| Key used from unexpected IP | No visibility | Source IP not in baseline; anomaly logged |
| New model call (Claude Opus instead of Haiku) | No alert | Unexpected model alert fires |
| High-entropy (encoded) prompt | No visibility | High-entropy counter flagged; analyst review queued |
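
The "unexpected IP" row presumes a baseline of expected source addresses, which the wrapper in Step 1 does not build. A minimal sketch of such a baseline follows, assuming legitimate callers come from known CIDR ranges; the `SourceBaseline` name and the example networks are hypothetical.

```python
import ipaddress


class SourceBaseline:
    """Check caller addresses against a list of expected networks."""

    def __init__(self, allowed_cidrs: list[str]):
        self._networks = [ipaddress.ip_network(cidr) for cidr in allowed_cidrs]

    def is_expected(self, source_ip: str) -> bool:
        addr = ipaddress.ip_address(source_ip)
        # Membership is False when address and network IP versions differ.
        return any(addr in net for net in self._networks)


# Hypothetical baseline: an internal range plus one egress block.
baseline = SourceBaseline(["10.0.0.0/8", "192.0.2.0/24"])
if not baseline.is_expected("203.0.113.9"):
    print("source IP not in baseline; log an anomaly")
```

A static allowlist suits services with fixed egress; environments with dynamic addresses would need a learned baseline instead.
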
Trade-offs
| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| Log metadata without prompt content | Protects privacy of prompt data | Less context for investigating abuse | On confirmed abuse incident, enable temporary prompt logging with security team approval and strict retention |
| Cost spike threshold at $10/hour | Catches most credential misuse quickly | May alert on legitimate batch processing | Separate alert thresholds per application; batch jobs get higher threshold |
| Injection pattern detection | Flags reconnaissance | High false positive rate if app handles user input with natural injection-like language | Tune patterns to the specific attack patterns that matter; measure false positive rate |
| Prompt entropy analysis | Catches obfuscated payloads | High entropy is not uniquely malicious (code, base64 attachments) | Use as one signal among many; require multiple signals to fire an alert |
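
The mitigation in the last row, requiring several signals before alerting, can be sketched as a simple weighted score. The first two signal names mirror the Step 3 statistics; the remaining names, the weights, and the threshold of 2.0 are assumptions to be tuned against your own false positive rate.

```python
# Assumed weights: stronger indicators of credential abuse count for more.
SIGNAL_WEIGHTS = {
    "has_injection_pattern": 1.0,
    "has_base64_blob": 0.5,
    "high_entropy": 0.5,
    "new_source_ip": 1.0,
    "cost_spike": 1.5,
}


def abuse_score(signals: dict[str, bool]) -> float:
    """Sum the weights of every signal that is present and true."""
    return sum(w for name, w in SIGNAL_WEIGHTS.items() if signals.get(name))


def should_alert(signals: dict[str, bool], threshold: float = 2.0) -> bool:
    """Alert only when the combined score reaches the threshold."""
    return abuse_score(signals) >= threshold
```

With these weights a lone high-entropy prompt scores 0.5 and stays quiet, while an injection pattern from a new source IP scores 2.0 and alerts.
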
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Alert threshold too high | Abuse occurs for hours before alert fires | Review historical spend at alert time | Lower threshold; establish tighter per-key hourly limits via provider settings (Anthropic usage limits, OpenAI spend limits) |
| Monitoring proxy adds latency | Application response time increases | P99 latency spike in application metrics | Optimise proxy to be async for logging; use sampling instead of 100% capture |
| Key rotation breaks service before new key propagates | Service returns 401 after rotation | Health check fails immediately post-rotation | Implement graceful rotation: provision new key, update service, wait for health check, revoke old key |
| Baseline not established for new service | First week generates many false positives | High alert volume from new service | Suppress anomaly alerts for 7 days after service launch; establish baseline before enabling anomaly rules |
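
The graceful rotation order in the third row (provision, update, verify, then revoke) can be sketched as a small orchestration function. The four callables are hypothetical hooks you would implement against your provider, secrets manager, and deployment tooling; nothing here calls a real API.

```python
from typing import Callable


def rotate_gracefully(
    provision_new_key: Callable[[], str],
    update_service: Callable[[str], None],
    health_check: Callable[[], bool],
    revoke_old_key: Callable[[], None],
) -> bool:
    """Revoke the old key only after the service is healthy on the new one."""
    new_key = provision_new_key()
    update_service(new_key)
    if not health_check():
        return False  # old key stays valid so the service can be rolled back
    revoke_old_key()
    return True
```

This ordering favours availability. During a confirmed compromise you may prefer to revoke first and accept a brief outage, since every minute the old key remains valid is a minute the attacker can keep using it.
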
Related Articles
- LLM API Security — transport security, authentication, and rate limiting for LLM API access
- MLOps Secrets Management — storing and rotating LLM API keys as part of the broader ML secrets programme
- LLM Rate Limiting — rate limiting LLM API calls at the infrastructure layer to limit blast radius from credential compromise
- Secret Access Anomaly Detection — detecting anomalous credential use patterns, of which LLM key abuse is a specific instance
- Cloud Audit Log Tampering Detection — protecting the audit logs that record LLM API key usage