Detecting Abuse of LLM API Keys and Inference Endpoints

Problem

LLM API credentials are a distinct credential class that most organisations’ secret management and abuse detection programmes were not designed for. Traditional API key abuse manifests as unauthorised access to data or computation. LLM API key abuse manifests as all of that, plus several patterns unique to language model inference:

Cost-generating inference. A stolen LLM API key can generate thousands of dollars in API costs per hour. Unlike a compromised database credential — which exposes existing data — a compromised LLM key actively generates cost. An attacker who steals an Anthropic, OpenAI, or Gemini API key can sell inference-as-a-service, use it for their own applications, or run compute-intensive batch jobs. The cost appears on the victim’s invoice, not immediately in any security alert.

Data exfiltration via prompt content. Applications that pass sensitive data into LLM prompts (customer PII, internal documents, database query results, user communications) create a vector where a compromised API proxy or a man-in-the-middle on the API call path can capture prompt content. TLS protects the call in transit, but it does not protect what the proxy itself records: LLM proxy logs routinely contain the full prompt text in plaintext. A misconfigured LiteLLM proxy, a compromised logging pipeline, or a malicious middleware layer can silently capture all prompt content.

Competitive intelligence extraction. An attacker with access to an organisation’s LLM API key and prompt templates can reconstruct the organisation’s internal workflows, proprietary prompts, system instructions, and business logic. These are encoded in system prompts that the application sends with every API call.

Prompt scanning for injection payloads. Attackers who target AI applications probe them by sending crafted API requests — injection attempts, jailbreak payloads, boundary tests — to discover exploitable patterns. A burst of unusual prompt content from an unexpected source IP is a reconnaissance indicator.

The monitoring gap is that most organisations treat LLM API keys the same way they treat any other API key: rotate them periodically, store them in a secrets manager, and alert if they appear in source code. None of these controls address the real-time abuse detection problem: how do you know within minutes (not months) that your LLM API key is being misused?

Existing LLM API usage dashboards (Anthropic Console, OpenAI Usage) provide aggregate spend and call counts, but they generally lack a per-key breakdown by source IP, prompt content anomaly detection, real-time alerting on a timescale shorter than the billing cycle, and integration with your SIEM.

Target systems: any organisation with Anthropic, OpenAI, Gemini, or Cohere API keys; organisations running LiteLLM or similar LLM proxies; any application that passes sensitive data in LLM prompts; ML engineering teams responsible for model serving and API key management.


Threat Model

Adversary 1 — Stolen key sold for inference resale. A developer commits an Anthropic API key to a public GitHub repository. A bot discovers it within seconds. The key is sold on underground forums. Multiple actors begin using it simultaneously for their own applications. Usage spikes 100× in an hour; the organisation receives a $10,000+ invoice.

Adversary 2 — Prompt content capture via misconfigured proxy. An organisation routes all LLM API calls through LiteLLM. LiteLLM’s request logging is enabled and writes to a shared logging infrastructure with insufficient access controls. An attacker with read access to the logging pipeline reads all prompts, including those containing customer PII, internal documents, and proprietary business logic.

Adversary 3 — System prompt reconstruction. An attacker who discovers the organisation’s LLM API base URL sends API calls mimicking a legitimate client. They vary the prompt content while keeping the system prompt constant (the application sends the same system prompt with every call). By analysing responses, they reconstruct the system prompt content and the application’s intended behaviour.

Adversary 4 — Credential stuffing for LLM access. An attacker obtains a list of leaked API keys from previous breaches and tests them against LLM provider endpoints. Valid keys are used for inference until they are revoked. Most organisations don’t notice until billing.


Configuration / Implementation

Step 1 — Deploy a proxy that captures usage metadata for monitoring

Route all LLM API calls through a proxy or client wrapper that logs usage metadata (without capturing prompt content by default):

# llm_proxy_monitor.py
# Minimal monitoring wrapper for Anthropic API calls
# Logs metadata WITHOUT storing prompt content

import hashlib
import json
import logging
import time
from collections import deque
from dataclasses import dataclass
from typing import Optional

import anthropic

logger = logging.getLogger("llm_monitor")

@dataclass
class CallMetadata:
    timestamp: float
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    source_service: str
    source_ip: Optional[str]
    prompt_hash: str          # SHA-256 of prompt, for anomaly detection without storing content
    prompt_length: int
    has_system_prompt: bool
    # Deliberately NOT: prompt_content, response_content

# Approximate per-token costs in USD; verify against the provider's current price list
COST_PER_TOKEN = {
    "claude-sonnet-4-6": {"input": 0.000003, "output": 0.000015},
    "claude-haiku-4-5-20251001": {"input": 0.00000025, "output": 0.00000125},
}

class MonitoredAnthropicClient:
    """Anthropic client wrapper that logs call metadata for abuse detection."""

    def __init__(self, api_key: str, service_name: str):
        self._client = anthropic.Anthropic(api_key=api_key)
        self.service_name = service_name
        # Bounded so a long-running process does not grow without limit
        self._call_log: deque[CallMetadata] = deque(maxlen=10_000)

    def messages_create(self, **kwargs) -> anthropic.types.Message:
        start = time.time()

        # Hash prompt for anomaly detection (not storage)
        messages_str = str(kwargs.get("messages", []))
        system_str = str(kwargs.get("system", ""))
        prompt_hash = hashlib.sha256(f"{system_str}{messages_str}".encode()).hexdigest()
        has_system = bool(kwargs.get("system"))
        prompt_len = len(messages_str) + len(system_str)

        response = self._client.messages.create(**kwargs)

        model = kwargs.get("model", "unknown")
        costs = COST_PER_TOKEN.get(model)
        if costs is None:
            # An unpriced model would be costed at $0 and never trip a cost alert,
            # so surface it as its own event
            logger.warning(json.dumps({"event": "llm_unknown_model_pricing", "model": model}))
            costs = {"input": 0.0, "output": 0.0}
        cost = (response.usage.input_tokens * costs["input"] +
                response.usage.output_tokens * costs["output"])

        metadata = CallMetadata(
            timestamp=start,
            model=model,
            input_tokens=response.usage.input_tokens,
            output_tokens=response.usage.output_tokens,
            cost_usd=cost,
            source_service=self.service_name,
            source_ip=None,  # Set by calling context if available
            prompt_hash=prompt_hash,
            prompt_length=prompt_len,
            has_system_prompt=has_system,
        )

        self._call_log.append(metadata)
        self._emit_metrics(metadata)

        return response

    def _emit_metrics(self, m: CallMetadata) -> None:
        """Emit one JSON log line per call for SIEM ingestion, with no prompt content."""
        # json.dumps produces parseable JSON; logging the dict directly would emit
        # its Python repr (single quotes, None), which most SIEM parsers reject
        logger.info(json.dumps({
            "event": "llm_api_call",
            "timestamp": m.timestamp,
            "service": m.source_service,
            "source_ip": m.source_ip,
            "model": m.model,
            "input_tokens": m.input_tokens,
            "output_tokens": m.output_tokens,
            "cost_usd": m.cost_usd,
            "prompt_hash": m.prompt_hash,
            "prompt_length": m.prompt_length,
            "has_system_prompt": m.has_system_prompt,
        }))
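
The wrapper above leaves source_ip unset, yet the source of a call is one of the more useful abuse signals. As a minimal sketch, assuming the calling context can supply an IP address, a per-service baseline of known networks could be kept as follows. The SourceBaseline class, its method names, and the /24 grouping are hypothetical choices for illustration, and the grouping only makes sense for IPv4 addresses.

```python
import ipaddress


class SourceBaseline:
    """Remember which networks each service normally calls from (hypothetical helper)."""

    def __init__(self, prefix_len: int = 24):
        # Group addresses into networks so that autoscaled hosts in the same
        # subnet do not each look like a new source. /24 is an assumption.
        self.prefix_len = prefix_len
        self._known: dict[str, set] = {}

    def _network(self, ip: str):
        # strict=False lets us pass a host address and get its enclosing network
        return ipaddress.ip_network(f"{ip}/{self.prefix_len}", strict=False)

    def learn(self, service: str, ip: str) -> None:
        """Record an address seen during the baseline period."""
        self._known.setdefault(service, set()).add(self._network(ip))

    def is_anomalous(self, service: str, ip: str) -> bool:
        """True if this service has never been seen calling from this network."""
        return self._network(ip) not in self._known.get(service, set())
```

A service with no baseline at all is reported as anomalous, which is deliberate here: an unknown service using the key is itself worth a look.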

Step 2 — Implement cost spike alerting

# cost_monitor.py
# Real-time cost alerting — fires before the monthly bill

from collections import defaultdict
from datetime import datetime, timedelta

class LLMCostMonitor:
    """Monitor LLM API costs and alert on anomalies."""
    
    def __init__(
        self,
        hourly_alert_threshold_usd: float = 10.0,
        daily_alert_threshold_usd: float = 50.0,
        spike_multiplier: float = 5.0,  # Alert if current hour > 5× baseline
    ):
        self.hourly_threshold = hourly_alert_threshold_usd
        self.daily_threshold = daily_alert_threshold_usd
        self.spike_multiplier = spike_multiplier
        self._hourly_costs: list[tuple[datetime, float]] = []
    
    def record_call(self, cost_usd: float, timestamp: datetime) -> list[str]:
        """Record a call and return any triggered alerts."""
        self._hourly_costs.append((timestamp, cost_usd))
        # Clean old records
        cutoff = timestamp - timedelta(days=7)
        self._hourly_costs = [(t, c) for t, c in self._hourly_costs if t > cutoff]
        
        return self._check_alerts(timestamp)
    
    def _check_alerts(self, now: datetime) -> list[str]:
        alerts = []
        
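        # Current hour cost
        hour_ago = now - timedelta(hours=1)
        hour_cost = sum(c for t, c in self._hourly_costs if t > hour_ago)
        if hour_cost > self.hourly_threshold:
            alerts.append(
                f"LLM cost spike: ${hour_cost:.2f} in last hour "
                f"(threshold: ${self.hourly_threshold:.2f})"
            )

        # Rolling 24-hour cost, checked against the configured daily threshold
        day_ago = now - timedelta(days=1)
        day_cost = sum(c for t, c in self._hourly_costs if t > day_ago)
        if day_cost > self.daily_threshold:
            alerts.append(
                f"LLM daily cost exceeded: ${day_cost:.2f} in last 24 hours "
                f"(threshold: ${self.daily_threshold:.2f})"
            )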
        
        # Check for spike vs baseline (last 7 days same hour)
        baseline_hours = []
        for day_offset in range(1, 8):
            window_start = now - timedelta(days=day_offset, hours=1)
            window_end = now - timedelta(days=day_offset)
            window_cost = sum(
                c for t, c in self._hourly_costs 
                if window_start < t < window_end
            )
            if window_cost > 0:
                baseline_hours.append(window_cost)
        
        if baseline_hours:
            baseline_avg = sum(baseline_hours) / len(baseline_hours)
            if baseline_avg > 0 and hour_cost > baseline_avg * self.spike_multiplier:
                alerts.append(
                    f"LLM cost anomaly: ${hour_cost:.2f} this hour vs "
                    f"${baseline_avg:.2f} baseline ({hour_cost/baseline_avg:.1f}× normal)"
                )
        
        return alerts
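
A single global threshold tends to either miss abuse of a low-volume key or fire on legitimate batch jobs. The sketch below shows one way to keep a separate hourly window and limit per service. It is an illustration under stated assumptions, not part of the monitor above: the PerServiceCostMonitor name and the override mapping are hypothetical, and calls are assumed to arrive in timestamp order.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta
from typing import Optional


class PerServiceCostMonitor:
    """Sliding one-hour cost window per service, each with its own limit."""

    def __init__(self, default_hourly_usd: float = 10.0,
                 overrides: Optional[dict[str, float]] = None):
        self.default_hourly_usd = default_hourly_usd
        self.overrides = overrides or {}   # e.g. a higher limit for batch jobs
        self._windows: dict[str, deque] = defaultdict(deque)
        self._totals: dict[str, float] = defaultdict(float)

    def record_call(self, service: str, cost_usd: float,
                    timestamp: datetime) -> Optional[str]:
        """Record one call; return an alert message if the service is over its limit."""
        window = self._windows[service]
        window.append((timestamp, cost_usd))
        self._totals[service] += cost_usd

        # Drop entries older than one hour from the front of the window
        cutoff = timestamp - timedelta(hours=1)
        while window and window[0][0] <= cutoff:
            _, expired_cost = window.popleft()
            self._totals[service] -= expired_cost

        limit = self.overrides.get(service, self.default_hourly_usd)
        if self._totals[service] > limit:
            return (f"LLM cost spike for {service}: "
                    f"${self._totals[service]:.2f} in last hour (limit: ${limit:.2f})")
        return None
```

Using a deque with a running total keeps each call cheap, whereas rebuilding the full list on every call gets slower as traffic grows.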

Step 3 — Detect prompt content anomalies without storing content

Monitor prompt entropy and structure without logging sensitive content:

import re
import math
from collections import Counter

def analyse_prompt_safely(prompt_text: str) -> dict:
    """Extract security-relevant statistics from a prompt without storing it."""
    
    # Shannon entropy — high entropy may indicate encoded/obfuscated content
    chars = Counter(prompt_text)
    length = len(prompt_text)
    entropy = -sum((c/length) * math.log2(c/length) for c in chars.values())
    
    # Structural indicators
    has_base64 = bool(re.search(r'[A-Za-z0-9+/]{50,}={0,2}', prompt_text))
    has_url = bool(re.search(r'https?://', prompt_text))
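    has_injection_pattern = bool(
        re.search(
            r'\b(?:ignore (?:all )?previous|system prompt|jailbreak|you are now|override)',
            prompt_text, re.IGNORECASE,
        )
        # "DAN" is matched case-sensitively as a whole word so that ordinary
        # words such as "dance" or "abundant" do not trigger it
        or re.search(r'\bDAN\b', prompt_text)
    )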
    line_count = prompt_text.count('\n')
    
    # Token count estimate (rough)
    estimated_tokens = len(prompt_text.split()) * 1.3
    
    return {
        "length": length,
        "entropy": entropy,
        "estimated_tokens": estimated_tokens,
        "has_base64_blob": has_base64,
        "has_external_url": has_url,
        "has_injection_pattern": has_injection_pattern,
        "line_count": line_count,
        # Deliberately NOT: prompt_text itself
    }
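
No single statistic from the function above is strong evidence on its own; code and base64 attachments are high-entropy, and ordinary user text can contain injection-like phrases. One way to reduce noise is to require several signals before alerting. The sketch below takes a dict shaped like the return value of analyse_prompt_safely. The function names, the entropy threshold of 5.5 bits per character, and the two-signal minimum are assumptions to be tuned against real traffic.

```python
def collect_signals(stats: dict, entropy_threshold: float = 5.5) -> list[str]:
    """Return the names of the suspicious signals present in a stats dict."""
    signals = []
    if stats.get("has_injection_pattern"):
        signals.append("injection_pattern")
    if stats.get("has_base64_blob"):
        signals.append("base64_blob")
    if stats.get("entropy", 0.0) > entropy_threshold:
        signals.append("high_entropy")
    if stats.get("has_external_url"):
        signals.append("external_url")
    return signals


def should_alert(stats: dict, min_signals: int = 2) -> bool:
    """Alert only when at least min_signals independent signals agree."""
    return len(collect_signals(stats)) >= min_signals
```

Only the signal names need to be logged with the alert, so the prompt text still never leaves the process.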

Step 4 — Alert rules for your SIEM

# Prometheus / alertmanager rules for LLM API abuse detection

groups:
- name: llm_api_abuse
  rules:
  # Sudden cost spike
  - alert: LLMAPIKeyCostSpike
    expr: |
      sum(rate(llm_api_call_cost_usd_total[1h])) * 3600 > 10
    labels:
      severity: warning
    annotations:
      summary: "LLM API cost exceeding $10/hour"
      description: "Current hourly cost: ${{ $value | printf \"%.2f\" }}"

  # New model being called (unauthorized model use):
  # models with calls in the last 5 minutes, minus models that already had a series one day ago
  - alert: LLMUnexpectedModelUsed
    expr: |
      sum by (model) (increase(llm_api_calls_total[5m])) > 0
      unless on(model)
      sum by (model) (llm_api_calls_total offset 1d)
    labels:
      severity: warning
    annotations:
      summary: "New LLM model in use: {{ $labels.model }}"

  # Injection pattern detected in prompts
  - alert: LLMPromptInjectionAttempt
    expr: |
      sum(increase(llm_prompt_injection_detected_total[5m])) > 0
    labels:
      severity: critical
    annotations:
      summary: "Prompt injection pattern detected in LLM API calls"

  # Very high entropy prompts (possible encoded payload)
  - alert: LLMHighEntropyPrompt
    expr: |
      sum(increase(llm_high_entropy_prompts_total[5m])) > 5
    labels:
      severity: warning
    annotations:
      summary: "Multiple high-entropy prompts detected — possible obfuscated content"
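
The rules above assume that something exports the four counters they query. As a rough illustration of what that export looks like, the sketch below keeps counters in memory and renders them in the Prometheus text exposition format using only the standard library. The CounterRegistry class, the record_call helper, and the choice of labels are hypothetical; a real deployment would more likely use an established client library, and this sketch does not escape special characters in label values.

```python
from collections import defaultdict


class CounterRegistry:
    """Tiny in-memory counters rendered as Prometheus text exposition lines."""

    def __init__(self):
        # metric name -> {sorted (label, value) tuple -> running total}
        self._values = defaultdict(lambda: defaultdict(float))

    def inc(self, name: str, amount: float = 1.0, **labels: str) -> None:
        if amount < 0:
            raise ValueError("counters can only increase")
        self._values[name][tuple(sorted(labels.items()))] += amount

    def render(self) -> str:
        lines = []
        for name in sorted(self._values):
            lines.append(f"# TYPE {name} counter")
            for label_items, value in sorted(self._values[name].items()):
                label_str = ",".join(f'{k}="{v}"' for k, v in label_items)
                suffix = f"{{{label_str}}}" if label_str else ""
                lines.append(f"{name}{suffix} {value}")
        return "\n".join(lines) + "\n"


def record_call(registry: CounterRegistry, model: str, service: str, cost_usd: float,
                injection: bool = False, high_entropy: bool = False) -> None:
    """Update the counters queried by the alert rules for one API call."""
    registry.inc("llm_api_calls_total", model=model, service=service)
    registry.inc("llm_api_call_cost_usd_total", cost_usd, model=model, service=service)
    if injection:
        registry.inc("llm_prompt_injection_detected_total", service=service)
    if high_entropy:
        registry.inc("llm_high_entropy_prompts_total", service=service)
```

Serving the output of render() on a /metrics endpoint is what would let Prometheus scrape these counters; the model label is what the unexpected-model rule groups by.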

Step 5 — Rotate keys on anomaly detection

#!/bin/bash
# rotate-llm-key.sh: emergency key rotation on abuse detection
# When abuse is detected, rotate the compromised key immediately.
# This script integrates with your secrets manager.

set -euo pipefail

SERVICE="${1:?Usage: rotate-llm-key.sh <service>}"
OLD_KEY_SECRET_NAME="llm-api-key-${SERVICE}"

echo "Rotating LLM API key for service: $SERVICE"

# 1. Generate new key from provider (provider-specific)
# For Anthropic: done via https://console.anthropic.com/account/keys
# Read the new key without echoing it to the terminal
read -rsp "Enter new API key: " NEW_KEY
echo

# 2. Update in secrets manager
# Note: passing the key as an argument exposes it briefly in the process list
aws secretsmanager update-secret \
  --secret-id "$OLD_KEY_SECRET_NAME" \
  --secret-string "$NEW_KEY"

# 3. Trigger rolling restart of affected services to pick up new key,
#    then wait for the rollout to finish before the old key is revoked
kubectl rollout restart deployment/"$SERVICE" -n production
kubectl rollout status deployment/"$SERVICE" -n production --timeout=300s

# 4. Revoke old key at provider (manual step; provider dashboard required)
echo "MANUAL STEP: Revoke the old key at the provider dashboard"
echo "Anthropic: https://console.anthropic.com/account/keys"

# 5. Confirm the rotation was recorded (CloudTrail logs UpdateSecret automatically)
aws cloudtrail lookup-events --lookup-attributes AttributeKey=EventName,AttributeValue=UpdateSecret 2>/dev/null | head -5 || true
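
The order of the rotation steps matters: revoking the old key before the service has picked up the new one causes an outage. The sketch below expresses that ordering as a small function that takes each step as a callable, so the sequence can be tested without touching a real secrets manager or cluster. The rotate_gracefully name and all of its parameters are hypothetical. During confirmed active abuse, a team may reasonably choose to revoke first and accept the downtime instead.

```python
import time
from typing import Callable


def rotate_gracefully(
    store_new_key: Callable[[], None],
    restart_service: Callable[[], None],
    is_healthy: Callable[[], bool],
    revoke_old_key: Callable[[], None],
    max_checks: int = 10,
    interval_s: float = 5.0,
    wait: Callable[[float], None] = time.sleep,
) -> bool:
    """Store the new key, restart, and revoke the old key only once healthy.

    Returns True if the old key was revoked, False if the service never
    became healthy (in which case the old key is left in place).
    """
    store_new_key()
    restart_service()
    for _ in range(max_checks):
        if is_healthy():
            revoke_old_key()
            return True
        wait(interval_s)
    return False
```

Returning False without revoking leaves the decision to a human: the old key is still compromised, but the service is still running.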

Expected Behaviour

| Abuse scenario | Without detection | With detection |
|---|---|---|
| Stolen key used for resale | Discovered at month-end invoice | Cost spike alert fires within 1 hour |
| Prompt injection attempt | No alert | Injection pattern counter increments; alert fires |
| Key used from unexpected IP | No visibility | Source IP not in baseline; anomaly logged |
| New model call (Claude Opus instead of Haiku) | No alert | Unexpected model alert fires |
| High-entropy (encoded) prompt | No visibility | High-entropy counter flagged; analyst review queued |

Trade-offs

| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| Log metadata without prompt content | Protects privacy of prompt data | Less context for investigating abuse | On confirmed abuse incident, enable temporary prompt logging with security team approval and strict retention |
| Cost spike threshold at $10/hour | Catches most credential misuse quickly | May alert on legitimate batch processing | Separate alert thresholds per application; batch jobs get higher threshold |
| Injection pattern detection | Flags reconnaissance | High false positive rate if app handles user input with natural injection-like language | Tune patterns to the specific attack patterns that matter; measure false positive rate |
| Prompt entropy analysis | Catches obfuscated payloads | High entropy is not uniquely malicious (code, base64 attachments) | Use as one signal among many; require multiple signals to fire an alert |

Failure Modes

| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Alert threshold too high | Abuse occurs for hours before alert fires | Review historical spend at alert time | Lower threshold; establish tighter per-key hourly limits via provider settings (Anthropic usage limits, OpenAI spend limits) |
| Monitoring proxy adds latency | Application response time increases | P99 latency spike in application metrics | Optimise proxy to be async for logging; use sampling instead of 100% capture |
| Key rotation breaks service before new key propagates | Service returns 401 after rotation | Health check fails immediately post-rotation | Implement graceful rotation: provision new key, update service, wait for health check, revoke old key |
| Baseline not established for new service | First week generates many false positives | High alert volume from new service | Suppress anomaly alerts for 7 days after service launch; establish baseline before enabling anomaly rules |
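
For the latency failure mode, one option is to move log handling off the request path. The sketch below uses the standard library's QueueHandler and QueueListener so that the calling thread only enqueues a record, while a background thread does the slow write to the real handler. The make_async_logger function is a hypothetical helper; the target handler (file, socket, SIEM forwarder) is whatever the deployment already uses.

```python
import logging
import logging.handlers
import queue


def make_async_logger(
    name: str, target: logging.Handler
) -> tuple[logging.Logger, logging.handlers.QueueListener]:
    """Return a logger whose records are written to target by a background thread."""
    log_queue: queue.Queue = queue.Queue(-1)   # unbounded queue

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False                   # avoid duplicate synchronous handling
    logger.addHandler(logging.handlers.QueueHandler(log_queue))

    # The listener thread drains the queue and passes records to the real handler
    listener = logging.handlers.QueueListener(log_queue, target)
    listener.start()
    return logger, listener
```

Call listener.stop() at shutdown so that queued records are flushed before the process exits. An unbounded queue trades memory for never blocking the request path; a bounded queue would be the choice if memory is the tighter constraint.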