NIST AI RMF 1.0 Technical Security Controls for Production AI Systems

The Problem

The NIST AI Risk Management Framework (AI RMF 1.0, published January 2023) provides a vocabulary and process structure for managing risks from AI systems, but it is deliberately technology-agnostic. The framework describes what organisations should do — govern AI deployment decisions, map the context and harms, measure risk indicators, and manage identified risks — without specifying which technical controls implement each activity.

This creates a gap for practitioners. Security and platform engineers tasked with “implementing the AI RMF” often produce governance documentation (policies, roles, impact assessments) while leaving the technical substrate — model access controls, output monitoring, drift detection, incident response automation — either absent or handled ad hoc by ML engineers who are not thinking in terms of the RMF. When an audit or regulatory review asks “how does your organisation detect and respond to harmful AI outputs?”, the answer is usually a policy document with no underlying technical evidence.

The gap matters because the AI RMF is increasingly referenced in regulatory contexts. The EU AI Act (enforcement from August 2026) maps closely to RMF concepts for high-risk AI systems, and the FTC’s algorithmic accountability guidance cites RMF practices. Organisations that have operationalised the RMF technically — with tooling, metrics, and runbooks — are audit-ready; those that have only the documentation are not.

This guide maps the RMF’s four core functions to specific technical controls, organised by function and sub-function. Each control is described with an implementation example and a measurable indicator.

Target systems: Production LLM-based applications (RAG pipelines, chat interfaces, decision-support systems); model serving infrastructure (vLLM, TGI, Bedrock, Azure OpenAI); organisations subject to EU AI Act high-risk requirements or seeking SOC 2 coverage for AI components.

Threat Model

For the purposes of AI RMF technical implementation, the relevant risk classes are:

1. Model output risk (the AI system produces harmful, biased, or incorrect outputs). Objective: detect and contain outputs before they cause harm. Technical control: output monitoring with configurable blocklists and semantic classifiers.

2. Data pipeline risk (training or retrieval data is poisoned, biased, or unauthorised). Objective: validate data provenance and content before it influences model behaviour. Technical control: data lineage tracking and content scanning at ingestion.

3. Access and misuse risk (authorised or unauthorised users extract sensitive information or misuse the AI system). Objective: enforce least-privilege access to model capabilities; detect misuse patterns. Technical control: API authentication, per-user rate limiting, and anomaly detection on usage patterns.

4. Operational drift risk (model behaviour changes over time due to model updates, prompt changes, or distribution shift in inputs). Objective: detect when AI system behaviour has deviated from its tested baseline. Technical control: automated evaluation pipelines and performance regression alerting.

Hardening Configuration

GOVERN Function: Technical Controls

G1: Establish AI risk policies with technical enforcement points

# policy-as-code for AI deployment (example: OPA policy)
# Enforces that every AI system in production has:
# - An impact assessment on record
# - Output monitoring enabled
# - An incident response runbook

package ai.governance

deny[msg] {
  input.resource.type == "ai_model_deployment"
  not input.resource.annotations["ai.rmf/impact-assessment-date"]
  msg := sprintf("Deployment %v lacks required impact assessment annotation",
    [input.resource.name])
}

deny[msg] {
  input.resource.type == "ai_model_deployment"
  not input.resource.annotations["ai.rmf/output-monitoring-enabled"]
  msg := sprintf("Deployment %v does not have output monitoring enabled",
    [input.resource.name])
}

G2: Assign AI risk owners with technical accountability

# Tag every AI-related resource (models, endpoints, data stores) with owner
# Use cloud provider resource tags or Kubernetes labels

kubectl label deployment my-llm-api \
  ai.rmf/owner="team-platform@example.com" \
  ai.rmf/risk-tier="high" \
  ai.rmf/impact-assessment-date="2026-06-01" \
  ai.rmf/output-monitoring="enabled" \
  ai.rmf/last-evaluated="2026-06-01"

MAP Function: Technical Controls

M1: Document AI system boundaries and data flows

Use automated discovery to enumerate AI system components:

# Scan for AI-related deployments across namespaces
kubectl get deployments --all-namespaces -o json | \
  jq '.items[] | select(.spec.template.spec.containers[].image |
    test("vllm|tgi|ollama|openai|anthropic|llama")) |
  {namespace: .metadata.namespace, name: .metadata.name,
   images: [.spec.template.spec.containers[].image]}'

# Document model artifact provenance
gcloud artifacts docker images list \
  REGION-docker.pkg.dev/PROJECT_ID/ai-models/ \
  --include-tags --format=json | \
  jq '.[] | {image: .package, tag: .tags, created: .createTime}'

M2: Map AI system inputs and outputs for harm identification

# Structured logging for AI system I/O — feeds MAP/MEASURE functions
import structlog

log = structlog.get_logger()

async def log_ai_interaction(
    request_id: str,
    user_id_hash: str,    # Never log raw user IDs
    input_length: int,
    input_categories: list[str],   # Output of a content classifier
    output_length: int,
    output_categories: list[str],  # Harmful content categories detected
    model_id: str,
    latency_ms: float,
) -> None:
    log.info(
        "ai_interaction",
        request_id=request_id,
        user_id_hash=user_id_hash,
        input_length=input_length,
        input_categories=input_categories,
        output_length=output_length,
        output_categories=output_categories,
        model_id=model_id,
        latency_ms=latency_ms,
        rmf_function="MAP",
    )

MEASURE Function: Technical Controls

ME1: Automated evaluation pipelines (continuous testing)

# evaluation_pipeline.py — runs on every model deployment
# Implements MEASURE function continuous evaluation

from datasets import load_dataset
import openai

EVAL_DATASET = "internal-eval-set-v3"
METRICS = {
    "harmful_output_rate": 0.001,    # Must be below 0.1%
    "factual_accuracy":    0.92,     # Must be above 92%
    "response_refusal_rate": 0.05,   # Must be below 5% (avoid over-refusal)
    "latency_p99_ms":      2000,     # Must be below 2s
}

async def run_evaluation(model_endpoint: str) -> dict[str, float]:
    dataset = load_dataset(EVAL_DATASET, split="test")
    results = {}

    harmful_count = 0
    for sample in dataset:
        response = await call_model(model_endpoint, sample["prompt"])
        if is_harmful(response):
            harmful_count += 1

    results["harmful_output_rate"] = harmful_count / len(dataset)
    # ... other metrics

    return results

def check_metrics_pass(results: dict) -> bool:
    for metric, threshold in METRICS.items():
        if metric in ["harmful_output_rate", "latency_p99_ms", "response_refusal_rate"]:
            if results[metric] > threshold:
                print(f"FAIL: {metric}={results[metric]} exceeds threshold {threshold}")
                return False
        else:
            if results[metric] < threshold:
                print(f"FAIL: {metric}={results[metric]} below threshold {threshold}")
                return False
    return True

ME2: Drift detection for production inputs

# Monitor distribution of inputs to detect drift that may indicate misuse
# or a change in user population that invalidates impact assessments

from collections import Counter
import hashlib

class InputDriftMonitor:
    def __init__(self, baseline_distribution: dict):
        self._baseline = baseline_distribution
        self._current_window: Counter = Counter()
        self._window_size = 1000

    def record(self, input_text: str) -> None:
        # Use content classifier categories, not raw text
        categories = classify_content(input_text)
        for cat in categories:
            self._current_window[cat] += 1

        if sum(self._current_window.values()) >= self._window_size:
            self._check_drift()
            self._current_window.clear()

    def _check_drift(self) -> None:
        total = sum(self._current_window.values())
        for category, baseline_fraction in self._baseline.items():
            current_fraction = self._current_window[category] / total
            # Alert if distribution has shifted more than 2x baseline
            if abs(current_fraction - baseline_fraction) > baseline_fraction:
                alert(
                    "ai_input_distribution_drift",
                    category=category,
                    baseline=baseline_fraction,
                    current=current_fraction,
                )

ME3: Bias and fairness metrics

# For AI systems making decisions that affect users, measure outcome disparities
# across demographic groups (using proxy attributes if direct attributes unavailable)

def compute_demographic_parity(decisions: list[dict]) -> dict:
    """
    Compute acceptance rate by demographic group proxy.
    RMF MEASURE function: track fairness metrics over time.
    """
    by_group: dict[str, list[int]] = {}
    for decision in decisions:
        group = decision.get("user_region", "unknown")
        outcome = 1 if decision["ai_decision"] == "approve" else 0
        by_group.setdefault(group, []).append(outcome)

    parity = {}
    overall_rate = sum(sum(v) for v in by_group.values()) / sum(len(v) for v in by_group.values())
    for group, outcomes in by_group.items():
        group_rate = sum(outcomes) / len(outcomes)
        parity[group] = {
            "approval_rate": group_rate,
            "disparity_from_overall": group_rate - overall_rate,
        }
    return parity

MANAGE Function: Technical Controls

MN1: Incident response automation for AI harms

# AlertManager rule: trigger AI harm incident response
# feeds into MANAGE function automated response

groups:
  - name: ai-harm-detection
    rules:
      - alert: AiHarmfulOutputRateHigh
        expr: |
          rate(ai_harmful_outputs_total[5m]) /
          rate(ai_requests_total[5m]) > 0.01
        for: 2m
        labels:
          severity: critical
          rmf_function: MANAGE
        annotations:
          summary: "AI harmful output rate exceeds 1%"
          runbook_url: "https://wiki.internal/runbooks/ai-harm-response"
          action: >
            1. Page on-call ML engineer and security engineer.
            2. If rate > 5%: trigger circuit breaker to divert traffic to fallback model.
            3. Sample 50 recent harmful outputs for root cause analysis.
            4. Open incident ticket with model ID, time range, and sample.

MN2: Circuit breaker for AI model endpoints

# Implement circuit breaker pattern for AI endpoints
# RMF MANAGE: ability to contain AI harm quickly

import asyncio
from enum import Enum

class CircuitState(Enum):
    CLOSED = "closed"       # Normal operation
    OPEN = "open"           # AI endpoint blocked; fallback active
    HALF_OPEN = "half_open" # Testing recovery

class AICircuitBreaker:
    def __init__(self, failure_threshold: float = 0.02, recovery_timeout: int = 300):
        self.state = CircuitState.CLOSED
        self._failure_threshold = failure_threshold
        self._failure_count = 0
        self._request_count = 0
        self._recovery_timeout = recovery_timeout

    async def call(self, fn, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            return await self._fallback_response()

        try:
            result = await fn(*args, **kwargs)
            if is_harmful(result):
                self._record_failure()
            else:
                self._record_success()
            return result
        except Exception:
            self._record_failure()
            raise

    def _record_failure(self):
        self._failure_count += 1
        self._request_count += 1
        rate = self._failure_count / self._request_count
        if rate > self._failure_threshold and self._request_count > 100:
            self.state = CircuitState.OPEN
            log.warning("ai_circuit_breaker_open", failure_rate=rate)
            asyncio.get_event_loop().call_later(
                self._recovery_timeout, self._attempt_recovery
            )

MN3: Model version rollback automation

# Kubernetes deployment rollback for AI model updates
# Triggered by MANAGE function incident response

#!/bin/bash
# rollback-ai-model.sh — called by incident response automation

DEPLOYMENT_NAME=$1
NAMESPACE=${2:-ai-production}

# Record current state before rollback
kubectl rollout history deployment/${DEPLOYMENT_NAME} -n ${NAMESPACE} > \
  /tmp/rollback-state-$(date +%Y%m%d%H%M%S).txt

# Rollback to previous version
kubectl rollout undo deployment/${DEPLOYMENT_NAME} -n ${NAMESPACE}

# Wait for rollback to complete
kubectl rollout status deployment/${DEPLOYMENT_NAME} -n ${NAMESPACE} --timeout=120s

# Verify harmful output rate drops
sleep 30
RATE=$(kubectl exec -n monitoring deployment/prometheus -- \
  promtool query instant \
  'rate(ai_harmful_outputs_total{deployment="'${DEPLOYMENT_NAME}'"}[5m]) /
   rate(ai_requests_total{deployment="'${DEPLOYMENT_NAME}'"}[5m])')

echo "Post-rollback harmful output rate: ${RATE}"

Expected Behaviour After Hardening

RMF Function	Before	After
GOVERN — deployment gates	No policy enforcement; any image can deploy	OPA policy rejects deployments without impact assessment annotation
MAP — I/O documentation	Manual wiki pages, often stale	Automated discovery + structured I/O logging feeds up-to-date model registry
MEASURE — evaluation	Ad hoc manual testing at release time	Automated eval pipeline runs on every deployment; failures block promotion
MEASURE — drift detection	None; no alert if input population shifts	Distribution drift monitor alerts within 1,000 requests
MANAGE — incident response	Manual paging; no containment automation	AlertManager triggers runbook; circuit breaker activates at 2% harmful rate

Verification:

# Confirm governance policy is enforced
kubectl apply --dry-run=server -f deploy-without-annotations.yaml
# Expected: OPA policy denial

# Confirm evaluation pipeline ran on last deployment
kubectl get job -n ai-ci -l rmf.function=MEASURE --sort-by=.metadata.creationTimestamp | tail -1
# Expected: recently-run COMPLETED job

# Confirm circuit breaker status
kubectl get configmap ai-circuit-breaker-state -n ai-production -o yaml
# Expected: state: closed (healthy)

Trade-offs and Operational Considerations

Aspect	Benefit	Cost	Mitigation
Automated evaluation pipelines	Continuous evidence of MEASURE compliance	Eval dataset must be maintained; can become a gaming target	Rotate eval sets regularly; keep a held-out test set that CI never sees
Policy-as-code deployment gates	Audit-ready; provable governance	Annotation drift if teams forget to update; gates can slow emergency deploys	Automate annotation from CI; document break-glass procedure
Circuit breaker for AI harms	Fast containment of harmful output events	Fallback model may have lower capability; users see degraded service	Define fallback model in advance; test fallback path regularly
Structured I/O logging	Feeds MAP and MEASURE; supports retrospective analysis	Logs contain potentially sensitive model outputs	Apply data minimisation: log categories and lengths, not raw content
Drift detection	Early warning of population or model shift	Baseline must be established; alert tuning required	Run drift monitor in alert-only mode for 30 days before enabling paging

Failure Modes

Failure	Symptom	Detection	Recovery
Eval pipeline fails without blocking deployment	Harmful outputs reach production undetected	Post-hoc harm metrics spike; retrospective review	Make eval pipeline a required CI gate; fail deployment on eval pipeline failure
OPA policy too strict	Emergency deployments blocked	Deployment queue backs up; engineers bypass gate	Add break-glass annotation with time-limit; log all break-glass uses
Circuit breaker trips on false positive (benign unusual inputs)	Legitimate traffic routed to fallback model	User-facing capability degradation; circuit breaker status dashboard	Tune harmful output classifier; add human-review override for circuit breaker
Drift monitor baseline is wrong	Persistent false alerts or missed real drift	Alert noise leads to suppression; real drift missed	Re-baseline from a representative 30-day production window
Structured I/O logs contain PII despite filtering	Privacy incident if logs exfiltrated	PII scanner on log output; privacy audit	Re-evaluate I/O logging data minimisation; add PII scrubbing at log sink