AI Security Posture Management: Extending CSPM to ML Infrastructure

Problem

Cloud Security Posture Management (CSPM) tools such as AWS Security Hub, GCP Security Command Center, Prisma Cloud, and Wiz have matured into comprehensive scanners for traditional cloud infrastructure. They check S3 bucket policies, IAM permission breadth, unencrypted databases, and exposed security group rules. For most cloud workloads, a CSPM tool provides reasonable baseline coverage.

AI and ML infrastructure introduces a layer of attack surface that existing CSPM tools are almost entirely blind to. The tools were not designed for it, the check libraries don’t include it, and the ML engineering teams who deploy this infrastructure are rarely thinking about it as a security surface. The result is AI infrastructure that passes a CSPM scan cleanly while hosting several critical misconfigurations.

The specific AI/ML attack surface that standard CSPM misses:

Unauthenticated model serving endpoints. Frameworks like vLLM, Ollama, Ray Serve, and NVIDIA Triton start with no authentication by default. Engineers who deploy these for internal use frequently leave them accessible without credentials. Unlike a database, these are not flagged by CSPM as “unauthenticated” because CSPM tools don’t understand the model serving API. The exposed endpoint allows anyone who can reach it to query the model freely, enumerate its capabilities, run inference at your cost, and probe for training data through membership inference or data extraction attacks.

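Unencrypted model weight storage. Model checkpoints and fine-tuned weights stored in S3, GCS, or Azure Blob are frequently protected only by the provider's default encryption rather than customer-managed keys, or by keys that overly broad IAM roles can use. (S3 has applied SSE-S3 to new objects by default since January 2023, so the practical gap today is usually key control rather than a total absence of encryption.) Model weights represent significant IP value (potentially months of compute investment), and their exfiltration may go unnoticed for extended periods.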

Over-permissioned MLflow and experiment tracking. MLflow, Weights & Biases, and similar experiment tracking tools are often deployed with admin credentials shared across the team, no audit logging, and access to production model versions. Compromise of the experiment tracking service gives an attacker access to all model artifacts and the ability to push a malicious model version.

GPU node host path mounts. AI training jobs frequently require access to GPU drivers and device files via host path mounts. When these mounts are too broad, they expose the host filesystem to the container, enabling privilege escalation from the training job.

Jupyter notebook servers with no authentication. Jupyter notebooks remain the most common entry point for ML engineers. Jupyter enables token authentication by default, but deployments frequently disable it for convenience (an empty token or password), serve over plain HTTP without TLS, and run as the notebook server’s user (often with broad IAM permissions for accessing training data and model registries).

Model serving without rate limiting. Production inference endpoints without rate limiting are vulnerable to cost-based denial of service: an attacker who discovers the endpoint can exhaust your GPU compute budget or run inference requests to extract model capabilities.

Training data buckets with overly broad access policies. Training datasets, especially those containing PII, are frequently stored in buckets that are accessible to any service account in the data science team rather than scoped to specific training jobs.

None of these misconfigurations appear in standard CSPM output. Building AI security posture management requires either extending your existing CSPM tool with custom checks, deploying AI-specific scanning tooling, or writing your own checks against your inventory.

Target systems: any cloud environment running ML training or inference workloads; Kubernetes clusters with GPU nodes and AI frameworks deployed; teams using MLflow, W&B, or similar experiment tracking; organisations where ML engineers deploy infrastructure independently of a platform team.

Threat Model

Adversary 1 — External discovery of unauthenticated model endpoint. Internet or internal scanner discovers an unauthenticated Ollama or vLLM endpoint. Attacker runs inference at no cost, extracts model capabilities, and attempts model inversion to recover training data. Cost: your GPU bill spikes without alert.

Adversary 2 — MLflow model substitution. Attacker with access to an over-permissioned MLflow server (shared admin credential, no MFA) promotes a poisoned model version to production. The production model serving pipeline pulls the new version on next restart. Inference outputs are now attacker-influenced.

Adversary 3 — Training data exfiltration via over-permissioned service account. A compromised CI/CD service account that has read access to the training data S3 bucket (over-provisioned for convenience) is used to exfiltrate the training dataset — potentially containing PII or proprietary business data.

Adversary 4 — Jupyter notebook as pivot. An internet-accessible Jupyter notebook (no auth, no TLS) is the entry point for an attacker who uses the notebook’s IAM role (which has access to the model registry and training data) to exfiltrate model weights and training data.

Without AI-specific posture management: these findings are invisible to existing tooling. With AI-specific posture management: automated checks surface each misconfiguration class; remediation is tracked against defined SLAs.

Configuration / Implementation

Step 1 — Inventory your AI infrastructure surface

The first step is knowing what exists:

#!/bin/bash
# ai-inventory.sh — discover AI/ML infrastructure

echo "=== Model Serving Endpoints ==="
# Find vLLM, Ollama, Triton, Ray Serve instances
kubectl get services --all-namespaces -o json | jq -r '
  .items[] |
  select(
    ((.metadata.labels["app"] // "") | test("vllm|ollama|triton|ray-serve|litellm")) or
    (.metadata.annotations["ai.component"] != null)
  ) |
  "\(.metadata.namespace)/\(.metadata.name): port \(.spec.ports[0].port // "unknown")"
'

echo ""
echo "=== GPU Nodes ==="
kubectl get nodes -l accelerator=nvidia -o json | jq -r \
  '.items[] | "\(.metadata.name): \(.status.allocatable["nvidia.com/gpu"] // "0") GPUs"'

echo ""
echo "=== AI-Related Secrets (names only) ==="
kubectl get secrets --all-namespaces -o json | jq -r '
  .items[] |
  select(.metadata.name | test("(?i)huggingface|openai|anthropic|replicate|mlflow|wandb|comet")) |
  "\(.metadata.namespace)/\(.metadata.name)"
'

echo ""
echo "=== S3 Buckets with 'model' or 'training' in name ==="
aws s3api list-buckets --query \
  'Buckets[?contains(Name, `model`) || contains(Name, `training`)].Name' \
  --output text

echo ""
echo "=== EC2/EKS nodes with GPU instance types ==="
aws ec2 describe-instances \
  --filters "Name=instance-type,Values=p3*,p4*,g4*,g5*,inf1*,trn1*" \
  --query 'Reservations[*].Instances[*].[InstanceId,InstanceType,PublicIpAddress,Tags[?Key==`Name`].Value|[0]]' \
  --output table
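
EC2 filter values accept `*` as a wildcard and match everything else literally, so it can be worth confirming offline which instance types a pattern list actually covers. The sketch below uses Python's `fnmatch` as a stand-in for that matching (an approximation: `fnmatch` also treats `[` specially, which these patterns avoid). The sample instance types and both pattern lists are illustrative, not an exhaustive GPU inventory.

```python
from fnmatch import fnmatchcase

# Illustrative sample of instance types; extend with whatever your accounts run.
SAMPLE_TYPES = [
    "p3.2xlarge", "p3dn.24xlarge", "p4d.24xlarge", "g4dn.xlarge",
    "g5.xlarge", "inf1.xlarge", "trn1.2xlarge", "m5.large",
]

def matched_types(patterns, instance_types):
    """Return the instance types matched by at least one wildcard pattern."""
    return [t for t in instance_types if any(fnmatchcase(t, p) for p in patterns)]

# Patterns that require a dot straight after the family name...
narrow = ["p3.*", "p4.*", "g4.*", "g5.*", "inf1.*", "trn1.*"]
# ...versus patterns that also cover suffixed families such as p4d or g4dn.
broad = ["p3*", "p4*", "g4*", "g5*", "inf1*", "trn1*"]

missed = sorted(set(matched_types(broad, SAMPLE_TYPES)) - set(matched_types(narrow, SAMPLE_TYPES)))
print("Covered only by the broad patterns:", missed)
```

Running the same comparison against the real output of `aws ec2 describe-instance-types` would give a more reliable answer than a hand-written sample.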

Step 2 — Check model serving authentication

#!/bin/bash
# check-model-endpoints.sh — verify authentication on model serving endpoints

CLUSTER_ENDPOINTS=$(kubectl get services --all-namespaces -o json | jq -r '
  .items[] |
  select(.spec.type == "LoadBalancer" or .spec.type == "NodePort") |
  "\(.metadata.namespace)/\(.metadata.name):\(.spec.ports[0].port)"
')

check_endpoint_auth() {
  local namespace=$1
  local service=$2
  local port=$3
  
  # Get the service's ClusterIP
  local ip=$(kubectl get service "$service" -n "$namespace" \
    -o jsonpath='{.spec.clusterIP}')
  
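  # Check common unauthenticated paths (a 200 on /health or / alone may be intentional)
  for path in "/v1/models" "/api/tags" "/health" "/"; do
    # --attach works without a TTY; </dev/null stops kubectl from consuming the
    # caller's stdin (the while-read loop below); --quiet suppresses kubectl messages
    code=$(kubectl run "auth-check-$RANDOM" \
      --image=curlimages/curl:latest \
      --restart=Never \
      --rm \
      --attach \
      --quiet \
      -- curl -s -o /dev/null -w '%{http_code}' --max-time 5 \
      "http://${ip}:${port}${path}" </dev/null 2>/dev/null)
    code=${code:0:3}  # keep only the status code if kubectl appended extra output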
    
    if [[ "$code" == "200" ]]; then
      echo "FINDING: Unauthenticated access to ${namespace}/${service}${path} returns HTTP 200"
    fi
  done
}

# Run checks against all GPU-adjacent services
kubectl get services --all-namespaces -l "ai.component=serving" \
  -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{" "}{.spec.ports[0].port}{"\n"}{end}' | \
  while read ns svc port; do
    check_endpoint_auth "$ns" "$svc" "$port"
  done
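
Treating every HTTP 200 as a finding will over-report, because health and root paths are often left open on purpose. A minimal sketch of a triage step is shown below, assuming the probe results have already been collected as (path, status) pairs. The `ProbeResult` type, the category names, and the split between sensitive and informational paths are all hypothetical choices for illustration.

```python
from dataclasses import dataclass

# Model-listing paths: /v1/models is the OpenAI-compatible listing served by vLLM,
# and /api/tags is Ollama's model list. A 200 without credentials suggests an open API.
SENSITIVE_PATHS = {"/v1/models", "/api/tags"}
# Liveness-style paths are commonly unauthenticated by design.
INFORMATIONAL_PATHS = {"/health", "/"}

@dataclass
class ProbeResult:
    path: str
    status: int

def classify(result: ProbeResult) -> str:
    """Map one unauthenticated probe to a triage category."""
    if result.status in (401, 403):
        return "PROTECTED"      # the server asked for credentials
    if result.status == 200 and result.path in SENSITIVE_PATHS:
        return "FINDING"        # model metadata served without credentials
    if result.status == 200 and result.path in INFORMATIONAL_PATHS:
        return "INFO"           # reachable, but not evidence of an open API
    return "INCONCLUSIVE"       # timeouts, 404s, redirects: needs a manual look

probes = [ProbeResult("/v1/models", 200), ProbeResult("/health", 200), ProbeResult("/api/tags", 401)]
for p in probes:
    print(f"{p.path}: {classify(p)}")
```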

Step 3 — Check model weight storage security

# check-model-storage.py — scan model weight buckets for security issues
import boto3
import json
from dataclasses import dataclass

@dataclass
class StorageFinding:
    severity: str
    bucket: str
    issue: str
    recommendation: str

def audit_model_buckets(bucket_name_patterns: list[str]) -> list[StorageFinding]:
    """Audit S3 buckets containing model weights for security issues."""
    s3 = boto3.client('s3')
    findings = []
    
    # List buckets matching AI/ML patterns
    buckets = s3.list_buckets()['Buckets']
    model_buckets = [
        b['Name'] for b in buckets
        if any(p.lower() in b['Name'].lower() for p in bucket_name_patterns)
    ]
    
    for bucket in model_buckets:
        # Check 1: Server-side encryption
        try:
            enc = s3.get_bucket_encryption(Bucket=bucket)
            rules = enc['ServerSideEncryptionConfiguration']['Rules']
            if not any(r.get('ApplyServerSideEncryptionByDefault', {}).get('SSEAlgorithm') == 'aws:kms'
                      for r in rules):
                findings.append(StorageFinding(
                    severity="HIGH",
                    bucket=bucket,
                    issue="Model weights encrypted with SSE-S3 (not KMS) — key rotation not enforced",
                    recommendation="Migrate to SSE-KMS with CMK; enable automatic key rotation"
                ))
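        except s3.exceptions.ClientError as e:
            # boto3 does not model this error as its own exception class, so match on the error code
            if e.response['Error']['Code'] != 'ServerSideEncryptionConfigurationNotFoundError':
                raise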
            findings.append(StorageFinding(
                severity="CRITICAL",
                bucket=bucket,
                issue="Model weights stored WITHOUT server-side encryption",
                recommendation="Enable SSE-KMS immediately"
            ))
        
        # Check 2: Bucket policy — check for overly broad principals
        try:
            policy = json.loads(s3.get_bucket_policy(Bucket=bucket)['Policy'])
            for stmt in policy.get('Statement', []):
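                # Only Allow statements grant access; Deny statements with Principal "*" are common and harmless
                principal = stmt.get('Principal')
                aws = principal.get('AWS') if isinstance(principal, dict) else principal
                if stmt.get('Effect') == 'Allow' and (aws == '*' or (isinstance(aws, list) and '*' in aws)):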
                    findings.append(StorageFinding(
                        severity="CRITICAL",
                        bucket=bucket,
                        issue="Model bucket is publicly accessible (Principal: *)",
                        recommendation="Remove public access; restrict to specific IAM roles"
                    ))
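        except s3.exceptions.ClientError as e:
            # NoSuchBucketPolicy just means no policy is attached; re-raise anything else
            if e.response['Error']['Code'] != 'NoSuchBucketPolicy':
                raise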
        
        # Check 3: Versioning (for model checkpoint integrity)
        versioning = s3.get_bucket_versioning(Bucket=bucket)
        if versioning.get('Status') != 'Enabled':
            findings.append(StorageFinding(
                severity="MEDIUM",
                bucket=bucket,
                issue="Versioning disabled — cannot detect unauthorised model weight modification",
                recommendation="Enable versioning with MFA delete for model weight buckets"
            ))
        
        # Check 4: Logging
        logging = s3.get_bucket_logging(Bucket=bucket)
        if 'LoggingEnabled' not in logging:
            findings.append(StorageFinding(
                severity="MEDIUM",
                bucket=bucket,
                issue="S3 access logging disabled — model weight access is unauditable",
                recommendation="Enable S3 access logging to dedicated audit log bucket"
            ))
    
    return findings

# Run audit
findings = audit_model_buckets(['model', 'checkpoint', 'weights', 'training', 'mlflow'])
for f in sorted(findings, key=lambda x: x.severity):
    print(f"[{f.severity}] {f.bucket}: {f.issue}")
    print(f"  → {f.recommendation}\n")
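
A wildcard principal is not always equivalent to public access: an `Allow` statement may carry a `Condition` that narrows it (for example to one AWS Organization). The sketch below separates the two cases so they can be given different severities. It works on an already-parsed policy dictionary, and the function names and sample policy are hypothetical. It does not evaluate what the condition actually permits.

```python
def _as_list(value):
    """IAM policy fields may hold a single value or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

def wildcard_allow_statements(policy):
    """Return (sid, has_condition) for each Allow statement whose principal includes '*'."""
    results = []
    for stmt in _as_list(policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue  # Deny statements restrict access, they never grant it
        principal = stmt.get("Principal")
        if isinstance(principal, dict):
            principals = _as_list(principal.get("AWS"))
        else:
            principals = _as_list(principal)
        if "*" in principals:
            results.append((stmt.get("Sid", "<no Sid>"), bool(stmt.get("Condition"))))
    return results

# Hypothetical policy covering the interesting cases.
sample_policy = {
    "Statement": [
        {"Sid": "DenyInsecureTransport", "Effect": "Deny", "Principal": "*",
         "Condition": {"Bool": {"aws:SecureTransport": "false"}}},
        {"Sid": "PublicRead", "Effect": "Allow", "Principal": "*"},
        {"Sid": "OrgRead", "Effect": "Allow", "Principal": {"AWS": ["*"]},
         "Condition": {"StringEquals": {"aws:PrincipalOrgID": "o-example"}}},
        {"Sid": "TrainerRole", "Effect": "Allow",
         "Principal": {"AWS": "arn:aws:iam::111122223333:role/trainer"}},
    ]
}

for sid, has_condition in wildcard_allow_statements(sample_policy):
    label = "review condition" if has_condition else "unconditional wildcard"
    print(f"{sid}: {label}")
```

One reasonable mapping is CRITICAL for an unconditional wildcard and HIGH for a conditional one pending manual review, though the right severities depend on your own risk model.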

Step 4 — Jupyter notebook security scan

#!/bin/bash
# check-jupyter.sh — find exposed Jupyter instances

echo "=== Kubernetes Jupyter Services ==="
kubectl get services --all-namespaces -o json | jq -r '
  .items[] |
  select(
    ((.metadata.labels["app"] // "") | test("jupyter")) or
    (.metadata.name | test("jupyter"))
  ) |
  "\(.metadata.namespace)/\(.metadata.name) type:\(.spec.type)"
'

echo ""
echo "=== Checking for Jupyter without authentication ==="
# Find Jupyter containers and inspect their startup command and args
kubectl get pods --all-namespaces -o json | jq -r '
  .items[] |
  select(.metadata.name | test("jupyter")) |
  . as $pod |
  .spec.containers[] |
  (((.command // []) + (.args // [])) | join(" ")) as $cmd |
  select($cmd | test("jupyter")) |
  # An empty token or password value disables authentication
  if ($cmd | test("\\.(token|password)=(\\x27\\x27|\"\"|\\s|$)")) then
    "WARNING: \($pod.metadata.namespace)/\($pod.metadata.name): Jupyter started with an empty token or password"
  else
    "CHECK: Review Jupyter config for \($pod.metadata.namespace)/\($pod.metadata.name)"
  end
'

# Check for Jupyter env vars that disable auth
kubectl get pods --all-namespaces -o json | jq -r '
  .items[] |
  select(.metadata.name | test("jupyter")) |
  . as $pod |
  .spec.containers[].env[]? |
  select(.name == "JUPYTER_TOKEN" and .valueFrom == null and ((.value // "") == "")) |
  "FINDING: \($pod.metadata.namespace)/\($pod.metadata.name) has empty JUPYTER_TOKEN"
'

Step 5 — Integrate into a posture scoring dashboard

# ai-posture-score.py — aggregate AI security posture findings into a score

from dataclasses import dataclass, field

@dataclass
class AIPostureScore:
    total_checks: int = 0
    passed: int = 0
    failed_critical: int = 0
    failed_high: int = 0
    failed_medium: int = 0
    findings: list = field(default_factory=list)
    
    @property
    def score(self) -> float:
        """0–100 posture score."""
        if self.total_checks == 0:
            return 0.0
        penalty = (self.failed_critical * 20 + 
                   self.failed_high * 10 + 
                   self.failed_medium * 3)
        return max(0.0, min(100.0, 100.0 - (penalty / self.total_checks * 100)))
    
    def report(self) -> str:
        grade = "A" if self.score >= 90 else \
                "B" if self.score >= 75 else \
                "C" if self.score >= 60 else \
                "D" if self.score >= 40 else "F"
        
        return f"""
AI Security Posture Report
==========================
Score: {self.score:.0f}/100 (Grade: {grade})
Checks: {self.total_checks} total | {self.passed} passed
Findings: {self.failed_critical} CRITICAL, {self.failed_high} HIGH, {self.failed_medium} MEDIUM

Top Issues:
{chr(10).join(f'  [{f.severity}] {f.issue}' for f in sorted(self.findings, key=lambda x: ["CRITICAL","HIGH","MEDIUM","LOW"].index(x.severity))[:5])}
"""

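The score formula is easier to reason about with a few numbers plugged in. The sketch below restates the same arithmetic as a standalone function (the `posture_score` name and the `WEIGHTS` table are introduced here for illustration) and evaluates it for three hypothetical scans.

```python
from collections import Counter

# Weights mirror the penalty terms used in the score property.
WEIGHTS = {"CRITICAL": 20, "HIGH": 10, "MEDIUM": 3}

def posture_score(total_checks, severities):
    """Same arithmetic as the score property, driven by a list of severity strings."""
    if total_checks == 0:
        return 0.0
    counts = Counter(severities)
    penalty = sum(WEIGHTS.get(sev, 0) * n for sev, n in counts.items())
    return max(0.0, min(100.0, 100.0 - (penalty / total_checks * 100)))

scenarios = [
    (100, ["MEDIUM"]),    # 3 / 100 * 100 = 3 points lost
    (100, ["CRITICAL"]),  # 20 / 100 * 100 = 20 points lost
    (10, ["CRITICAL"]),   # 20 / 10 * 100 = 200 points lost, clamped to 0
]
for total, sevs in scenarios:
    print(f"{total} checks, {sevs}: {round(posture_score(total, sevs))}/100")
```

Because the penalty is divided by the number of checks, the same finding costs far more in a small scan than in a large one. If that is not the intended behaviour, the weights or the normalisation are the place to adjust.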
Step 6 — Remediation: secure MLflow deployment

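# mlflow-secure-deployment.yaml
# Secret names below (mlflow-db, mlflow-oauth2) are placeholders for your own Secrets.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mlflow-server
  namespace: ml-platform
spec:
  selector:
    matchLabels:
      app: mlflow-server
  template:
    metadata:
      labels:
        app: mlflow-server
    spec:
      serviceAccountName: mlflow-minimal  # Minimal RBAC, not cluster-admin
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault
      containers:
      - name: mlflow
        image: ghcr.io/mlflow/mlflow:2.13.0
        args:
        - server
        # Kubernetes expands $(VAR) in args; ${VAR} would be passed through literally
        - --backend-store-uri=postgresql://mlflow:$(DB_PASSWORD)@postgres:5432/mlflow
        - --default-artifact-root=s3://mlflow-artifacts-encrypted/
        # Listen on loopback only, so the proxy sidecar is the sole way in
        - --host=127.0.0.1
        - --port=5000
        # Authentication via OAuth2 proxy sidecar (see below)
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop: ["ALL"]
        env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mlflow-db
              key: password
        - name: MLFLOW_TRACKING_INSECURE_TLS
          value: "false"
      
      # OAuth2 proxy for authentication
      - name: oauth2-proxy
        image: quay.io/oauth2-proxy/oauth2-proxy:v7.6.0
        args:
        - --provider=oidc
        - --http-address=0.0.0.0:4180  # the default binds to loopback only
        - --upstream=http://127.0.0.1:5000
        - --oidc-issuer-url=https://accounts.google.com
        - --client-id=$(OAUTH2_CLIENT_ID)
        - --client-secret=$(OAUTH2_CLIENT_SECRET)
        - --cookie-secret=$(OAUTH2_COOKIE_SECRET)
        - --cookie-secure=true
        - --email-domain=example.com  # Restrict to company email domain
        env:
        - name: OAUTH2_CLIENT_ID
          valueFrom:
            secretKeyRef: {name: mlflow-oauth2, key: client-id}
        - name: OAUTH2_CLIENT_SECRET
          valueFrom:
            secretKeyRef: {name: mlflow-oauth2, key: client-secret}
        - name: OAUTH2_COOKIE_SECRET
          valueFrom:
            secretKeyRef: {name: mlflow-oauth2, key: cookie-secret}
        ports:
        - containerPort: 4180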

Expected Behaviour

Check	Without AI posture management	With AI posture management
Unauthenticated vLLM endpoint	Not detected by CSPM	Flagged as CRITICAL finding
Model weights without encryption	May be detected by CSPM if bucket-level check runs	Specifically checked for AI buckets + SSE-KMS enforcement
MLflow with shared admin credential	Not detected	MEDIUM finding: shared credentials + no MFA
Jupyter with no authentication	Not detected	CRITICAL finding; specific remediation provided
GPU node with broad host path mount	Not detected	HIGH finding in AI posture scan
Posture score visible in dashboard	Not available	Score 0–100 with trend; integrated into security metrics
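
The host path mount row has no corresponding script among the steps. One way such a check could look is sketched below: it takes a pod object as a dictionary (one item from `kubectl get pods -o json`) and flags every `hostPath` volume outside an allow-list. The allow-list shown is an illustrative assumption, not a vetted baseline, and the function name is hypothetical.

```python
import posixpath

# Host path prefixes a GPU workload might legitimately need. Illustrative only:
# review against your own driver and device plugin setup before relying on it.
ALLOWED_PREFIXES = ("/dev/nvidia",)

def broad_host_path_mounts(pod):
    """Return (volume_name, normalised_path) for hostPath volumes outside the allow-list."""
    findings = []
    for volume in pod.get("spec", {}).get("volumes") or []:
        host_path = volume.get("hostPath")
        if host_path is None:
            continue  # emptyDir, configMap, PVC and similar are out of scope here
        # Normalise first so that ".." segments cannot sneak past the prefix check.
        path = posixpath.normpath(host_path.get("path", "/"))
        if not path.startswith(ALLOWED_PREFIXES):
            findings.append((volume.get("name", "<unnamed>"), path))
    return findings

# Hypothetical training pod with one acceptable and two problematic mounts.
sample_pod = {"spec": {"volumes": [
    {"name": "root", "hostPath": {"path": "/"}},
    {"name": "gpu0", "hostPath": {"path": "/dev/nvidia0"}},
    {"name": "scratch", "emptyDir": {}},
    {"name": "sneaky", "hostPath": {"path": "/dev/nvidia0/../../etc"}},
]}}

for name, path in broad_host_path_mounts(sample_pod):
    print(f"FINDING: volume '{name}' mounts host path {path}")
```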

Verification:

# Run full AI posture scan
python3 check-model-storage.py
bash check-model-endpoints.sh
bash check-jupyter.sh

# Expected: findings reported for each category
# Ideally: 0 CRITICAL findings; HIGH findings tracked with remediation tickets

Trade-offs

Aspect	Benefit	Cost	Mitigation
Per-endpoint auth checks	Finds unauthenticated model servers that CSPM misses	Requires cluster access to run checks; not passive	Run scans from a dedicated security scanner service account; schedule hourly
MLflow OAuth2 proxy	Adds authentication without modifying MLflow code	Adds a dependency; proxy must be kept updated	Use a managed OIDC provider; Renovate to keep proxy image current
Model bucket SSE-KMS enforcement	Protects model weights at rest	KMS key management overhead; cost of KMS API calls	Use AWS Key Management Service CMKs; cost is ~$1/key/month + API call cost
Jupyter network restriction	Limits Jupyter exposure to internal network only	Engineers working remotely need VPN or port-forward	Deploy Jupyter behind VPN; use `kubectl port-forward` for direct access

Failure Modes

Failure	Symptom	Detection	Recovery
Posture scan misses a newly deployed AI framework	Scanner does not recognise the new framework’s endpoint pattern; reports a false clean	Inventory shows a new framework deployed that the scan did not flag	Maintain an “AI component” label convention; require the label on all AI service deployments
MLflow OAuth2 proxy upgrade breaks authentication	Researchers cannot access MLflow; experiment logging fails	Application error logs; researcher reports	Pin proxy version; test upgrade in staging before production; rollback procedure in runbook
Model bucket encryption migration breaks training pipeline	Training job fails to write checkpoints after SSE-KMS migration	Job logs show S3 access denied; IAM missing kms:GenerateDataKey permission	Grant `kms:GenerateDataKey` and `kms:Decrypt` on the CMK to the training service account before migration
Posture score misleads — all checks pass but new attack surface added	Score shows 95/100; new unauthenticated service deployed and not yet in scan scope	Quarterly manual review finds gap	Run inventory check weekly; alert on new AI services without posture scan label
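
The SSE-KMS migration failure can be caught before cutover by checking the training role's policy for the two KMS actions. A rough pre-flight sketch follows, assuming the identity policy is available as a parsed dictionary. The function name and the key ARN are hypothetical, and the check ignores `Deny` statements, conditions, and the key policy itself, so a clean result is necessary rather than sufficient.

```python
from fnmatch import fnmatchcase

REQUIRED_ACTIONS = ("kms:GenerateDataKey", "kms:Decrypt")

def _as_list(value):
    """IAM policy fields may hold a single value or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

def missing_kms_actions(policy, key_arn):
    """Return the required KMS actions that no Allow statement grants on key_arn."""
    granted = set()
    for stmt in _as_list(policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue
        if not any(fnmatchcase(key_arn, r) for r in _as_list(stmt.get("Resource"))):
            continue
        actions = [a.lower() for a in _as_list(stmt.get("Action"))]
        for required in REQUIRED_ACTIONS:
            # IAM action names are case-insensitive and may contain wildcards.
            if any(fnmatchcase(required.lower(), a) for a in actions):
                granted.add(required)
    return [a for a in REQUIRED_ACTIONS if a not in granted]

# Placeholder ARN and policy for illustration.
KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE"
training_policy = {"Statement": [
    {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},
    {"Effect": "Allow", "Action": ["kms:Decrypt"], "Resource": KEY_ARN},
]}

print("Missing before migration:", missing_kms_actions(training_policy, KEY_ARN))
```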

Cloud Security Posture Management — the CSPM foundation that AI posture management extends
AI Inference Cluster Attack Paths — the specific attack paths through AI inference infrastructure that posture management aims to close
Model Serving Hardening — hardening the model serving layer that posture management scans
Kubernetes AI Batch Job Isolation — isolating training workloads, one of the posture checks in the AI scan
MLOps Secrets Management — managing the secrets used throughout AI pipelines that posture management audits