AI Security Posture Management: Extending CSPM to ML Infrastructure
Problem
Cloud Security Posture Management (CSPM) tools (AWS Security Hub, GCP Security Command Center, Prisma Cloud, Wiz) have matured into comprehensive scanners for traditional cloud infrastructure. They check S3 bucket policies, IAM permission breadth, unencrypted databases, and exposed security group rules. For most cloud workloads, a CSPM tool provides reasonable baseline coverage.
AI and ML infrastructure introduces a layer of attack surface that existing CSPM tools are almost entirely blind to. The tools were not designed for it, the check libraries don’t include it, and the ML engineering teams who deploy this infrastructure are rarely thinking about it as a security surface. The result is AI infrastructure that passes a CSPM scan cleanly while hosting several critical misconfigurations.
The specific AI/ML attack surface that standard CSPM misses:
Unauthenticated model serving endpoints. Frameworks like vLLM, Ollama, Ray Serve, and NVIDIA Triton start with no authentication by default. Engineers who deploy these for internal use frequently leave them accessible without credentials. Unlike an open database, these endpoints are not flagged by CSPM as “unauthenticated” because CSPM tools don’t understand model serving APIs. Anyone who can reach an exposed endpoint can enumerate the hosted models and their capabilities, run inference at your cost, attempt model extraction through repeated queries, and probe for training data with membership-inference attacks.
Unencrypted model weight storage. Model checkpoints and fine-tuned weights kept in S3, GCS, or Azure Blob are frequently protected only by provider-default encryption rather than customer-managed keys, or by keys that overly broad IAM roles can use. Model weights represent significant IP value (potentially months of compute investment), and their exfiltration may go unnoticed for extended periods.
Over-permissioned MLflow and experiment tracking. MLflow, Weights & Biases, and similar experiment tracking tools are often deployed with admin credentials shared across the team, no audit logging, and access to production model versions. Compromise of the experiment tracking service gives an attacker access to all model artifacts and the ability to push a malicious model version.
GPU node host path mounts. AI training jobs frequently require access to GPU drivers and device files via host path mounts. When these mounts are too broad, they expose the host filesystem to the container, enabling privilege escalation from the training job.
Jupyter notebook servers with no authentication. Jupyter notebooks remain the most common entry point for ML engineers. Jupyter enables token authentication by default, but container images and deployment templates frequently disable it (an empty token or password) and serve plain HTTP without TLS. The server runs as the notebook user, often with broad IAM permissions for accessing training data and model registries.
Model serving without rate limiting. Production inference endpoints without rate limiting are vulnerable to cost-based denial of service: an attacker who discovers the endpoint can exhaust your GPU compute budget or run inference requests to extract model capabilities.
Training data buckets with overly broad access policies. Training datasets, especially those containing PII, are frequently stored in buckets that are accessible to any service account in the data science team rather than scoped to specific training jobs.
None of these misconfigurations appear in standard CSPM output. Building AI security posture management requires either extending your existing CSPM tool with custom checks, deploying AI-specific scanning tooling, or writing your own checks against your inventory.
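As an illustration of what a custom check can look like, the GPU host path mount class listed above can be evaluated from pod specs alone. This is a minimal sketch, assuming pod objects shaped like the output of `kubectl get pods -o json`; the allow-list `ALLOWED_HOSTPATH_PREFIXES` and the function name are hypothetical and need tuning to how your GPU drivers are installed.

```python
from typing import Any

# Hypothetical allow-list of host paths a GPU workload may plausibly need.
# These prefixes are assumptions, not a standard; adjust them per cluster.
ALLOWED_HOSTPATH_PREFIXES = (
    "/dev/nvidia",
    "/usr/local/nvidia",
    "/var/lib/kubelet/device-plugins",
)


def broad_hostpath_mounts(pod: dict[str, Any]) -> list[str]:
    """Return hostPath volume paths in a pod spec that fall outside the allow-list."""
    flagged = []
    for volume in pod.get("spec", {}).get("volumes") or []:
        host_path = volume.get("hostPath")
        if not host_path:
            continue  # not a hostPath volume (emptyDir, PVC, ...)
        path = host_path.get("path", "")
        # Simple prefix match: "/" or "/dev" is flagged, "/dev/nvidia0" is not.
        if not path.startswith(ALLOWED_HOSTPATH_PREFIXES):
            flagged.append(path)
    return flagged
```

A pod that mounts `/` or `/var/run` from the host would be reported, while a mount of a single NVIDIA device file would pass. Prefix matching is deliberately crude; a stricter version would compare normalised path components.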
Target systems: any cloud environment running ML training or inference workloads; Kubernetes clusters with GPU nodes and AI frameworks deployed; teams using MLflow, W&B, or similar experiment tracking; organisations where ML engineers deploy infrastructure independently of a platform team.
Threat Model
Adversary 1 — External discovery of unauthenticated model endpoint. Internet or internal scanner discovers an unauthenticated Ollama or vLLM endpoint. Attacker runs inference at no cost, extracts model capabilities, and attempts model inversion to recover training data. Cost: your GPU bill spikes without alert.
Adversary 2 — MLflow model substitution. Attacker with access to an over-permissioned MLflow server (shared admin credential, no MFA) promotes a poisoned model version to production. The production model serving pipeline pulls the new version on next restart. Inference outputs are now attacker-influenced.
Adversary 3 — Training data exfiltration via over-permissioned service account. A compromised CI/CD service account that has read access to the training data S3 bucket (over-provisioned for convenience) is used to exfiltrate the training dataset — potentially containing PII or proprietary business data.
Adversary 4 — Jupyter notebook as pivot. An internet-accessible Jupyter notebook (no auth, no TLS) is the entry point for an attacker who uses the notebook’s IAM role (which has access to the model registry and training data) to exfiltrate model weights and training data.
Without AI-specific posture management: these findings are invisible to existing tooling. With AI-specific posture management: automated checks surface each misconfiguration class; remediation is tracked against defined SLAs.
Configuration / Implementation
Step 1 — Inventory your AI infrastructure surface
The first step is knowing what exists:
#!/bin/bash
# ai-inventory.sh: discover AI/ML infrastructure

echo "=== Model Serving Endpoints ==="
# Find vLLM, Ollama, Triton, Ray Serve, and LiteLLM services by "app" label,
# plus anything carrying an "ai.component" annotation.
# Each side of "or" is parenthesised: jq's pipe binds loosest, so without the
# parentheses the annotation test would run against the label string and fail.
kubectl get services --all-namespaces -o json | jq -r '
  .items[] |
  select(
    ((.metadata.labels["app"] // "") | test("vllm|ollama|triton|ray-serve|litellm")) or
    (.metadata.annotations["ai.component"] != null)
  ) |
  "\(.metadata.namespace)/\(.metadata.name): port \(.spec.ports[0].port // "unknown")"
'

echo ""
echo "=== GPU Nodes ==="
# The accelerator=nvidia label is a site convention; change it to match your cluster
kubectl get nodes -l accelerator=nvidia -o json | jq -r \
  '.items[] | "\(.metadata.name): \(.status.allocatable["nvidia.com/gpu"] // "0") GPUs"'

echo ""
echo "=== AI-Related Secrets (names only) ==="
kubectl get secrets --all-namespaces -o json | jq -r '
  .items[] |
  select(.metadata.name | test("(?i)huggingface|openai|anthropic|replicate|mlflow|wandb|comet")) |
  "\(.metadata.namespace)/\(.metadata.name)"
'

echo ""
echo "=== S3 Buckets with 'model' or 'training' in name ==="
aws s3api list-buckets --query \
  "Buckets[?contains(Name, 'model') || contains(Name, 'training')].Name" \
  --output text

echo ""
echo "=== EC2/EKS nodes with GPU instance types ==="
# Family wildcards omit the dot so that p4d, p3dn, g4dn, and similar variants match
aws ec2 describe-instances \
  --filters "Name=instance-type,Values=p3*,p4*,g4*,g5*,inf1*,trn1*" \
  --query "Reservations[*].Instances[*].[InstanceId,InstanceType,PublicIpAddress,Tags[?Key=='Name'].Value|[0]]" \
  --output table
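The service filter in the inventory script combines two conditions: the `app` label matches a known framework name, or an `ai.component` annotation is present. The same logic can be expressed over the JSON that `kubectl get services --all-namespaces -o json` prints, which makes it easier to unit-test. The sketch below assumes that document shape; the function names are illustrative and not part of any tool.

```python
import re
from typing import Any

# Same framework name patterns as the jq filter in ai-inventory.sh.
AI_APP_PATTERN = re.compile(r"vllm|ollama|triton|ray-serve|litellm")


def is_ai_service(service: dict[str, Any]) -> bool:
    """True if the app label names a known framework or an ai.component annotation is set."""
    meta = service.get("metadata", {})
    app_label = (meta.get("labels") or {}).get("app", "")
    annotations = meta.get("annotations") or {}
    return bool(AI_APP_PATTERN.search(app_label)) or "ai.component" in annotations


def ai_service_names(service_list: dict[str, Any]) -> list[str]:
    """Return namespace/name for each AI-related service in a kubectl service list."""
    return [
        f'{s["metadata"]["namespace"]}/{s["metadata"]["name"]}'
        for s in service_list.get("items", [])
        if is_ai_service(s)
    ]
```

Feeding it `json.load(sys.stdin)` from a piped `kubectl` call would give the same list the shell script prints, minus the port.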
Step 2 — Check model serving authentication
#!/bin/bash
# check-model-endpoints.sh: verify authentication on model serving endpoints

# Services reachable from outside the cluster are the highest priority
echo "=== Externally exposed services (LoadBalancer/NodePort) ==="
kubectl get services --all-namespaces -o json | jq -r '
  .items[] |
  select(.spec.type == "LoadBalancer" or .spec.type == "NodePort") |
  "\(.metadata.namespace)/\(.metadata.name):\(.spec.ports[0].port)"
'

check_endpoint_auth() {
  local namespace=$1 service=$2 port=$3
  local ip code path

  # Get the service's ClusterIP (headless services report "None")
  ip=$(kubectl get service "$service" -n "$namespace" -o jsonpath='{.spec.clusterIP}')
  [[ -z "$ip" || "$ip" == "None" ]] && return

  # /v1/models and /api/tags list models, so a 200 there is a strong signal.
  # /health and / often return 200 on authenticated servers too; triage those by hand.
  for path in "/v1/models" "/api/tags" "/health" "/"; do
    # --attach replaces -it because scripts have no TTY; </dev/null keeps kubectl
    # from swallowing the caller's "while read" input; grep keeps only the status
    # code in case kubectl appends its "pod deleted" message.
    code=$(kubectl run "auth-check-$RANDOM" \
      --image=curlimages/curl:latest \
      --restart=Never --rm --attach --quiet \
      -- curl -s -o /dev/null -m 5 -w '%{http_code}' \
      "http://${ip}:${port}${path}" 2>/dev/null </dev/null | grep -oE '^[0-9]{3}')
    if [[ "$code" == "200" ]]; then
      echo "FINDING: Unauthenticated access to ${namespace}/${service}${path} returns HTTP 200"
    fi
  done
}

# Run checks against every service labelled as an AI serving component
kubectl get services --all-namespaces -l "ai.component=serving" \
  -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{" "}{.spec.ports[0].port}{"\n"}{end}' |
  while read -r ns svc port; do
    check_endpoint_auth "$ns" "$svc" "$port"
  done
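Not every HTTP 200 from the probe means the same thing. A 200 on a model-listing path (`/v1/models` on OpenAI-compatible servers such as vLLM, `/api/tags` on Ollama) shows the API answered without credentials, whereas `/health` and `/` are often left open on purpose. A possible triage rule is sketched below; the verdict labels and the function name are assumptions made for illustration.

```python
# Paths that enumerate models; a 200 here means the API served data without credentials.
MODEL_LISTING_PATHS = {"/v1/models", "/api/tags"}


def classify_probe(path: str, status: int) -> str:
    """Map one (path, HTTP status) probe result to a triage verdict."""
    if status in (401, 403):
        return "authenticated"  # the server demanded credentials
    if status == 200 and path in MODEL_LISTING_PATHS:
        return "finding"  # model API reachable without credentials
    if status == 200:
        return "review"  # open health or index page; not conclusive on its own
    return "inconclusive"  # timeouts, 404s, redirects, and so on
```

Under this rule only the model-listing paths raise a finding automatically, which keeps open health checks from flooding the report.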
Step 3 — Check model weight storage security
# check-model-storage.py: scan model weight buckets for security issues
import json
from dataclasses import dataclass

import boto3
from botocore.exceptions import ClientError

SEVERITY_ORDER = ['CRITICAL', 'HIGH', 'MEDIUM', 'LOW']
KMS_ALGORITHMS = {'aws:kms', 'aws:kms:dsse'}


@dataclass
class StorageFinding:
    severity: str
    bucket: str
    issue: str
    recommendation: str


def error_code(err: ClientError) -> str:
    """Return the AWS error code carried by a botocore ClientError."""
    return err.response.get('Error', {}).get('Code', '')


def audit_model_buckets(bucket_name_patterns: list[str]) -> list[StorageFinding]:
    """Audit S3 buckets containing model weights for security issues."""
    s3 = boto3.client('s3')
    findings = []

    # List buckets matching AI/ML patterns
    buckets = s3.list_buckets()['Buckets']
    model_buckets = [
        b['Name'] for b in buckets
        if any(p.lower() in b['Name'].lower() for p in bucket_name_patterns)
    ]

    for bucket in model_buckets:
        # Check 1: Server-side encryption
        try:
            enc = s3.get_bucket_encryption(Bucket=bucket)
            rules = enc['ServerSideEncryptionConfiguration']['Rules']
            if not any(
                r.get('ApplyServerSideEncryptionByDefault', {}).get('SSEAlgorithm') in KMS_ALGORITHMS
                for r in rules
            ):
                findings.append(StorageFinding(
                    severity="HIGH",
                    bucket=bucket,
                    issue="Model weights encrypted with SSE-S3 (not KMS): no customer-controlled key policy or key-usage audit trail",
                    recommendation="Migrate to SSE-KMS with CMK; enable automatic key rotation"
                ))
        except ClientError as err:
            # boto3 has no typed exception for this error, so match on the error code.
            # S3 has applied SSE-S3 by default since January 2023, so this branch is rare.
            if error_code(err) != 'ServerSideEncryptionConfigurationNotFoundError':
                raise
            findings.append(StorageFinding(
                severity="CRITICAL",
                bucket=bucket,
                issue="Model weights stored WITHOUT server-side encryption",
                recommendation="Enable SSE-KMS immediately"
            ))

        # Check 2: Bucket policy, looking for Allow statements with a wildcard principal
        try:
            policy = json.loads(s3.get_bucket_policy(Bucket=bucket)['Policy'])
        except ClientError as err:
            if error_code(err) != 'NoSuchBucketPolicy':
                raise  # surface permission errors instead of hiding them
            policy = {'Statement': []}  # no bucket policy set
        for stmt in policy.get('Statement', []):
            principal = stmt.get('Principal')
            aws = principal.get('AWS') if isinstance(principal, dict) else None
            wildcard = principal == '*' or aws == '*' or (isinstance(aws, list) and '*' in aws)
            if stmt.get('Effect') == 'Allow' and wildcard:
                findings.append(StorageFinding(
                    severity="CRITICAL",
                    bucket=bucket,
                    issue="Model bucket policy allows a wildcard principal (Principal: *)",
                    recommendation="Remove public access; restrict to specific IAM roles"
                ))

        # Check 3: Versioning (for model checkpoint integrity)
        versioning = s3.get_bucket_versioning(Bucket=bucket)
        if versioning.get('Status') != 'Enabled':
            findings.append(StorageFinding(
                severity="MEDIUM",
                bucket=bucket,
                issue="Versioning disabled: cannot detect unauthorised model weight modification",
                recommendation="Enable versioning with MFA delete for model weight buckets"
            ))

        # Check 4: Logging
        logging_config = s3.get_bucket_logging(Bucket=bucket)
        if 'LoggingEnabled' not in logging_config:
            findings.append(StorageFinding(
                severity="MEDIUM",
                bucket=bucket,
                issue="S3 access logging disabled: model weight access is unauditable",
                recommendation="Enable S3 access logging to dedicated audit log bucket"
            ))

    return findings


if __name__ == '__main__':
    # Run audit
    findings = audit_model_buckets(['model', 'checkpoint', 'weights', 'training', 'mlflow'])
    for f in sorted(findings, key=lambda x: SEVERITY_ORDER.index(x.severity)):
        print(f"[{f.severity}] {f.bucket}: {f.issue}")
        print(f" → {f.recommendation}\n")
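The bucket policy check is the part of this audit most sensitive to policy shape: `Principal` may be the string `"*"`, a mapping with an `AWS` key, or a mapping whose `AWS` value is a list, and a `Condition` block can narrow an otherwise public grant. The helper below is a sketch of how those cases could be separated without any AWS calls; the function name and verdict strings are hypothetical.

```python
from typing import Any


def _as_list(value: Any) -> list:
    """IAM policy fields may hold a single item or a list; normalise to a list."""
    return value if isinstance(value, list) else [value]


def wildcard_statements(policy: dict[str, Any]) -> list[tuple[int, str]]:
    """Return (statement index, verdict) for Allow statements with a wildcard principal.

    Verdict is "public" when no Condition narrows the grant and "conditional"
    when one does; a conditional grant still deserves a manual look.
    """
    results = []
    for index, stmt in enumerate(_as_list(policy.get("Statement", []))):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        if principal == "*":
            wildcard = True
        elif isinstance(principal, dict):
            wildcard = "*" in _as_list(principal.get("AWS", []))
        else:
            wildcard = False
        if wildcard:
            results.append((index, "conditional" if "Condition" in stmt else "public"))
    return results
```

Splitting "public" from "conditional" avoids raising a CRITICAL on, for example, a wildcard grant restricted to a VPC endpoint, while still putting it in front of a reviewer.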
Step 4 — Jupyter notebook security scan
#!/bin/bash
# check-jupyter.sh: find exposed Jupyter instances

echo "=== Kubernetes Jupyter Services ==="
kubectl get services --all-namespaces -o json | jq -r '
  .items[] |
  select(
    ((.metadata.labels["app"] // "") | test("jupyter")) or
    (.metadata.name | test("jupyter"))
  ) |
  "\(.metadata.namespace)/\(.metadata.name) type:\(.spec.type)"
'

echo ""
echo "=== Checking for Jupyter without authentication ==="
# Find Jupyter containers and inspect their startup arguments (command and args).
# An empty --NotebookApp.token= or --ServerApp.token= disables token authentication.
# \u0027 is a single quote, written as an escape because the jq program is single-quoted.
kubectl get pods --all-namespaces -o json | jq -r '
  .items[] |
  select(.metadata.name | test("jupyter")) |
  . as $pod |
  .spec.containers[] |
  (((.command // []) + (.args // [])) | join(" ")) as $cmd |
  select($cmd | test("jupyter")) |
  if ($cmd | test("--(NotebookApp|ServerApp)\\.token=(\u0027\u0027|\"\"|\\s|$)")) then
    "WARNING: \($pod.metadata.namespace)/\($pod.metadata.name): Jupyter started with an empty token"
  else
    "CHECK: Review Jupyter config for \($pod.metadata.namespace)/\($pod.metadata.name)"
  end
'

# Check for Jupyter env vars that disable auth.
# A token supplied through valueFrom (for example a Secret) is not treated as empty.
kubectl get pods --all-namespaces -o json | jq -r '
  .items[] |
  select(.metadata.name | test("jupyter")) |
  . as $pod |
  .spec.containers[].env[]? |
  select(.name == "JUPYTER_TOKEN" and .valueFrom == null and ((.value // "") == "")) |
  "FINDING: \($pod.metadata.namespace)/\($pod.metadata.name) has empty JUPYTER_TOKEN"
'
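The startup-argument check hinges on telling an empty token apart from a real one. A small sketch of that distinction follows, assuming the container's `command` and `args` have been concatenated into a single argument list; the function name is illustrative, and other configuration spellings (config files, newer option names) are not covered.

```python
import re

# Matches --NotebookApp.token= or --ServerApp.token= with an empty value:
# nothing after "=", or a pair of empty quotes passed through literally.
EMPTY_TOKEN_FLAG = re.compile(r"--(?:NotebookApp|ServerApp)\.token=(?:''|\"\")?$")


def disables_token_auth(argv: list[str]) -> bool:
    """True if any startup argument sets the Jupyter token to an empty value."""
    return any(EMPTY_TOKEN_FLAG.match(arg) for arg in argv)
```

An argument such as `--NotebookApp.token=abc123` sets a token and is not flagged, and neither is `--no-browser`, which only suppresses opening a local browser window.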
Step 5 — Integrate into a posture scoring dashboard
# ai-posture-score.py: aggregate AI security posture findings into a score
from dataclasses import dataclass, field

SEVERITY_ORDER = ["CRITICAL", "HIGH", "MEDIUM", "LOW"]


@dataclass
class AIPostureScore:
    total_checks: int = 0
    passed: int = 0
    failed_critical: int = 0
    failed_high: int = 0
    failed_medium: int = 0
    findings: list = field(default_factory=list)

    @property
    def score(self) -> float:
        """0-100 posture score.

        Each failure costs weighted penalty points (CRITICAL 20, HIGH 10,
        MEDIUM 3), averaged over all checks and scaled to a percentage.
        The weighting is steep: one CRITICAL among 20 or fewer checks
        already drives the score to 0.
        """
        if self.total_checks == 0:
            return 0.0
        penalty = (self.failed_critical * 20 +
                   self.failed_high * 10 +
                   self.failed_medium * 3)
        return max(0.0, min(100.0, 100.0 - (penalty / self.total_checks * 100)))

    def report(self) -> str:
        score = self.score
        grade = ("A" if score >= 90 else
                 "B" if score >= 75 else
                 "C" if score >= 60 else
                 "D" if score >= 40 else "F")
        # Show the five most severe findings first
        top = sorted(self.findings, key=lambda f: SEVERITY_ORDER.index(f.severity))[:5]
        top_lines = "\n".join(f"  [{f.severity}] {f.issue}" for f in top)
        return (
            "\n"
            "AI Security Posture Report\n"
            "==========================\n"
            f"Score: {score:.0f}/100 (Grade: {grade})\n"
            f"Checks: {self.total_checks} total | {self.passed} passed\n"
            f"Findings: {self.failed_critical} CRITICAL, {self.failed_high} HIGH, {self.failed_medium} MEDIUM\n"
            "Top Issues:\n"
            f"{top_lines}\n"
        )
Step 6 — Remediation: secure MLflow deployment
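# mlflow-secure-deployment.yaml
# Secret names below (mlflow-db, mlflow-oauth2) are placeholders; create them separately.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mlflow-server
  namespace: ml-platform
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mlflow-server
  template:
    metadata:
      labels:
        app: mlflow-server
    spec:
      serviceAccountName: mlflow-minimal  # Minimal RBAC, not cluster-admin
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: mlflow
          image: ghcr.io/mlflow/mlflow:2.13.0
          args:
            - server
            # Kubernetes expands $(VAR) in args; ${VAR} would be passed through literally
            - --backend-store-uri=postgresql://mlflow:$(DB_PASSWORD)@postgres:5432/mlflow
            - --default-artifact-root=s3://mlflow-artifacts-encrypted/
            # Bind to loopback so the only route in is through the OAuth2 proxy sidecar
            - --host=127.0.0.1
            - --port=5000
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
          env:
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mlflow-db
                  key: password
            - name: MLFLOW_TRACKING_INSECURE_TLS
              value: "false"
        # OAuth2 proxy for authentication
        - name: oauth2-proxy
          image: quay.io/oauth2-proxy/oauth2-proxy:v7.6.0
          args:
            - --provider=oidc
            # The proxy listens on 127.0.0.1:4180 by default; expose it to the pod network
            - --http-address=0.0.0.0:4180
            - --upstream=http://localhost:5000
            - --oidc-issuer-url=https://accounts.google.com
            - --client-id=$(OAUTH2_CLIENT_ID)
            - --client-secret=$(OAUTH2_CLIENT_SECRET)
            - --cookie-secret=$(OAUTH2_COOKIE_SECRET)  # required by oauth2-proxy
            - --cookie-secure=true
            - --email-domain=example.com  # Restrict to company email domain
          envFrom:
            - secretRef:
                name: mlflow-oauth2  # provides the OAUTH2_* variables used above
          ports:
            - containerPort: 4180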
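A sidecar proxy only protects the backend if the backend cannot be reached directly. Because containers in a pod share a network namespace, a server started with `--host=0.0.0.0` remains reachable on the pod IP at its own port, bypassing the proxy, unless a NetworkPolicy blocks it. One way a posture check might flag this is sketched below; it assumes container objects shaped like a Kubernetes pod spec, and the function name is hypothetical.

```python
from typing import Any


def binds_all_interfaces(container: dict[str, Any]) -> bool:
    """True if the container's command or args bind the server to 0.0.0.0."""
    argv = (container.get("command") or []) + (container.get("args") or [])
    for position, arg in enumerate(argv):
        if arg == "--host=0.0.0.0":
            return True
        # Also handle the two-token form: --host 0.0.0.0
        if arg == "--host" and position + 1 < len(argv) and argv[position + 1] == "0.0.0.0":
            return True
    return False
```

Running this over each non-proxy container in a pod that also carries an `oauth2-proxy` sidecar gives a list of backends to either rebind to `127.0.0.1` or fence off with a NetworkPolicy.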
Expected Behaviour
| Check | Without AI posture management | With AI posture management |
|---|---|---|
| Unauthenticated vLLM endpoint | Not detected by CSPM | Flagged as CRITICAL finding |
| Model weights without encryption | May be detected by CSPM if bucket-level check runs | Specifically checked for AI buckets + SSE-KMS enforcement |
| MLflow with shared admin credential | Not detected | MEDIUM finding: shared credentials + no MFA |
| Jupyter with no authentication | Not detected | CRITICAL finding; specific remediation provided |
| GPU node with broad host path mount | Not detected | HIGH finding in AI posture scan |
| Posture score visible in dashboard | Not available | Score 0–100 with trend; integrated into security metrics |
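The 0-100 score in the last row follows the Step 5 formula: penalty points (20 per CRITICAL, 10 per HIGH, 3 per MEDIUM) are divided by the number of checks and scaled to a percentage. A standalone restatement of that arithmetic makes the steepness of the weighting easier to see; the function name here is illustrative.

```python
def posture_score(total_checks: int, critical: int = 0, high: int = 0, medium: int = 0) -> float:
    """Restate the Step 5 scoring formula as a plain function."""
    if total_checks == 0:
        return 0.0
    penalty = critical * 20 + high * 10 + medium * 3
    return max(0.0, min(100.0, 100.0 - (penalty / total_checks * 100)))


# One MEDIUM failure among 60 checks: penalty 3/60 = 5%, so the score is about 95.
one_medium = posture_score(60, medium=1)

# One CRITICAL among 20 checks: penalty 20/20 = 100%, so the score bottoms out at 0.
one_critical = posture_score(20, critical=1)

# One CRITICAL and two HIGH among 100 checks: penalty 40/100 = 40%, score about 60.
mixed = posture_score(100, critical=1, high=2)
```

With small check counts almost any failure pushes the score to zero, so the score becomes more informative as the number of checks grows.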
Verification:
# Run full AI posture scan
python3 check-model-storage.py
bash check-model-endpoints.sh
bash check-jupyter.sh
# Expected: findings reported for each category
# Ideally: 0 CRITICAL findings; HIGH findings tracked with remediation tickets
Trade-offs
| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| Per-endpoint auth checks | Finds unauthenticated model servers that CSPM misses | Requires cluster access to run checks; not passive | Run scans from a dedicated security scanner service account; schedule hourly |
| MLflow OAuth2 proxy | Adds authentication without modifying MLflow code | Adds a dependency; proxy must be kept updated | Use a managed OIDC provider; Renovate to keep proxy image current |
| Model bucket SSE-KMS enforcement | Protects model weights at rest | KMS key management overhead; cost of KMS API calls | Use customer-managed KMS keys (about $1/key/month plus per-request charges); enable S3 Bucket Keys to reduce KMS request volume |
| Jupyter network restriction | Limits Jupyter exposure to internal network only | Engineers working remotely need VPN or port-forward | Deploy Jupyter behind VPN; use kubectl port-forward for direct access |
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Posture scan misses a newly deployed AI framework | Scanner does not recognise the new endpoint pattern and reports a false clean | Inventory shows a deployed framework with no corresponding scan results | Maintain an “AI component” label convention; require the label on all AI service deployments |
| MLflow OAuth2 proxy upgrade breaks authentication | Researchers cannot access MLflow; experiment logging fails | Application error logs; researcher reports | Pin proxy version; test upgrade in staging before production; rollback procedure in runbook |
| Model bucket encryption migration breaks training pipeline | Training job fails to write checkpoints after SSE-KMS migration | Job logs show S3 access denied; IAM missing kms:GenerateDataKey permission | Grant kms:GenerateDataKey and kms:Decrypt on the CMK to the training service account before migration |
| Posture score misleads — all checks pass but new attack surface added | Score shows 95/100; new unauthenticated service deployed and not yet in scan scope | Quarterly manual review finds gap | Run inventory check weekly; alert on new AI services without posture scan label |
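The recovery step in the last row, alerting on AI services that lack the posture scan label, can be sketched as a comparison between name-based detection and label-based scope. The example assumes the `ai.component` label that the endpoint check selects on and a service list shaped like `kubectl get services --all-namespaces -o json`; the name hints and the function name are illustrative.

```python
import re
from typing import Any

# Hypothetical name hints for AI-related services; extend as new frameworks appear.
AI_NAME_HINT = re.compile(r"vllm|ollama|triton|ray-serve|litellm|jupyter|mlflow", re.IGNORECASE)
REQUIRED_LABEL = "ai.component"


def out_of_scope_services(service_list: dict[str, Any]) -> list[str]:
    """Return namespace/name of services that look AI-related but lack the scan label."""
    missing = []
    for svc in service_list.get("items", []):
        meta = svc.get("metadata", {})
        labels = meta.get("labels") or {}
        looks_ai = (AI_NAME_HINT.search(meta.get("name", ""))
                    or AI_NAME_HINT.search(labels.get("app", "")))
        if looks_ai and REQUIRED_LABEL not in labels:
            missing.append(f'{meta.get("namespace")}/{meta.get("name")}')
    return missing
```

A non-empty result from a weekly run would indicate services the endpoint scan never visits, which is the gap the failure mode describes. Name hints only catch frameworks already on the list, so an admission policy that requires the label is the stronger control.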
Related Articles
- Cloud Security Posture Management — the CSPM foundation that AI posture management extends
- AI Inference Cluster Attack Paths — the specific attack paths through AI inference infrastructure that posture management aims to close
- Model Serving Hardening — hardening the model serving layer that posture management scans
- Kubernetes AI Batch Job Isolation — isolating training workloads, one of the posture checks in the AI scan
- MLOps Secrets Management — managing the secrets used throughout AI pipelines that posture management audits