AI Model Weight Security: Protecting Proprietary Parameters from Theft and Exfiltration

AI Model Weight Security: Protecting Proprietary Parameters from Theft and Exfiltration

Problem

A large language model trained on proprietary data represents a significant investment: months of GPU compute, curated datasets, and expert fine-tuning. The resulting model weights — the billions of floating-point parameters that encode the model’s capabilities — are the intellectual property that competitors cannot easily replicate. Stealing those weights is cheaper than training from scratch: copying 70 billion parameters from an S3 bucket takes minutes.

The attack surface for model weight theft has expanded significantly as inference infrastructure has scaled:

  • Storage exfiltration: Weights stored in S3/GCS/Azure Blob with overly permissive IAM policies. Any employee or system with S3 access can download multi-hundred-gigabyte model checkpoints.
  • Serving infrastructure access: Inference endpoints load weights into GPU memory. A compromised serving container can write the weights to a network socket.
  • Fine-tuning pipeline leakage: Fine-tuning jobs read base model weights. A malicious fine-tuning job can exfiltrate them to an external storage location.
  • Hugging Face Hub exposure: Weights accidentally pushed to a public Hugging Face repository or shared via a too-broad access token.
  • Insider exfiltration: An ML engineer with legitimate access copies weights to a personal storage account before leaving.

Unlike database records, model weights aren’t rows with PII fields — their theft is rarely immediately detectable. There is no “alert on model weight access” built into most ML platforms.

Target systems: AWS S3, GCS, Azure Blob for weight storage; Kubernetes for inference serving; Hugging Face Hub; SageMaker, Vertex AI, Azure ML for training/serving platforms; PyTorch 2.3+ for watermarking integration.

Threat Model

  • Adversary 1 — Overprivileged IAM exfiltration: An employee, CI pipeline, or compromised cloud credentials with S3 GetObject access to the model bucket downloads the full checkpoint and uploads it to an external bucket.
  • Adversary 2 — Serving container compromise: An attacker achieves code execution inside a model serving container. They read the weight tensors from GPU memory (via torch.save or direct GPU memory access) and exfiltrate via a network socket.
  • Adversary 3 — Fine-tuning pipeline abuse: A malicious fine-tuning job is submitted to the platform. The job reads the base model weights (to fine-tune) and also writes them to an external destination the job author controls.
  • Adversary 4 — Hugging Face token leak: An HF token with write access to a private repository is leaked. The attacker pushes the model to a public repository or a repository they control.
  • Adversary 5 — Insider theft before departure: An ML engineer with legitimate weight access copies the weights to personal cloud storage before their account is offboarded.
  • Access level: Adversary 1 has cloud IAM credentials. Adversary 2 has container-level execution. Adversary 3 has job submission access to the ML platform. Adversaries 4 and 5 have legitimate access that is being misused.
  • Objective: Obtain a copy of proprietary model weights for deployment, sale, or further study.
  • Blast radius: A stolen model can be deployed at near-zero marginal cost. Competitors gain months of training compute without the investment. For models trained on proprietary data, the stolen weights may also contain memorised sensitive training data.

Configuration

Step 1: S3 IAM Scoping for Model Weight Buckets

Create a dedicated bucket for model weights with strict IAM:

# Create dedicated model weights bucket (no existing cross-use).
aws s3api create-bucket \
  --bucket company-model-weights-prod \
  --region us-east-1

# Block all public access.
aws s3api put-public-access-block \
  --bucket company-model-weights-prod \
  --public-access-block-configuration \
    BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true

# Enable SSE-KMS with a model-specific KMS key (not the default SSE-S3 key).
aws s3api put-bucket-encryption \
  --bucket company-model-weights-prod \
  --server-side-encryption-configuration '{
    "Rules": [{
      "ApplyServerSideEncryptionByDefault": {
        "SSEAlgorithm": "aws:kms",
        "KMSMasterKeyID": "arn:aws:kms:us-east-1:ACCOUNT:key/MODEL-KMS-KEY-ID"
      },
      "BucketKeyEnabled": true
    }]
  }'

Minimal IAM policy for inference serving:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "InferenceReadOnly",
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::company-model-weights-prod/models/prod/*",
      "Condition": {
        "StringEquals": {
          "aws:PrincipalTag/role": "inference-serving"
        },
        "IpAddress": {
          "aws:SourceIp": ["10.0.0.0/8"]   // Internal IPs only.
        }
      }
    }
  ]
}

No s3:GetObject on the entire bucket — scope to specific prefixes. No s3:ListBucket for inference roles — they should know the exact key. No s3:PutObject for serving roles — inference never writes weights back.

Training roles get write access scoped to specific checkpoint prefixes:

{
  "Action": ["s3:PutObject"],
  "Resource": [
    "arn:aws:s3:::company-model-weights-prod/checkpoints/job-${aws:PrincipalTag/job-id}/*"
  ]
}

Step 2: CloudTrail Alerts on Anomalous Weight Access

# CloudWatch metric filter: alert on bulk GetObject from model bucket.
aws cloudwatch put-metric-filter \
  --log-group-name CloudTrail/DefaultLogGroup \
  --filter-name ModelWeightBulkAccess \
  --filter-pattern '{ ($.eventName = "GetObject") && ($.requestParameters.bucketName = "company-model-weights-prod") }' \
  --metric-transformations \
    metricName=ModelWeightGetObject,metricNamespace=Security/MLOps,metricValue=1

# Alert if more than 10 GetObject calls from the model bucket in 5 minutes.
# 10 calls × average model shard size = multi-GB download threshold.
aws cloudwatch put-metric-alarm \
  --alarm-name ModelWeightBulkDownload \
  --metric-name ModelWeightGetObject \
  --namespace Security/MLOps \
  --statistic Sum \
  --period 300 \
  --threshold 10 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:ACCOUNT:security-alerts

Alert on access from unexpected IAM principals:

# Lambda function to alert on unexpected model weight access.
import boto3, json

ALLOWED_ROLES = {
    "arn:aws:iam::ACCOUNT:role/inference-serving",
    "arn:aws:iam::ACCOUNT:role/training-job",
    "arn:aws:iam::ACCOUNT:role/mlops-admin",
}

def handler(event, context):
    for record in event["Records"]:
        log_data = json.loads(record["Sns"]["Message"])
        for trail_event in log_data.get("Records", []):
            if (trail_event.get("eventName") == "GetObject" and
                "model-weights" in trail_event.get("requestParameters", {}).get("bucketName", "")):

                principal = trail_event.get("userIdentity", {}).get("arn", "")
                if not any(allowed in principal for allowed in ALLOWED_ROLES):
                    send_security_alert({
                        "event": "model_weight_unexpected_access",
                        "principal": principal,
                        "bucket": trail_event["requestParameters"]["bucketName"],
                        "key": trail_event["requestParameters"].get("key"),
                        "source_ip": trail_event.get("sourceIPAddress"),
                        "time": trail_event["eventTime"],
                    })

Step 3: Serving Container Network Egress Restriction

Inference serving containers should not be able to exfiltrate weights via the network:

# NetworkPolicy: inference pods can only reach internal APIs, not external IPs.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: inference-egress-restriction
  namespace: ml-serving
spec:
  podSelector:
    matchLabels:
      role: inference-serving
  policyTypes: [Egress]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: internal-api
      ports:
        - port: 443
    - to:
        - ipBlock:
            cidr: 10.0.0.0/8   # Internal only.
    # Explicitly no egress to 0.0.0.0/0.

Apply seccomp to inference pods to restrict syscalls that could exfiltrate data:

securityContext:
  seccompProfile:
    type: RuntimeDefault
  allowPrivilegeEscalation: false
  capabilities:
    drop: [ALL]

Step 4: Model Weight Watermarking

Watermarking embeds a hidden signal in model weights. If stolen weights are deployed by a competitor, the watermark allows you to verify the model is derived from your weights.

Parameter-based watermarking (simple):

import torch
import hashlib

def embed_watermark(model: torch.nn.Module, watermark_key: str, layer_name: str = "lm_head.weight") -> torch.nn.Module:
    """
    Embed a watermark by slightly perturbing specific weight positions.
    The perturbation is determined by the watermark key — the pattern is
    reproducible but imperceptibly small.
    """
    layer = dict(model.named_parameters())[layer_name]

    # Derive perturbation positions from the watermark key.
    key_hash = hashlib.sha256(watermark_key.encode()).digest()
    indices = [key_hash[i] % layer.shape[0] for i in range(32)]

    # Apply a tiny perturbation at the selected positions.
    with torch.no_grad():
        perturbation = torch.zeros_like(layer)
        for idx in indices:
            perturbation[idx, 0] = 1e-6  # Imperceptibly small; doesn't affect model quality.
        layer.add_(perturbation)

    return model

def verify_watermark(model: torch.nn.Module, watermark_key: str, layer_name: str = "lm_head.weight") -> bool:
    """Check if the expected watermark is present in the model weights."""
    layer = dict(model.named_parameters())[layer_name]

    key_hash = hashlib.sha256(watermark_key.encode()).digest()
    indices = [key_hash[i] % layer.shape[0] for i in range(32)]

    # Check if the perturbation is present (within floating-point tolerance).
    for idx in indices:
        if abs(layer[idx, 0].item()) < 5e-7:
            return False

    return True

# Usage:
model = load_base_model()
watermarked_model = embed_watermark(model, watermark_key="company-model-v2-2026Q1")
torch.save(watermarked_model.state_dict(), "model-weights-watermarked.pt")

# Verification (on a suspected stolen model).
suspected_model = load_model("suspected-model.pt")
if verify_watermark(suspected_model, watermark_key="company-model-v2-2026Q1"):
    print("Watermark confirmed: this model derives from our weights.")

More robust approaches use radiometric watermarking (training-time embedding) or steganographic watermarks in the weight distribution. Research-grade systems like radioactive-data and REEF provide stronger statistical guarantees.

Step 5: Hugging Face Hub Access Control

# Verify all private model repositories have access restrictions.
huggingface-cli whoami
pip install huggingface_hub

python3 << 'EOF'
from huggingface_hub import HfApi

api = HfApi()
models = api.list_models(author="company-org")
for model in models:
    info = api.model_info(model.id)
    if info.private == False:
        print(f"WARNING: {model.id} is PUBLIC")
    else:
        # Check collaborators.
        collaborators = api.list_repo_collaborators(model.id)
        for collab in collaborators:
            print(f"  {model.id}: collaborator {collab.username} ({collab.role})")
EOF

# Rotate HF tokens quarterly and on employee departure.
# List all tokens.
# (No CLI for token listing; use the HF web UI or API with user admin token.)

# Create a scoped token with minimal permissions for inference (read-only).
# In HF UI: Settings → Access Tokens → New Token
# Type: Fine-grained → Repository: read (specific repos only)

Step 6: Offboarding Controls for ML Engineers

# On ML engineer departure:

# 1. Revoke AWS IAM access immediately.
aws iam list-user-policies --user-name departing-engineer
aws iam delete-user --user-name departing-engineer

# 2. Rotate any Hugging Face tokens the engineer may have created.
# (Requires HF org admin access; revoke all tokens associated with the user.)

# 3. Audit recent S3 access from the engineer's IAM user.
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=Username,AttributeValue=departing-engineer \
  --start-time 2026-04-01T00:00:00Z \
  | jq '.Events[] | select(.Resources[].ResourceType == "AWS::S3::Object") |
    {time: .EventTime, bucket: .Resources[0].ResourceName}'

# 4. Check for any unusual data transfer from the engineer's recent sessions.
# Look for large S3 downloads in CloudTrail: GetObject calls on model bucket in last 30 days.

Step 7: Telemetry

model_weight_access_total{principal, model, operation}       counter
model_weight_bulk_download_total{principal, byte_count}      counter
model_weight_unexpected_principal_total{principal}           counter
model_weight_external_network_egress_blocked{pod}            counter
hf_token_usage_total{token_scope, operation}                 counter
model_watermark_verification_result{model, result}           gauge

Alert on:

  • model_weight_bulk_download_total — any download of > 1GB of model weights is worth auditing.
  • model_weight_unexpected_principal_total — a non-allowlisted IAM role accessed model weights.
  • model_weight_external_network_egress_blocked — serving container attempted external connection.
  • hf_token_usage_total with an unexpected token_scope — a broad token used where a narrow one should be.

Expected Behaviour

Signal Unprotected model weights Hardened model weights
Overprivileged IAM access Any S3 user can download Scoped to specific prefixes and roles; IP-restricted
Bulk download anomaly Silent CloudTrail metric alert after N GetObject calls
Serving container exfiltration Container can connect to any external IP NetworkPolicy blocks all external egress
Stolen weights deployed by competitor No evidence of derivation Watermark verification proves derivation
Hugging Face public exposure Models discoverable Org-wide private; no public repositories

Trade-offs

Aspect Benefit Cost Mitigation
KMS-encrypted weights Second encryption layer beyond SSE-S3 KMS call latency on every read Use bucket keys (one KMS call per prefix per hour); acceptable for large weights loaded once.
Scoped IAM per job Precise access control More IAM roles to manage Template roles; job-id tagging provides dynamic scoping without one role per job.
Parameter watermarking Post-theft evidence Imperceptible watermarks may be stripped with fine-tuning Use training-time watermarking (radioactive-data) for stronger robustness.
NetworkPolicy on inference pods Blocks serving-container exfiltration Breaks serving pods that reach external APIs Allowlist only the specific external services the model needs for inference (e.g., a tokenizer API).
Offboarding access revocation Immediate denial post-departure CloudTrail retrospective shows access before revocation Alert on large downloads in the 30 days before departure; audit on offboarding.

Failure Modes

Failure Symptom Detection Recovery
IAM role too broad (prefix glob) Inference role reads checkpoints outside its scope CloudTrail shows access to unexpected prefixes Tighten IAM resource ARN to exact model prefix.
CloudTrail alert threshold too low False positives from normal inference weight loading Frequent alerts for legitimate access Tune threshold to actual weight shard count; typical inference loads 1 checkpoint = N shards.
Watermark stripped by fine-tuning Verification fails on derivative model Verification returns false negative Use more robust watermarking scheme; combine parameter watermark with training-time approach.
HF token leaked via CI log Model repository exposed to anyone with token HF audit log shows unexpected clones Rotate token immediately; check if model was cloned by unexpected users; evaluate re-upload under new private repo.
Model weight download in egress Large file exfiltration via HTTPS Egress anomaly detection; S3 GetObject alert VPC endpoint for S3 (all S3 traffic stays within AWS); firewall blocks direct internet egress.