AI Model Weight Security: Protecting Proprietary Parameters from Theft and Exfiltration
Problem
A large language model trained on proprietary data represents a significant investment: months of GPU compute, curated datasets, and expert fine-tuning. The resulting model weights — the billions of floating-point parameters that encode the model’s capabilities — are the intellectual property that competitors cannot easily replicate. Stealing those weights is cheaper than training from scratch: copying 70 billion parameters from an S3 bucket takes minutes.
The attack surface for model weight theft has expanded significantly as inference infrastructure has scaled:
- Storage exfiltration: Weights stored in S3/GCS/Azure Blob with overly permissive IAM policies. Any employee or system with S3 access can download multi-hundred-gigabyte model checkpoints.
- Serving infrastructure access: Inference endpoints load weights into GPU memory. A compromised serving container can write the weights to a network socket.
- Fine-tuning pipeline leakage: Fine-tuning jobs read base model weights. A malicious fine-tuning job can exfiltrate them to an external storage location.
- Hugging Face Hub exposure: Weights accidentally pushed to a public Hugging Face repository or shared via a too-broad access token.
- Insider exfiltration: An ML engineer with legitimate access copies weights to a personal storage account before leaving.
Unlike database records, model weights aren’t rows with PII fields — their theft is rarely immediately detectable. There is no “alert on model weight access” built into most ML platforms.
Target systems: AWS S3, GCS, Azure Blob for weight storage; Kubernetes for inference serving; Hugging Face Hub; SageMaker, Vertex AI, Azure ML for training/serving platforms; PyTorch 2.3+ for watermarking integration.
Threat Model
- Adversary 1 — Overprivileged IAM exfiltration: An employee, CI pipeline, or compromised cloud credentials with S3
GetObjectaccess to the model bucket downloads the full checkpoint and uploads it to an external bucket. - Adversary 2 — Serving container compromise: An attacker achieves code execution inside a model serving container. They read the weight tensors from GPU memory (via
torch.saveor direct GPU memory access) and exfiltrate via a network socket. - Adversary 3 — Fine-tuning pipeline abuse: A malicious fine-tuning job is submitted to the platform. The job reads the base model weights (to fine-tune) and also writes them to an external destination the job author controls.
- Adversary 4 — Hugging Face token leak: An HF token with write access to a private repository is leaked. The attacker pushes the model to a public repository or a repository they control.
- Adversary 5 — Insider theft before departure: An ML engineer with legitimate weight access copies the weights to personal cloud storage before their account is offboarded.
- Access level: Adversary 1 has cloud IAM credentials. Adversary 2 has container-level execution. Adversary 3 has job submission access to the ML platform. Adversaries 4 and 5 have legitimate access that is being misused.
- Objective: Obtain a copy of proprietary model weights for deployment, sale, or further study.
- Blast radius: A stolen model can be deployed at near-zero marginal cost. Competitors gain months of training compute without the investment. For models trained on proprietary data, the stolen weights may also contain memorised sensitive training data.
Configuration
Step 1: S3 IAM Scoping for Model Weight Buckets
Create a dedicated bucket for model weights with strict IAM:
# Create dedicated model weights bucket (no existing cross-use).
aws s3api create-bucket \
--bucket company-model-weights-prod \
--region us-east-1
# Block all public access.
aws s3api put-public-access-block \
--bucket company-model-weights-prod \
--public-access-block-configuration \
BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
# Enable SSE-KMS with a model-specific KMS key (not the default SSE-S3 key).
aws s3api put-bucket-encryption \
--bucket company-model-weights-prod \
--server-side-encryption-configuration '{
"Rules": [{
"ApplyServerSideEncryptionByDefault": {
"SSEAlgorithm": "aws:kms",
"KMSMasterKeyID": "arn:aws:kms:us-east-1:ACCOUNT:key/MODEL-KMS-KEY-ID"
},
"BucketKeyEnabled": true
}]
}'
Minimal IAM policy for inference serving:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "InferenceReadOnly",
"Effect": "Allow",
"Action": ["s3:GetObject"],
"Resource": "arn:aws:s3:::company-model-weights-prod/models/prod/*",
"Condition": {
"StringEquals": {
"aws:PrincipalTag/role": "inference-serving"
},
"IpAddress": {
"aws:SourceIp": ["10.0.0.0/8"] // Internal IPs only.
}
}
}
]
}
No s3:GetObject on the entire bucket — scope to specific prefixes. No s3:ListBucket for inference roles — they should know the exact key. No s3:PutObject for serving roles — inference never writes weights back.
Training roles get write access scoped to specific checkpoint prefixes:
{
"Action": ["s3:PutObject"],
"Resource": [
"arn:aws:s3:::company-model-weights-prod/checkpoints/job-${aws:PrincipalTag/job-id}/*"
]
}
Step 2: CloudTrail Alerts on Anomalous Weight Access
# CloudWatch metric filter: alert on bulk GetObject from model bucket.
aws cloudwatch put-metric-filter \
--log-group-name CloudTrail/DefaultLogGroup \
--filter-name ModelWeightBulkAccess \
--filter-pattern '{ ($.eventName = "GetObject") && ($.requestParameters.bucketName = "company-model-weights-prod") }' \
--metric-transformations \
metricName=ModelWeightGetObject,metricNamespace=Security/MLOps,metricValue=1
# Alert if more than 10 GetObject calls from the model bucket in 5 minutes.
# 10 calls × average model shard size = multi-GB download threshold.
aws cloudwatch put-metric-alarm \
--alarm-name ModelWeightBulkDownload \
--metric-name ModelWeightGetObject \
--namespace Security/MLOps \
--statistic Sum \
--period 300 \
--threshold 10 \
--comparison-operator GreaterThanThreshold \
--evaluation-periods 1 \
--alarm-actions arn:aws:sns:us-east-1:ACCOUNT:security-alerts
Alert on access from unexpected IAM principals:
# Lambda function to alert on unexpected model weight access.
import boto3, json
ALLOWED_ROLES = {
"arn:aws:iam::ACCOUNT:role/inference-serving",
"arn:aws:iam::ACCOUNT:role/training-job",
"arn:aws:iam::ACCOUNT:role/mlops-admin",
}
def handler(event, context):
for record in event["Records"]:
log_data = json.loads(record["Sns"]["Message"])
for trail_event in log_data.get("Records", []):
if (trail_event.get("eventName") == "GetObject" and
"model-weights" in trail_event.get("requestParameters", {}).get("bucketName", "")):
principal = trail_event.get("userIdentity", {}).get("arn", "")
if not any(allowed in principal for allowed in ALLOWED_ROLES):
send_security_alert({
"event": "model_weight_unexpected_access",
"principal": principal,
"bucket": trail_event["requestParameters"]["bucketName"],
"key": trail_event["requestParameters"].get("key"),
"source_ip": trail_event.get("sourceIPAddress"),
"time": trail_event["eventTime"],
})
Step 3: Serving Container Network Egress Restriction
Inference serving containers should not be able to exfiltrate weights via the network:
# NetworkPolicy: inference pods can only reach internal APIs, not external IPs.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: inference-egress-restriction
namespace: ml-serving
spec:
podSelector:
matchLabels:
role: inference-serving
policyTypes: [Egress]
egress:
- to:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: internal-api
ports:
- port: 443
- to:
- ipBlock:
cidr: 10.0.0.0/8 # Internal only.
# Explicitly no egress to 0.0.0.0/0.
Apply seccomp to inference pods to restrict syscalls that could exfiltrate data:
securityContext:
seccompProfile:
type: RuntimeDefault
allowPrivilegeEscalation: false
capabilities:
drop: [ALL]
Step 4: Model Weight Watermarking
Watermarking embeds a hidden signal in model weights. If stolen weights are deployed by a competitor, the watermark allows you to verify the model is derived from your weights.
Parameter-based watermarking (simple):
import torch
import hashlib
def embed_watermark(model: torch.nn.Module, watermark_key: str, layer_name: str = "lm_head.weight") -> torch.nn.Module:
"""
Embed a watermark by slightly perturbing specific weight positions.
The perturbation is determined by the watermark key — the pattern is
reproducible but imperceptibly small.
"""
layer = dict(model.named_parameters())[layer_name]
# Derive perturbation positions from the watermark key.
key_hash = hashlib.sha256(watermark_key.encode()).digest()
indices = [key_hash[i] % layer.shape[0] for i in range(32)]
# Apply a tiny perturbation at the selected positions.
with torch.no_grad():
perturbation = torch.zeros_like(layer)
for idx in indices:
perturbation[idx, 0] = 1e-6 # Imperceptibly small; doesn't affect model quality.
layer.add_(perturbation)
return model
def verify_watermark(model: torch.nn.Module, watermark_key: str, layer_name: str = "lm_head.weight") -> bool:
"""Check if the expected watermark is present in the model weights."""
layer = dict(model.named_parameters())[layer_name]
key_hash = hashlib.sha256(watermark_key.encode()).digest()
indices = [key_hash[i] % layer.shape[0] for i in range(32)]
# Check if the perturbation is present (within floating-point tolerance).
for idx in indices:
if abs(layer[idx, 0].item()) < 5e-7:
return False
return True
# Usage:
model = load_base_model()
watermarked_model = embed_watermark(model, watermark_key="company-model-v2-2026Q1")
torch.save(watermarked_model.state_dict(), "model-weights-watermarked.pt")
# Verification (on a suspected stolen model).
suspected_model = load_model("suspected-model.pt")
if verify_watermark(suspected_model, watermark_key="company-model-v2-2026Q1"):
print("Watermark confirmed: this model derives from our weights.")
More robust approaches use radiometric watermarking (training-time embedding) or steganographic watermarks in the weight distribution. Research-grade systems like radioactive-data and REEF provide stronger statistical guarantees.
Step 5: Hugging Face Hub Access Control
# Verify all private model repositories have access restrictions.
huggingface-cli whoami
pip install huggingface_hub
python3 << 'EOF'
from huggingface_hub import HfApi
api = HfApi()
models = api.list_models(author="company-org")
for model in models:
info = api.model_info(model.id)
if info.private == False:
print(f"WARNING: {model.id} is PUBLIC")
else:
# Check collaborators.
collaborators = api.list_repo_collaborators(model.id)
for collab in collaborators:
print(f" {model.id}: collaborator {collab.username} ({collab.role})")
EOF
# Rotate HF tokens quarterly and on employee departure.
# List all tokens.
# (No CLI for token listing; use the HF web UI or API with user admin token.)
# Create a scoped token with minimal permissions for inference (read-only).
# In HF UI: Settings → Access Tokens → New Token
# Type: Fine-grained → Repository: read (specific repos only)
Step 6: Offboarding Controls for ML Engineers
# On ML engineer departure:
# 1. Revoke AWS IAM access immediately.
aws iam list-user-policies --user-name departing-engineer
aws iam delete-user --user-name departing-engineer
# 2. Rotate any Hugging Face tokens the engineer may have created.
# (Requires HF org admin access; revoke all tokens associated with the user.)
# 3. Audit recent S3 access from the engineer's IAM user.
aws cloudtrail lookup-events \
--lookup-attributes AttributeKey=Username,AttributeValue=departing-engineer \
--start-time 2026-04-01T00:00:00Z \
| jq '.Events[] | select(.Resources[].ResourceType == "AWS::S3::Object") |
{time: .EventTime, bucket: .Resources[0].ResourceName}'
# 4. Check for any unusual data transfer from the engineer's recent sessions.
# Look for large S3 downloads in CloudTrail: GetObject calls on model bucket in last 30 days.
Step 7: Telemetry
model_weight_access_total{principal, model, operation} counter
model_weight_bulk_download_total{principal, byte_count} counter
model_weight_unexpected_principal_total{principal} counter
model_weight_external_network_egress_blocked{pod} counter
hf_token_usage_total{token_scope, operation} counter
model_watermark_verification_result{model, result} gauge
Alert on:
model_weight_bulk_download_total— any download of > 1GB of model weights is worth auditing.model_weight_unexpected_principal_total— a non-allowlisted IAM role accessed model weights.model_weight_external_network_egress_blocked— serving container attempted external connection.hf_token_usage_totalwith an unexpectedtoken_scope— a broad token used where a narrow one should be.
Expected Behaviour
| Signal | Unprotected model weights | Hardened model weights |
|---|---|---|
| Overprivileged IAM access | Any S3 user can download | Scoped to specific prefixes and roles; IP-restricted |
| Bulk download anomaly | Silent | CloudTrail metric alert after N GetObject calls |
| Serving container exfiltration | Container can connect to any external IP | NetworkPolicy blocks all external egress |
| Stolen weights deployed by competitor | No evidence of derivation | Watermark verification proves derivation |
| Hugging Face public exposure | Models discoverable | Org-wide private; no public repositories |
Trade-offs
| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| KMS-encrypted weights | Second encryption layer beyond SSE-S3 | KMS call latency on every read | Use bucket keys (one KMS call per prefix per hour); acceptable for large weights loaded once. |
| Scoped IAM per job | Precise access control | More IAM roles to manage | Template roles; job-id tagging provides dynamic scoping without one role per job. |
| Parameter watermarking | Post-theft evidence | Imperceptible watermarks may be stripped with fine-tuning | Use training-time watermarking (radioactive-data) for stronger robustness. |
| NetworkPolicy on inference pods | Blocks serving-container exfiltration | Breaks serving pods that reach external APIs | Allowlist only the specific external services the model needs for inference (e.g., a tokenizer API). |
| Offboarding access revocation | Immediate denial post-departure | CloudTrail retrospective shows access before revocation | Alert on large downloads in the 30 days before departure; audit on offboarding. |
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| IAM role too broad (prefix glob) | Inference role reads checkpoints outside its scope | CloudTrail shows access to unexpected prefixes | Tighten IAM resource ARN to exact model prefix. |
| CloudTrail alert threshold too low | False positives from normal inference weight loading | Frequent alerts for legitimate access | Tune threshold to actual weight shard count; typical inference loads 1 checkpoint = N shards. |
| Watermark stripped by fine-tuning | Verification fails on derivative model | Verification returns false negative | Use more robust watermarking scheme; combine parameter watermark with training-time approach. |
| HF token leaked via CI log | Model repository exposed to anyone with token | HF audit log shows unexpected clones | Rotate token immediately; check if model was cloned by unexpected users; evaluate re-upload under new private repo. |
| Model weight download in egress | Large file exfiltration via HTTPS | Egress anomaly detection; S3 GetObject alert | VPC endpoint for S3 (all S3 traffic stays within AWS); firewall blocks direct internet egress. |