Privacy-Preserving ML Inference: Differential Privacy, Confidential Computing, and Training Data Protection

Problem

ML models memorize training data. The degree varies by model size, data repetition, and training configuration — but the research is unambiguous: language models, image classifiers, and embedding models all leak information about their training sets to adversaries with query access.

The attack surface in production inference:

  • Membership inference: Given a data record (a user’s medical history, a document, a transaction), an attacker can query the model to determine with better-than-random accuracy whether that record appeared in the training data. Where the record is personal data, this disclosure is itself a GDPR concern.
  • Training data extraction: Direct prompting can cause large language models to reproduce verbatim training data — email addresses, passwords, copyrighted text, PII from scraped datasets. A 2023 DeepMind-led extraction study demonstrated large-scale recovery of memorized training data from widely used LLMs.
  • Model inversion: By querying a model with optimized inputs, an attacker can reconstruct approximate representations of training data — particularly for image classifiers trained on identifiable faces.
  • Embedding reversal: Sentence embedding models convert text to dense vectors. Given the embedding, an attacker can approximately reconstruct the original text — deanonymizing records stored as embeddings.

The regulatory implication: GDPR Article 17 (right to erasure) requires that a data subject’s information can be removed from a system. For a model that memorized the data during training, this is technically hard — the only current solution is retraining from scratch on a dataset that excludes the subject.

By August 2026, the EU AI Act’s obligations for high-risk AI systems apply, requiring risk documentation and technical controls for systems that process personal data (obligations for general-purpose AI models began in August 2025). Privacy-preserving inference is no longer just a research topic; it is a compliance requirement.

Specific gaps in default inference deployments:

  • Models served without output filtering; arbitrary memorized text is reproducible on demand.
  • Embedding endpoints return raw high-dimensional vectors with no obfuscation.
  • No differential privacy budget tracking; the total privacy leakage across all queries is unmeasured.
  • Training pipelines don’t apply DP-SGD; privacy guarantees are informally claimed but not mathematically bounded.
  • Inference infrastructure is not isolated in a TEE; the model provider can observe all queries.

Target systems: PyTorch 2.3+ with Opacus 1.4+ (DP-SGD); TensorFlow Privacy 0.9+; Gramine 1.7+ or Occlum 0.29+ (TEE-based inference); Azure Confidential Computing / AWS Nitro Enclaves for managed TEE; Presidio 2.2+ (PII detection in outputs).

Threat Model

  • Adversary 1 — Membership inference attacker: An adversary with API access to an inference endpoint sends queries designed to detect whether specific individuals appear in the training data. Targets include medical records, financial transactions, user behavior logs.
  • Adversary 2 — Training data extraction attacker: An adversary with API access probes the model with crafted prompts, extracting verbatim memorized training data — PII, passwords, internal documents.
  • Adversary 3 — Model inversion attacker: An adversary with access to a classification or embedding endpoint reconstructs approximations of training data by optimizing inputs to maximize specific output activations.
  • Adversary 4 — Inference provider: Whoever operates the infrastructure that hosts the model can observe every query and response. In a third-party inference setup (API provider), this includes the provider organization. TEE-based inference removes this visibility.
  • Adversary 5 — Embedding deanonymization: An adversary who obtains a database of text embeddings reverses them to recover the original sensitive text (medical notes, PII, confidential communications stored as embeddings).
  • Access level: Adversaries 1–3 have inference API access (possibly paid/free public access). Adversary 4 has infrastructure access. Adversary 5 has database access.
  • Objective: Extract training data records, determine membership of individuals in sensitive datasets, deanonymize stored embeddings, or observe private queries.
  • Blast radius: Without controls: any user who can query the model can extract memorized training data or determine dataset membership. With DP + output filtering + TEE: privacy leakage is mathematically bounded; individual records are protected.

Configuration

Step 1: Differential Privacy During Training with DP-SGD

Apply differential privacy during model training using Opacus (PyTorch):

import torch
from torch.utils.data import DataLoader
from opacus import PrivacyEngine
from opacus.validators import ModuleValidator

# Validate that the model architecture is DP-compatible.
# Some layers (BatchNorm, LSTM) require replacement for DP.
model = MyModel()   # MyModel and train_dataset below are placeholders for your own code.
errors = ModuleValidator.validate(model, strict=False)
if errors:
    model = ModuleValidator.fix(model)   # Auto-replaces incompatible layers.

criterion = torch.nn.CrossEntropyLoss()   # Placeholder loss for a classification task.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
data_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)

# Attach the privacy engine.
privacy_engine = PrivacyEngine()
model, optimizer, data_loader = privacy_engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    epochs=20,
    target_epsilon=8.0,    # Privacy budget: lower = stronger privacy, lower utility.
    target_delta=1e-5,     # Probability of epsilon-privacy violation.
    max_grad_norm=1.0,     # Gradient clipping; controls sensitivity.
)

# Training loop. Batches here are assumed to expose .x / .y; adapt to your dataset's format.
for epoch in range(20):
    for batch in data_loader:
        optimizer.zero_grad()
        loss = criterion(model(batch.x), batch.y)
        loss.backward()
        optimizer.step()

# Report the actual privacy spent.
epsilon = privacy_engine.get_epsilon(delta=1e-5)
print(f"Training complete. ε = {epsilon:.2f}, δ = 1e-5")

The (ε, δ)-DP guarantee means: for any two adjacent datasets (differing in one individual’s record), the model’s output distribution changes by at most a factor of e^ε, up to an additive failure probability δ. ε = 8.0 is a reasonable practical target for text classification; for more sensitive domains, target ε ≤ 3.0.
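
Formally, a randomized training mechanism M is (ε, δ)-differentially private if, for every pair of adjacent datasets D and D′ and every set of possible output models S:

    Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S] + δ

DP-SGD achieves this by clipping each per-sample gradient to max_grad_norm (bounding the sensitivity of any single record) and adding Gaussian noise to the summed gradients; the accountant converts the noise multiplier and sampling rate into the reported ε.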

Membership inference resilience at different epsilon values:

ε         | Membership inference attack AUC | Utility impact
0.1       | ~0.51 (near random)             | High utility loss
1.0       | ~0.53                           | Moderate utility loss
8.0       | ~0.58                           | Minor utility loss
∞ (no DP) | ~0.65–0.85 (depends on model)   | No utility loss

Step 2: Output Filtering for Memorized Data

Even without DP training, output filtering reduces extraction risk:

from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

def filter_model_output(text: str) -> str:
    # Detect PII in model output.
    results = analyzer.analyze(
        text=text,
        language="en",
        entities=["PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER",
                  "CREDIT_CARD", "US_SSN", "IP_ADDRESS",
                  "LOCATION", "DATE_TIME"]
    )
    if not results:
        return text

    # Anonymize detected PII.
    anonymized = anonymizer.anonymize(
        text=text,
        analyzer_results=results
    )
    return anonymized.text

def filtered_inference(prompt: str, model, tokenizer) -> str:
    raw_output = model.generate(
        tokenizer.encode(prompt, return_tensors="pt"),
        max_new_tokens=512,
        do_sample=True,
        temperature=0.7,
    )
    decoded = tokenizer.decode(raw_output[0], skip_special_tokens=True)

    # Filter PII before returning.
    return filter_model_output(decoded)

For LLMs, also filter for exact training data reproduction:

import hashlib

# Pre-compute SHA-256 hashes of fixed-size word windows taken from known-sensitive
# training documents (same window size as used below).
SENSITIVE_HASHES = set()  # Populated from the training data manifest.

def detect_verbatim_reproduction(output: str, window: int = 100) -> bool:
    words = output.split()
    for i in range(len(words) - window + 1):
        window_text = " ".join(words[i:i+window])
        window_hash = hashlib.sha256(window_text.encode()).hexdigest()
        if window_hash in SENSITIVE_HASHES:
            return True
    return False
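
A minimal sketch of populating SENSITIVE_HASHES from the training data manifest; load_sensitive_documents() is a hypothetical helper that yields the documents flagged as sensitive, and the windowing must match detect_verbatim_reproduction exactly or lookups will never hit:

def index_sensitive_documents(documents, window: int = 100) -> set:
    # Hash every `window`-word span of each sensitive document, mirroring
    # the windowing used in detect_verbatim_reproduction above.
    hashes = set()
    for doc in documents:
        words = doc.split()
        for i in range(len(words) - window + 1):
            span = " ".join(words[i:i + window])
            hashes.add(hashlib.sha256(span.encode()).hexdigest())
    return hashes

# load_sensitive_documents() is a hypothetical loader over your training data manifest.
SENSITIVE_HASHES = index_sensitive_documents(load_sensitive_documents())

Exact-match hashing is brittle: a single changed token defeats it. Normalize whitespace and case before hashing, and treat it as a complement to, not a replacement for, PII filtering.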

Step 3: TEE-Based Inference for Query Privacy

For scenarios where the inference provider must not observe queries (confidential ML as a service), run inference inside a Trusted Execution Environment:

Using Gramine (Intel SGX):

# gramine-manifest.template for model inference.
loader.entrypoint = "file:{{ gramine.libos }}"
libos.entrypoint = "/usr/bin/python3"

loader.argv = [
    "python3",
    "/app/inference_server.py",
    "--model-path", "/model/llm.bin",
    "--host", "0.0.0.0",
    "--port", "8080",
]

# Store model weights as a Gramine encrypted file; the key is provisioned
# only to an attested instance of this enclave.
fs.mounts = [
  { type = "encrypted", path = "/model/llm.bin", uri = "file:/model/llm.bin", key_name = "model_key" },
]

# All network traffic encrypted with TLS inside the enclave.
sgx.trusted_files = [
    "file:/usr/bin/python3",
    "file:/app/inference_server.py",
    "file:/etc/ssl/certs/ca-certificates.crt",
]

sgx.max_threads = 16
sgx.enclave_size = "16G"   # Must hold model weights; large for LLMs.

The enclave provides:

  • Model confidentiality: Model weights are stored as an encrypted file whose key is provisioned only to the attested enclave; the cloud provider cannot read them.
  • Query confidentiality: Queries arrive via attested TLS channels; only the enclave decrypts them.
  • Remote attestation: The client verifies the enclave measurement (MRENCLAVE) matches the known-good binary before trusting it.

Remote attestation in the client:

import grpc
# `create_ra_tls_channel` stands in for whatever RA-TLS wrapper your deployment uses
# around Gramine's RA-TLS verification libraries; the exact Python API depends on your tooling.
from gramine_ratls import create_ra_tls_channel

# Verify the enclave's remote attestation before sending queries.
# The RA-TLS handshake checks MRENCLAVE/MRSIGNER against the published measurements.
channel = create_ra_tls_channel(
    target="sgx-inference.internal:8080",
    expected_mrenclave="<hex-mrenclave-of-inference-binary>",
    expected_mrsigner="<hex-mrsigner>",
)
stub = InferenceStub(channel)   # InferenceStub / PredictRequest: your gRPC-generated classes.
response = stub.Predict(PredictRequest(input=user_query))

AWS Nitro Enclaves (managed alternative):

# Package the inference application as an enclave image.
nitro-cli build-enclave \
  --docker-uri inference-app:latest \
  --output-file inference.eif

# Run the enclave (--memory is in MiB).
nitro-cli run-enclave \
  --eif-path inference.eif \
  --cpu-count 4 \
  --memory 16384

# Attestation document is automatically available inside the enclave.
# The client verifies it before sending data.

Step 4: Embedding Privacy — Vector Obfuscation

For embedding endpoints, add calibrated Laplace noise to the output vectors:

import numpy as np

def private_embedding(text: str, model, sensitivity: float, epsilon: float) -> np.ndarray:
    embedding = model.encode(text)

    # Laplace mechanism for epsilon-DP: scale = sensitivity / epsilon,
    # where sensitivity is the maximum L1 distance between embeddings of adjacent inputs.
    noise_scale = sensitivity / epsilon
    noise = np.random.laplace(0, noise_scale, embedding.shape)

    # Re-normalize to the unit sphere; this is post-processing, so the DP guarantee
    # still holds, and cosine similarity is approximately preserved.
    noisy_embedding = embedding + noise
    return noisy_embedding / np.linalg.norm(noisy_embedding)

The noise level is calibrated to make embedding reversal substantially harder while preserving approximate similarity-search accuracy. For ε = 1.0, embedding reversal accuracy degrades from roughly 80% to roughly 30% on standard benchmarks.
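
A usage sketch, assuming a sentence-transformers style model.encode and the private_embedding function above; estimating sensitivity empirically from a handful of near-duplicate pairs is a heuristic, and a formal DP claim needs a proven bound (for example, clipping embeddings to a fixed L1 norm):

import numpy as np

# Hypothetical near-duplicate pairs drawn from your input domain.
near_duplicate_pairs = [
    ("Patient reports mild headache.", "The patient reports a mild headache."),
    ("Invoice 1042 was paid in March.", "Invoice 1042 was paid in early March."),
]

# Empirical L1 sensitivity estimate (heuristic, not a worst-case bound).
sensitivity = max(
    float(np.abs(model.encode(a) - model.encode(b)).sum())
    for a, b in near_duplicate_pairs
)

vector = private_embedding("Patient reports mild headache.", model,
                           sensitivity=sensitivity, epsilon=1.0)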

For higher-sensitivity cases, strip PII from the text before embedding it:

# Strip PII before embedding; useful for anonymization pipelines.
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

def obfuscated_embedding(text: str, model) -> np.ndarray:
    # Detect PII, redact it, then embed the redacted text.
    analyzer = AnalyzerEngine()
    anonymizer = AnonymizerEngine()
    results = analyzer.analyze(text=text, language="en")
    clean_text = anonymizer.anonymize(text=text, analyzer_results=results).text
    return model.encode(clean_text)

Step 5: Privacy Budget Tracking

Track cumulative privacy budget across queries to detect when the budget is exhausted:

from opacus.accountants import RDPAccountant

class PrivacyBudgetExhausted(Exception):
    """Raised when the configured privacy budget has been spent."""

class PrivacyBudgetTracker:
    def __init__(self, total_epsilon: float, delta: float):
        self.accountant = RDPAccountant()
        self.total_epsilon = total_epsilon
        self.delta = delta
        self.query_count = 0

    def record_query(self, noise_multiplier: float, sample_rate: float):
        self.accountant.step(
            noise_multiplier=noise_multiplier,
            sample_rate=sample_rate
        )
        self.query_count += 1
        current_eps = self.accountant.get_epsilon(self.delta)
        if current_eps >= self.total_epsilon:
            raise PrivacyBudgetExhausted(
                f"Privacy budget exceeded: ε={current_eps:.2f} >= {self.total_epsilon}"
            )
        return current_eps

# Usage: track budget per user or per model.
tracker = PrivacyBudgetTracker(total_epsilon=10.0, delta=1e-5)

Expose budget consumption as a metric:

from prometheus_client import Gauge

privacy_budget_consumed = Gauge(
    "ml_privacy_budget_consumed",
    "Current privacy budget consumption (epsilon)",
    ["model_id", "user_group"]
)
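
A minimal sketch of wiring the tracker to this metric; tracked_query is a hypothetical helper, and the noise_multiplier / sample_rate defaults are placeholders for whatever noise your inference path actually adds per query:

def tracked_query(tracker: PrivacyBudgetTracker, model_id: str, user_group: str,
                  noise_multiplier: float = 1.1, sample_rate: float = 0.01) -> float:
    # Account for the query, then export the running epsilon for alerting.
    current_eps = tracker.record_query(noise_multiplier=noise_multiplier,
                                       sample_rate=sample_rate)
    privacy_budget_consumed.labels(model_id=model_id, user_group=user_group).set(current_eps)
    return current_eps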

Step 6: Model Unlearning for GDPR Right to Erasure

When a user exercises their right to erasure, remove their data’s influence from the model:

import torch
from torch.utils.data import DataLoader

criterion = torch.nn.CrossEntropyLoss()   # Placeholder; match the loss the model was trained with.

def unlearn_records(model, forget_dataset, retain_dataset, steps: int = 100):
    # Gradient ascent on the forget set (maximize loss = unlearn).
    # Batches are assumed to expose .x / .y; adapt to your dataset's format.
    optimizer_forget = torch.optim.SGD(model.parameters(), lr=1e-4)
    forget_loader = DataLoader(forget_dataset, batch_size=32)

    for step, batch in enumerate(forget_loader):
        if step >= steps:
            break
        loss = -criterion(model(batch.x), batch.y)   # Gradient ascent.
        optimizer_forget.zero_grad()
        loss.backward()
        optimizer_forget.step()

    # Fine-tune on retain set to restore utility after unlearning.
    optimizer_retain = torch.optim.Adam(model.parameters(), lr=1e-5)
    retain_loader = DataLoader(retain_dataset, batch_size=64)
    for batch in retain_loader:
        loss = criterion(model(batch.x), batch.y)
        optimizer_retain.zero_grad()
        loss.backward()
        optimizer_retain.step()

    return model

Approximate unlearning is not a mathematically verified substitute for retraining; document the approach and its limitations for compliance purposes. For high-stakes erasure requests, retrain from scratch on the revised dataset.
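
To check whether an erasure request was actually honored, run a simple loss-based membership check before sign-off. This is a rough loss-threshold test, not a full shadow-model membership inference attack; it reuses criterion and the .x / .y batch format from above, assumes the unlearned model and forget_dataset are in scope, holdout_dataset is a placeholder for data never seen in training, and the 0.9 threshold is an arbitrary starting point:

@torch.no_grad()
def mean_loss(model, dataset) -> float:
    model.eval()
    loader = DataLoader(dataset, batch_size=64)
    losses = [criterion(model(batch.x), batch.y).item() for batch in loader]
    return sum(losses) / len(losses)

forget_loss = mean_loss(model, forget_dataset)
holdout_loss = mean_loss(model, holdout_dataset)

# If the model still fits the "forgotten" records much better than unseen data,
# the unlearning was likely insufficient; fall back to retraining.
if forget_loss < 0.9 * holdout_loss:
    print("Forget-set loss is still well below holdout loss; retrain from scratch.")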

Step 7: Telemetry

ml_privacy_budget_consumed{model_id, user_group}       gauge
ml_pii_filtered_output_total{model_id}                 counter
ml_membership_inference_query_detected_total            counter
ml_verbatim_reproduction_blocked_total                  counter
ml_enclave_attestation_failure_total                    counter
ml_unlearning_requests_total{status}                    counter

Alert on:

  • ml_privacy_budget_consumed approaching the configured total — re-evaluate query access or retrain with fresh budget.
  • ml_verbatim_reproduction_blocked_total non-zero — model is reproducing training data; consider output filtering expansion.
  • ml_enclave_attestation_failure_total non-zero — TEE attestation failing; possible tampering or version mismatch.

Expected Behaviour

Signal                      | Without privacy controls      | With privacy controls
Membership inference AUC    | 0.65–0.85                     | 0.51–0.58 (with ε=8 DP)
Training data extraction    | Verbatim PII reproducible     | Filtered; PII anonymized in output
Query privacy from provider | Provider observes all queries | TEE: provider cannot observe queries
Embedding reversibility     | ~80% reconstruction accuracy  | ~30% with Laplace noise (ε=1.0)
GDPR erasure compliance     | Full retrain required         | Approximate unlearning + documented limitations
Privacy budget visibility   | Unknown cumulative leakage    | Tracked per query; alert on exhaustion

Trade-offs

Aspect                  | Benefit                                      | Cost                                                                          | Mitigation
DP-SGD training         | Mathematically bounded membership inference  | 5–15% accuracy degradation at ε=8; higher at lower ε                          | Tune ε to balance compliance requirements and acceptable utility loss; most tasks tolerate ε=8.
Output PII filtering    | Reduces training data extraction             | False positives (legitimate content filtered)                                 | Tune Presidio confidence thresholds; log filtered content for review.
TEE inference           | Provider cannot observe queries              | 2–4× latency overhead from enclave context switches; large enclaves for LLMs  | Use for the highest-sensitivity use cases only; regular inference for lower-risk models.
Embedding noise         | Bounds reversal attacks                      | Reduces approximate nearest-neighbor accuracy                                 | Calibrate noise to your similarity-search accuracy requirement.
Privacy budget tracking | Visible cumulative leakage                   | Budget exhaustion blocks new queries                                          | Set the budget generously for exploratory use; implement per-user budgets for sensitive data.
Approximate unlearning  | Fast (minutes vs. hours of retraining)       | Not verified to a mathematical standard                                       | Document as approximate; combine with audit-log evidence of original data removal.

Failure Modes

Failure                          | Symptom                                               | Detection                                          | Recovery
DP budget exhausted              | Privacy tracker blocks queries                        | PrivacyBudgetExhausted exception; metric alert     | Retrain the model with fresh privacy accounting; consider model versioning with per-version budgets.
PII filter false negative        | PII appears in model output                           | Spot-check output logs; DSAR review                | Expand the entity list in Presidio; add domain-specific recognizers for your data types.
TEE attestation version mismatch | Client rejects server attestation                     | Attestation failure errors in client logs          | Update the TEE binary and publish the new MRENCLAVE; update clients.
Embedding noise too high         | Similarity search accuracy degrades                   | Search recall metrics drop                         | Reduce noise (increase ε); accept higher privacy leakage or restructure the use case.
Unlearning inadequate            | Membership inference still detects the erased record  | Post-unlearning MIA test still shows elevated AUC  | Retrain from scratch on the revised dataset; use unlearning only for low-risk erasure requests.
DP-SGD incompatible layer        | Training fails with an Opacus error                   | Error: BatchNorm not supported                     | Run ModuleValidator.fix(model) before attaching the privacy engine.