Cryptographic Agility: Designing Systems to Survive Algorithm Transitions
Problem
Every hardcoded cryptographic algorithm is a future breaking change. SHA-1 was declared broken in 2005 and prohibited in certificates in 2017. RSA-2048 has a planned NIST deprecation deadline of 2030. MD5 is already a byword for negligent implementation. AES-128 is currently safe but stands one unexpected cryptanalytic result away from being replaced by AES-256 across every compliance framework simultaneously.
Post-quantum migration is the most immediate forcing function. NIST finalized ML-KEM and ML-DSA in August 2024. Every system using RSA or elliptic-curve cryptography for key exchange or signing needs to migrate — and the migration window is measured in years, not months. Organizations that hardcoded RSA-2048 into their wire protocols, database encryption envelopes, and authentication tokens now face a choice between flag-day cutover (high risk, high coordination cost) and running parallel legacy infrastructure indefinitely (high operational cost, ongoing exposure).
A system designed for cryptographic agility avoids this dilemma. Agility means:
- Callers request a cryptographic operation by capability, not by algorithm name.
- Every encrypted or signed payload carries its algorithm identifier, so receivers can handle multiple formats simultaneously.
- Key material carries version numbers and algorithm metadata, enabling seamless rotation.
- Migration happens gradually — new algorithms run alongside old ones, receivers accept both, senders upgrade at their own pace, and old support is retired only after adoption is complete.
Systems without agility are not fragile in normal operation. They fail catastrophically when the industry turns against an algorithm and the entire fleet needs to move at once.
The specific anti-patterns this article addresses:
- Algorithm names embedded in function calls, database column names, API field names, or protocol constants, making refactoring a grep-and-pray exercise.
- Ciphertext or signature blobs stored without metadata — decryption requires knowing the algorithm out of band, and rotation requires re-encrypting every stored record.
- Key identifiers without version numbers — all consumers share the same key with no way to introduce a new one without a coordinated cutover.
- Test suites that only test one algorithm, masking incompatibilities with alternative implementations.
- Migration tooling that does not exist until a migration is already urgent.
Target systems: any service that performs symmetric encryption, asymmetric signing, key derivation, MAC computation, or hashing over user data, protocol messages, or stored objects.
Threat Model
- Adversary 1 — Cryptanalytic break: A public or private cryptanalytic result weakens a deployed algorithm. SHA-1 collision attacks, the ROCA vulnerability in RSA key generation, differential fault attacks against AES implementations. The adversary can now forge signatures or recover ciphertext.
- Adversary 2 — Harvest now, decrypt later: Encrypted data in transit or at rest is captured by an adversary expecting to decrypt it when a future computational capability (quantum or otherwise) becomes available. Data encrypted today with RSA or ECDH is in scope.
- Adversary 3 — Compliance and ecosystem withdrawal: Browser vendors, OS maintainers, and CA/Browser Forum deprecate algorithm support. Services still using the deprecated algorithm become unreachable to modern clients, causing availability failure rather than security failure.
- Access level: None required for Adversaries 1 and 2 (passive capture + future compute). Adversary 3 operates through the supply chain and standards bodies.
- Objective: Decrypt stored or captured data, forge signatures, or disrupt service availability by forcing a client-server algorithm mismatch.
- Blast radius without agility: Every affected service requires a coordinated flag-day migration. Stores of encrypted data must be decrypted and re-encrypted simultaneously. Any missed service creates a dependency chain that blocks the migration and leaves data permanently exposed.
Configuration
Step 1: Abstract Cryptographic Interfaces
Replace direct calls to algorithm-specific functions with calls to an abstraction that specifies the operation, not the mechanism. The caller says what they want — sign, encrypt, derive a key — and the implementation decides which algorithm satisfies that request.
In Go:
// Bad: algorithm baked into the call site.
func signPayload(key *rsa.PrivateKey, data []byte) ([]byte, error) {
h := sha256.Sum256(data)
return rsa.SignPKCS1v15(rand.Reader, key, crypto.SHA256, h[:])
}
// Good: operation-oriented interface.
type Signer interface {
Sign(data []byte) (signature []byte, alg AlgorithmID, err error)
Verify(data, signature []byte, alg AlgorithmID) error
AlgorithmID() AlgorithmID
}
type AlgorithmID string
const (
AlgRSAPSS256 AlgorithmID = "RS256-PSS"
AlgECDSAP256 AlgorithmID = "ES256"
AlgMLDSA65 AlgorithmID = "ML-DSA-65"
AlgHybridDSA AlgorithmID = "ES256+ML-DSA-65" // dual-stack during transition
)
// Implementations live behind the interface.
type ECDSP256Signer struct{ key *ecdsa.PrivateKey }
func (s *ECDSP256Signer) Sign(data []byte) ([]byte, AlgorithmID, error) {
h := sha256.Sum256(data)
sig, err := ecdsa.SignASN1(rand.Reader, s.key, h[:])
return sig, AlgECDSAP256, err
}
func (s *ECDSP256Signer) AlgorithmID() AlgorithmID { return AlgECDSAP256 }
The same pattern applies to encryption, MAC computation, and key derivation. The goal is that no file outside the crypto/ package contains an algorithm name.
In Python:
from abc import ABC, abstractmethod
from enum import Enum
class AlgorithmID(str, Enum):
AES_256_GCM = "A256GCM"
CHACHA20_POLY1305 = "C20P"
ML_KEM_768 = "ML-KEM-768"
class Encrypter(ABC):
@abstractmethod
def encrypt(self, plaintext: bytes, aad: bytes = b"") -> tuple[bytes, AlgorithmID]:
...
@abstractmethod
def decrypt(self, ciphertext: bytes, alg: AlgorithmID, aad: bytes = b"") -> bytes:
...
@abstractmethod
def algorithm_id(self) -> AlgorithmID:
...
The interface boundary makes algorithm substitution a matter of writing a new implementation class, not a codebase-wide refactor.
Step 2: Include Algorithm Identifiers in Every Message
Every signed or encrypted payload must carry the algorithm used to produce it. If the algorithm is implicit — known only from context, configuration, or documentation — then decryption and verification require that context to remain permanently synchronized across all systems and deployments. It will not.
For structured payloads, follow the JWS/JWE model: a header field carries the algorithm identifier alongside every ciphertext or signature.
{
"alg": "ML-DSA-65",
"kid": "signing-key-v3",
"payload": "eyJzdWIiOiJ1c2VyLTEyMyIsImV4cCI6MTc0Njc4MDgwMH0",
"signature": "3Tk9...base64url..."
}
For binary protocols, reserve a fixed-length algorithm field in your message format:
Offset Length Field
0 2 format_version (uint16, big-endian)
2 2 algorithm_id (uint16, registry of known values)
4 4 key_version (uint32, big-endian)
8 16 nonce (random bytes)
24 var ciphertext
The algorithm field must be authenticated. For signed payloads it is covered by the signature. For encrypted payloads, include it in the additional authenticated data (AAD) passed to the AEAD cipher. This prevents an attacker from stripping or replacing the algorithm identifier without detection.
// Constructing AAD that binds the algorithm and key version to the ciphertext.
aad := fmt.Appendf(nil, "alg=%s;kid=%s;ver=%d", alg, keyID, keyVersion)
ciphertext, err := aead.Seal(nil, nonce, plaintext, aad)
Step 3: Version Key Material
Keys need version numbers for two reasons. First, encryption is a one-way gate: data encrypted with a key can only be decrypted by that key or its designated successors. If the key has no version identifier, rotating it requires re-encrypting every object before the old key can be retired. Second, signing keys may have their signatures invalidated by a cryptanalytic result — if signatures do not identify which key version produced them, historical validation becomes impossible.
The pattern is simple: every key has a name and a version number. Every ciphertext envelope and signature carries the key identifier (name + version). Decryption and verification look up the right key by identifier; encryption always uses the current version.
type KeyID struct {
Name string // "customer-data-key"
Version uint32 // monotonically increasing
}
func (k KeyID) String() string {
return fmt.Sprintf("%s:v%d", k.Name, k.Version)
}
// CiphertextEnvelope carries everything needed to decrypt independently of
// any external context.
type CiphertextEnvelope struct {
KeyID string `json:"kid"` // "customer-data-key:v2"
Algorithm AlgorithmID `json:"alg"` // "A256GCM"
Nonce []byte `json:"nonce,omitempty"`
Data []byte `json:"data"`
}
Key rotation introduces v2 alongside v1. New encryptions use v2. Old ciphertexts remain decryptable by v1 until they are rotated. Version retirement is safe to schedule once audit logs confirm no decrypt requests referencing v1 have occurred for a defined period (typically 90 days).
In Vault (transit secrets engine), versioning is built in:
# Rotate: creates v2, makes it the default for new encryptions.
vault write -f transit/keys/customer-data/rotate
# Old versions remain available for decryption until explicitly retired.
vault write transit/keys/customer-data/config min_decryption_version=1
# After migrating all ciphertext to v2, raise the floor.
vault write transit/keys/customer-data/config min_decryption_version=2
Step 4: Algorithm Negotiation — TLS as the Model
TLS cipher suite negotiation solves the agility problem at the protocol level and every internal protocol design can apply the same model:
- The sender (client) announces a list of supported algorithms in preference order.
- The receiver (server) picks the strongest algorithm from the intersection.
- The selected algorithm is confirmed in the server’s response and used for the session.
- Neither party is hard-blocked by the other’s capabilities — they negotiate to a mutual optimum.
For an internal RPC or message-passing system, implement the same handshake:
message EncryptionHandshake {
repeated string client_algorithms = 1; // ["ML-KEM-768", "X25519", "P-256"]
string selected_algorithm = 2; // set by server in response
bytes ephemeral_public_key = 3;
}
Server-side selection logic picks the strongest option:
var algorithmPreference = []AlgorithmID{
AlgMLKEM768, // post-quantum hybrid; most preferred
AlgX25519, // classical; second choice
AlgP256, // legacy; only if nothing else matches
}
func selectAlgorithm(clientCapabilities []AlgorithmID) (AlgorithmID, error) {
supported := map[AlgorithmID]bool{}
for _, a := range clientCapabilities {
supported[a] = true
}
for _, preferred := range algorithmPreference {
if supported[preferred] {
return preferred, nil
}
}
return "", ErrNoMutualAlgorithm
}
This approach means adding ML-KEM support server-side immediately benefits all clients that already support it, without requiring a coordinated cutover. Deprecating P-256 later is a matter of removing it from algorithmPreference — clients that lack ML-KEM and X25519 will fail negotiation and must upgrade.
Step 5: Dual-Stack Signing During Migration
When migrating from one signing algorithm to another, there is a period when some verifiers understand the old algorithm and some understand the new one. Dual-stack signing covers both: sign with both algorithms, attach both signatures, and let each verifier use the one it supports.
type DualSignature struct {
ClassicalSig []byte `json:"classical_sig"`
ClassicalAlg AlgorithmID `json:"classical_alg"` // "ES256"
PostQuantumSig []byte `json:"pq_sig,omitempty"`
PostQuantumAlg AlgorithmID `json:"pq_alg,omitempty"` // "ML-DSA-65"
KeyID string `json:"kid"`
}
func dualSign(data []byte, classical Signer, pq Signer) (*DualSignature, error) {
cSig, cAlg, err := classical.Sign(data)
if err != nil {
return nil, fmt.Errorf("classical sign: %w", err)
}
pqSig, pqAlg, err := pq.Sign(data)
if err != nil {
return nil, fmt.Errorf("pq sign: %w", err)
}
return &DualSignature{
ClassicalSig: cSig,
ClassicalAlg: cAlg,
PostQuantumSig: pqSig,
PostQuantumAlg: pqAlg,
}, nil
}
Verification policy evolves through three phases:
| Phase | Verification policy |
|---|---|
| Rollout | Accept classical signature alone; PQ signature accepted but not required |
| Transition | Require at least one valid signature; prefer PQ if present |
| Completion | Require PQ signature; classical treated as informational |
X.509 certificate chains already support this pattern for dual-stack PKI: issue certificates signed by both a classical CA and a PQ CA; TLS stacks that understand both validate both.
Step 6: Hybrid Cryptography for Defence in Depth
Hybrid cryptography combines classical and post-quantum algorithms in a single operation. The resulting security holds as long as either algorithm is secure — if the classical algorithm is broken by a quantum computer, the PQ algorithm still protects the data; if the PQ algorithm has an unforeseen weakness, the classical algorithm still protects the data.
For key encapsulation (replacing RSA or ECDH key exchange):
// Hybrid KEM: XDH (X25519) + ML-KEM-768.
// Both encapsulate independently; the shared secrets are combined.
func hybridEncapsulate(
classicalPub *ecdh.PublicKey,
pqPub mlkem.PublicKey,
) (combinedKey []byte, encapsulation HybridEncapsulation, err error) {
// Classical KEM.
classicalEphemeral, _ := ecdh.X25519().GenerateKey(rand.Reader)
classicalShared, _ := classicalEphemeral.ECDH(classicalPub)
// PQ KEM.
pqShared, pqCiphertext, _ := mlkem768.Encapsulate(pqPub)
// Combine using HKDF. Neither secret dominates; both are required.
combined := hkdf.Extract(sha256.New,
append(classicalShared, pqShared...),
[]byte("hybrid-kem-v1"))
return combined, HybridEncapsulation{
ClassicalPublicKey: classicalEphemeral.PublicKey().Bytes(),
PQCiphertext: pqCiphertext,
Algorithm: "X25519+ML-KEM-768",
}, nil
}
This mirrors what TLS 1.3 does with X25519MLKEM768 and what OpenSSH 9.9 does with mlkem768x25519-sha256. The principle transfers directly to application-level key agreement.
Step 7: Audit a Codebase for Agility Gaps
Before a migration is urgent, find hardcoded algorithm references that will create friction when the time comes.
# Find algorithm names hardcoded in source files.
grep -rn --include="*.go" --include="*.py" --include="*.java" --include="*.ts" \
-E "(SHA1|SHA-1|MD5|RSA-?2048|AES-?128|des\b|rc4|ECB|PKCS1v15|rsa\.Sign)" \
./src/ | grep -v "_test\." | grep -v "//.*deprecated"
# Find unversioned ciphertext storage schemas.
# Look for encrypted blob columns without accompanying algorithm or version columns.
grep -rn --include="*.sql" --include="*.prisma" --include="*.go" \
-E "(encrypted|ciphertext|cipher_text|enc_data)" \
./schema/ ./migrations/ | \
while IFS=: read file line content; do
echo "$file:$line $content"
# Flag any file that mentions encrypted columns but not alg/version.
if ! grep -q "algorithm\|alg_id\|key_version\|enc_version" "$file" 2>/dev/null; then
echo " WARNING: no algorithm or version column found in $file"
fi
done
# Identify crypto library imports used directly (not via abstraction layer).
grep -rn --include="*.go" \
-E "\"crypto/rsa\"|\"crypto/ecdsa\"|\"crypto/sha1\"|\"golang.org/x/crypto/chacha20poly1305\"" \
./internal/ ./pkg/ | grep -v "crypto/" # exclude the abstraction layer itself
Produce an inventory table: file path, algorithm reference, type (hardcoded algorithm name / unversioned storage / direct library import), priority (critical = used for long-lived stored data; high = used for protocol messages; medium = used for ephemeral operations).
Step 8: Testing Algorithm Transitions
Test suites that exercise only one algorithm mask agility problems. A service that passes all tests against AES-256-GCM may silently fail to decrypt AES-128-GCM ciphertexts produced by a previous version during a rollout.
Two testing patterns:
Shadow mode: Run the new algorithm alongside the old one on every live request. Compare results. Differences surface bugs in the new implementation before it handles real traffic.
func (s *EncryptionService) EncryptWithShadow(plaintext []byte) (*CiphertextEnvelope, error) {
primary, err := s.primaryEncrypter.Encrypt(plaintext)
if err != nil {
return nil, err
}
// Shadow: run new algorithm, log differences, discard result.
if s.shadowEncrypter != nil {
shadow, shadowErr := s.shadowEncrypter.Encrypt(plaintext)
if shadowErr != nil {
s.metrics.ShadowErrors.Inc()
s.log.Warn("shadow encryption failed", "err", shadowErr)
} else {
s.metrics.ShadowSuccess.Inc()
_ = shadow // result discarded; only used to confirm no errors
}
}
return primary, nil
}
Dual decryption tests: Integration tests that produce ciphertexts under every supported algorithm version and verify that the current decryption code can handle all of them. Store a corpus of test vectors indexed by algorithm and key version.
func TestDecryptAllVersions(t *testing.T) {
vectors := loadTestVectors(t, "testdata/ciphertext-corpus.json")
for _, v := range vectors {
t.Run(fmt.Sprintf("alg=%s/kid=%s", v.Algorithm, v.KeyID), func(t *testing.T) {
plaintext, err := service.Decrypt(v.Ciphertext)
require.NoError(t, err, "must decrypt ciphertext from algorithm %s", v.Algorithm)
require.Equal(t, v.ExpectedPlaintext, plaintext)
})
}
}
The test corpus is a migration regression suite. Add new entries whenever a new algorithm ships. Entries are never removed — if the current code cannot decrypt them, the migration is broken.
Expected Behaviour
| Scenario | Without agility | With agility |
|---|---|---|
| SHA-1 deprecated by CA/B Forum | Flag-day: update all code, re-sign all artifacts, redeploy simultaneously | Update signing implementation; verifiers accept SHA-1 and SHA-256 during transition; SHA-1 retired after adoption completes |
| Post-quantum migration mandated | Re-architect key exchange in every service; coordinated cutover required | Add ML-KEM implementation; negotiate up from classical for each peer that supports it; classical retired per service as peers upgrade |
| Cryptanalytic break on AES-128 | Emergency incident: find every AES-128 usage, rekey, redeploy under change freeze | Rotate key version; new ciphertexts use AES-256; old ciphertexts decryptable until re-encrypted; SLA for re-encryption is 30 days, not hours |
| New compliance requirement (FIPS 140-3) | Audit entire codebase for non-compliant algorithms; remediation measured in months | Swap algorithm implementation behind the interface; compliance applies to every caller immediately |
| Onboard a new service that only supports older algorithm | Compatibility shim bolted on; technical debt accumulates | Negotiation selects the best mutually supported algorithm; new service is a first-class participant |
Trade-offs
| Design choice | Security benefit | Cost | Mitigation |
|---|---|---|---|
| Algorithm field in every message | Enables multi-algorithm support; prevents algorithm stripping | Adds bytes to every payload; AAD construction adds complexity | Fixed-width binary fields add 2–4 bytes; negligible for most payloads |
| Versioned key material | Safe rotation without flag days; support for parallel key versions | Key management service must track multiple versions; lookup adds latency | KMS lookup is typically sub-millisecond; cache current version locally |
| Dual-stack signing | Both classical and PQ verifiers work during transition | Signature size roughly doubles; signing time doubles | PQ signatures (ML-DSA-65 ~3.3 KB) are acceptable for most use cases; use dual-stack only during the transition window |
| Abstracted crypto interface | Enables algorithm swaps without callers changing | Adds a layer of indirection; harder to trace exact algorithm from call site | Generate documentation from the interface listing active algorithm IDs; make the mapping explicit |
| Negotiation protocol | Automatic upgrade as peers add support | More complex than a static config; negotiation adds a round trip | For session-oriented protocols the round trip is unavoidable; for message-based protocols embed capabilities in message headers |
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Algorithm field missing from stored ciphertext | Decryption fails after algorithm rotation because code cannot determine which algorithm to use | Decryption error rate spikes after key rotation; errors reference unknown format | Require algorithm field in schema migrations; add format validation on write path |
| Test corpus not updated | New algorithm works in unit tests but fails to decrypt old ciphertext in production | Decryption errors for specific key versions in production; unit tests pass | Add old-format vectors to the corpus before removing old support; run corpus tests in CI |
| Negotiation allows downgrade to weak algorithm | Client and server negotiate to the weakest common option, not the strongest | Monitoring shows unexpected algorithm usage; older algorithms appear in negotiation logs | Enforce server-side preference order; remove deprecated algorithms from the supported list on a schedule; monitor algorithm distribution in telemetry |
| Key version not carried through to audit log | Cannot determine which key version decrypted a given record; compliance audit fails | Audit log entries missing key version fields | Include key ID and version in every decrypt/sign log entry as structured fields |
| Dual-stack signing abandoned partway through | Some artifacts have both signatures; others have only classical; policy enforcement inconsistent | Verification policy logic branches on presence of PQ signature, producing inconsistent security guarantees | Enforce dual-stack requirement in the signing pipeline via CI gate; scanner rejects release artifacts without PQ signature |
| Algorithm abstraction bypassed in one service | Direct algorithm call is not covered by the rotation tooling | Algorithm audit (Step 7) finds the exception; rotation creates a gap | Zero-tolerance policy on direct algorithm calls outside the crypto/ package; enforce with a linter rule |
When to Consider a Managed Alternative
Algorithm versioning, key rotation, and ciphertext metadata are solved problems in managed KMS products. If your team is building these patterns from scratch, evaluate whether cloud-managed KMS covers the requirement first.
- AWS KMS handles key versioning, multi-version decryption, algorithm metadata in ciphertext envelopes, and key rotation schedules. Envelope encryption with KMS data keys provides agility at the application layer for bulk data.
- Google Cloud KMS offers equivalent functionality with crypto key versions and purpose-bound keys. Cloud KMS supports the CMEK model where customer-managed keys wrap service data keys.
- HashiCorp Vault transit secrets engine provides algorithm-agile encryption, decryption, and signing with key versioning built in. Supports AES-GCM, ChaCha20-Poly1305, Ed25519, ECDSA, and RSA with explicit version tracking on every ciphertext.
- JOSE (JWT/JWE/JWS) libraries implement the algorithm-identifier-in-header pattern correctly and are available in every major language. Use them for structured token and envelope formats rather than inventing a bespoke binary format.
The managed route is almost always faster to implement correctly. Build the abstracted interface around the managed service so that swapping providers later is an implementation-layer change, not a call-site change.