Never Reimplement Crypto: Why AI-Generated TLS and Network Stacks Are Categorically Unsafe
The Problem
The security community has operated on a single durable maxim for thirty years: don’t roll your own crypto. The maxim exists not because developers are incompetent, but because cryptographic security is not a property that survives functional testing. An AES-CBC implementation that produces correct ciphertext for every test vector, passes 100% of unit tests, encrypts and decrypts perfectly under all normal conditions, and compiles without warnings can still be trivially broken by an adversary who knows where to look. The bugs that break cryptographic implementations are timing differences measured in nanoseconds, state transition sequences that no developer wrote a test for, nonce domains that exhaust faster than expected, and integer arithmetic that silently truncates. None of these appear in output-equivalence testing.
LLMs have changed the surface of this problem without changing the underlying physics. An AI model trained on GitHub, Stack Overflow, documentation sites, and cryptographic papers has read every HMAC tutorial, every AES walkthrough, every TLS implementation guide ever published. It can produce a syntactically correct, algorithmically accurate TLS 1.3 handshake implementation in Python in under three minutes. That implementation will look professional. It will handle the happy path. It will encrypt and decrypt correctly. And it will have the same class of bugs that every hand-rolled crypto implementation has had since 1995, because those bugs are structural — they arise from gaps between what a cryptographic primitive requires and what a developer implementing it for the first time knows to provide.
The difference in 2026 is speed and confidence. Before LLMs, a developer who decided to roll their own TLS would spend weeks, produce obviously incomplete code, and likely be talked out of it by colleagues during review. Now, the same developer prompts an LLM, gets back a 200-line implementation that looks complete and authoritative, and ships it within the day. The productivity benefit of AI code generation is real; this is its shadow.
This article covers the specific failure classes that appear in AI-generated cryptographic code, why they are structurally inevitable rather than fixable with better prompts, what the audit trail of 25 years of adversarial analysis of OpenSSL and BoringSSL actually represents, and the concrete controls — static analysis rules, code review gates, and deployment patterns — that prevent AI-generated crypto from reaching production.
Target systems: Python 3.10+, Go 1.22+, OpenSSL 3.x, BoringSSL, WireGuard kernel module (Linux 5.6+), cryptography library 42+, bandit 1.8+, Semgrep 1.70+, gosec 2.20+.
Threat Model
- Timing side-channel via MAC comparison: An attacker who can send arbitrary messages and observe response latency can recover the valid MAC for a message of their choosing, byte by byte. An AI-generated HMAC verifier using == for comparison is vulnerable. The attack is practical over a LAN and has been demonstrated over WAN with sufficient samples. The attacker never needs the key: a forged message carrying a valid tag is enough to break the authentication layer.
- AES-GCM nonce reuse: With random 96-bit nonces, the collision probability approaches 50% at roughly 2^48 messages (around 280 trillion), but the safe limit is far lower. NIST SP 800-38D caps random-nonce GCM at 2^32 encryptions per key, which a service encrypting 10 million messages per day reaches in about 14 months. Nonce reuse in GCM lets an attacker solve for the authentication key H (defined as H = E_K(0^128)) and makes ciphertext malleable. Any two ciphertexts encrypted under the same key and nonce can be XORed to recover the XOR of the plaintexts.
- TLS state machine bypass: A TLS implementation that does not enforce strict state machine transitions will accept messages out of sequence. The attacker injects a ChangeCipherSpec early (as in the OpenSSL CCS injection bug, CVE-2014-0224), sends an unexpected renegotiation hello during a handshake, or delivers a Certificate message in a state that expects a Finished. The implementation may accept the malformed handshake and proceed with a partially initialized session, exposing plaintext or allowing MITM insertion.
- Algorithm confusion in JWT: An asymmetrically signed JWT (RS256, ES256) is verified using the public key. If the library accepts the alg header field without validation and the attacker submits a token with alg: HS256, the verifier uses the public key as the HMAC secret, which the attacker knows. This is a complete authentication bypass. AI-generated JWT parsers omit alg validation consistently.
- Weak key derivation: AI-generated password hashing frequently uses hashlib.sha256(password.encode()).hexdigest(): no salt, no stretching, effectively a lookup table away from recovery. When that hash database is exfiltrated, common and dictionary-derived passwords fall within seconds on a modern GPU.
- Access level required for timing attacks: Network access to the service. HTTPS does not prevent timing attacks. The TLS layer adds noise but does not eliminate the signal; with sufficient samples (thousands of requests per byte), the channel is recoverable.
- Blast radius: MAC forgery = authentication broken across the entire service. GCM nonce reuse = confidentiality and integrity broken for affected messages. TLS state machine bypass = MITM on the session, credential exposure. JWT alg confusion = authentication bypass for any token the attacker can construct. Weak KDF = full password database recovery on breach.
The Structural Reason AI-Generated Crypto Is Unsafe
AI models learn from the distribution of code written by human developers. That distribution is dominated by code that works in the functional sense — code that passes tests, runs in production, and produces correct output under normal conditions. The adversarial cases that cryptographic implementations must handle are systematically underrepresented in training data, for two reasons.
First, adversarial test cases are rare by definition. Most of the code an LLM trains on was never subjected to adversarial testing. The OpenSSL test suite, the BoringSSL fuzzing harness, the Go crypto/tls test vectors — these exist, but they are a small fraction of the total cryptographic code the model has seen. The dominant pattern in training data is correct-for-happy-path code.
Second, the bugs that matter in cryptography are not visible in the code. A timing side-channel in a MAC comparison looks like correct code. It uses the right algorithm. It produces the right output. The bug is behavioral — it depends on the hardware branch prediction, the comparison semantics of the runtime, and the attacker’s ability to observe latency. An LLM evaluating its own output for correctness will not flag this, because the output is correct. The model cannot generate code that is safe against adversaries it has not been trained to model as part of the output-generation task.
This is not a prompt engineering problem. Asking an LLM to “generate timing-safe HMAC verification” produces code that the model believes is timing-safe, because the model’s understanding of timing safety is bounded by what it has seen in training data. If the training data contains buggy implementations (and it does — they are everywhere), the model has learned both the correct and the incorrect patterns and has no reliable mechanism to distinguish them.
Specific Failure Classes
Timing Side-Channels in MAC Verification
Every AI-generated HMAC or token verification function tested in 2025-2026 uses the == operator for comparison. This is correct Python (it compares the bytes objects for equality) but it is wrong cryptographically. Python’s bytes comparison short-circuits on the first differing byte. An adversary who can send HMAC-tagged requests and measure response time sees responses that return slightly faster when the submitted MAC shares fewer leading bytes with the expected MAC. With enough samples, this yields a byte-by-byte recovery of the expected MAC for an attacker-chosen message. The key itself stays secret, but the attacker no longer needs it: they can attach a valid tag to any message they want accepted.
# AI-generated — structurally broken:
import hmac as _hmac
import hashlib
def verify_hmac(message: bytes, received_mac: bytes, key: bytes) -> bool:
    expected = _hmac.new(key, message, hashlib.sha256).digest()
    return expected == received_mac  # Short-circuits on first differing byte.
                                     # Response time varies with how many leading
                                     # bytes match. Measurable over a network.
Serge Vaudenay described the CBC padding oracle attack against SSL/TLS in 2002, and Canvel, Hiltgen, Vaudenay, and Vuagnoux showed in 2003 that response timing alone was enough to exploit it against a real TLS implementation. Lucky Thirteen (2013) refined the technique against TLS implementations that had patched the obvious cases. These attacks work over real networks with real latency variance. The signal-to-noise ratio is low, but the attack is patient.
The fix is a single function call that has been in Python’s hmac module since Python 3.3 (2012) and was backported to Python 2.7.7 (2014):
# Correct — constant-time comparison:
import hmac as _hmac
import hashlib
def verify_hmac(message: bytes, received_mac: bytes, key: bytes) -> bool:
    expected = _hmac.new(key, message, hashlib.sha256).digest()
    return _hmac.compare_digest(expected, received_mac)
# compare_digest() examines every byte regardless of where the first
# difference occurs, so timing does not depend on the contents.
# Inputs of unequal length return False; timing may still reveal the
# length of the inputs, but not their values.
The Go equivalent is subtle.ConstantTimeCompare from crypto/subtle:
import (
"crypto/hmac"
"crypto/sha256"
"crypto/subtle"
)
// AI-generated Go — broken:
func verifyHMACBroken(message, receivedMAC, key []byte) bool {
mac := hmac.New(sha256.New, key)
mac.Write(message)
expected := mac.Sum(nil)
return string(expected) == string(receivedMAC) // String conversion + == still variable-time
}
// Correct:
func verifyHMAC(message, receivedMAC, key []byte) bool {
mac := hmac.New(sha256.New, key)
mac.Write(message)
expected := mac.Sum(nil)
return subtle.ConstantTimeCompare(expected, receivedMAC) == 1
}
Note that converting []byte to string and using == does not fix the problem: Go’s string comparison is also variable-time. Use subtle.ConstantTimeCompare, or hmac.Equal from crypto/hmac, which compares two MACs without leaking timing information.
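To make the difference concrete, here is a small model of the two comparison strategies. It is a sketch for illustration only: the helper names are hypothetical, it counts bytes examined rather than measuring time, and the pure-Python accumulating loop is not itself a constant-time primitive. Production code should call hmac.compare_digest.

```python
import hmac

def leaky_equal(a: bytes, b: bytes) -> tuple[bool, int]:
    """Early-exit comparison. Returns (equal, bytes_examined)."""
    if len(a) != len(b):
        return False, 0
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            # Work done depends on the length of the matching prefix.
            return False, i + 1
    return True, len(a)

def constant_work_equal(a: bytes, b: bytes) -> tuple[bool, int]:
    """Accumulating comparison. Always examines every byte."""
    if len(a) != len(b):
        return False, 0
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y  # no data-dependent branch inside the loop
    return diff == 0, len(a)

expected = bytes.fromhex("a1b2c3d4e5f60718")
guess_0 = bytes.fromhex("0000000000000000")  # no matching prefix
guess_3 = bytes.fromhex("a1b2c30000000000")  # three matching leading bytes

print(leaky_equal(expected, guess_0))          # (False, 1)
print(leaky_equal(expected, guess_3))          # (False, 4)
print(constant_work_equal(expected, guess_0))  # (False, 8)
print(constant_work_equal(expected, guess_3))  # (False, 8)
print(hmac.compare_digest(expected, guess_3))  # False
```

The early-exit version does more work the closer the guess is to the expected value, which is exactly the signal a timing attacker measures.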
Nonce Reuse in AES-GCM
AES-GCM requires that each (key, nonce) pair be used exactly once across the lifetime of the key. The nonce is 96 bits in the standard construction. AI-generated GCM encryption generates a fresh random nonce per encryption. This is the correct instinct — a random nonce is safe in isolation — but it fails at scale due to the birthday paradox.
# AI-generated — unsafe for high-volume services:
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
def encrypt_message(key: bytes, plaintext: bytes, aad: bytes = b"") -> bytes:
    nonce = os.urandom(12)  # 96 bits, random
    ct = AESGCM(key).encrypt(nonce, plaintext, aad)
    return nonce + ct
# Birthday bound for 96-bit nonce:
# P(collision) ≈ n² / 2^97
# After 2^32 messages (~4 billion): P ≈ 2^64 / 2^97 = 2^-33 (negligible)
# After 2^48 messages (~280 trillion): P ≈ 2^96 / 2^97 = 0.5 (50% chance of collision)
# A service processing 10 million messages/day hits 2^32 in ~1.2 years.
# At that scale, the random nonce approach requires key rotation before 2^32 encryptions.
# AI-generated code includes no key rotation logic and no nonce counter.
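The figures in the comments above can be checked with a few lines of arithmetic. This is a sketch that uses the standard birthday approximation 1 - exp(-n(n-1)/2^(b+1)); the function names are illustrative. The simpler n²/2^97 estimate is accurate for small probabilities but overstates the result once it approaches 0.5.

```python
import math

NONCE_BITS = 96

def collision_probability(n_messages: int, nonce_bits: int = NONCE_BITS) -> float:
    """Probability that at least two of n random nonces collide."""
    exponent = -(n_messages * (n_messages - 1)) / 2 ** (nonce_bits + 1)
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(exponent)

def days_to_reach(n_messages: int, messages_per_day: int) -> float:
    """Days until a service at the given rate has encrypted n messages."""
    return n_messages / messages_per_day

for log2_n in (32, 40, 48):
    p = collision_probability(2 ** log2_n)
    print(f"2^{log2_n} messages: P(collision) ~ {p:.3e}")

print(f"{days_to_reach(2 ** 32, 10_000_000):.0f} days to 2^32 at 10M messages/day")
```

At 2^32 messages the probability is about 2^-33, which is why that count is the usual rotation threshold for random nonces.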
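When two GCM ciphertexts share a (key, nonce) pair, the consequences are catastrophic. The authentication key H, defined as H = E_K(0^128), can be solved for from the two authentication tags, because both tags are evaluations of polynomials in H masked by the same value. With H, an attacker can forge authentication tags for arbitrary ciphertexts. Confidentiality fails as well: XORing any two ciphertexts encrypted under the same (key, nonce) gives the XOR of the plaintexts, so any known or guessable plaintext reveals the keystream and with it the other message.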
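The confidentiality half of that failure is easy to reproduce. The sketch below does not use AES at all: toy_keystream is a hypothetical stand-in built from SHA-256 so that the example runs with the standard library alone. What it shares with GCM's CTR layer is the only property that matters here, namely that the keystream is a deterministic function of (key, nonce).

```python
import hashlib

def toy_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Stand-in for a CTR keystream. Illustration only, not a real cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def toy_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    return xor_bytes(plaintext, toy_keystream(key, nonce, len(plaintext)))

key = b"k" * 32
nonce = b"n" * 12  # reused for both messages: this is the bug
p1 = b"transfer $100 to alice"
p2 = b"transfer $999 to mallo"

c1 = toy_encrypt(key, nonce, p1)
c2 = toy_encrypt(key, nonce, p2)

# The keystream cancels out: the attacker learns p1 XOR p2 without the key.
assert xor_bytes(c1, c2) == xor_bytes(p1, p2)

# If the attacker knows or guesses p1, p2 follows immediately.
recovered = xor_bytes(xor_bytes(c1, c2), p1)
print(recovered)  # b'transfer $999 to mallo'
```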
For services where nonce uniqueness matters, a monotonic counter is safer than a random nonce:
import struct
import threading
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
class GCMEncryptor:
    """
    Monotonic counter nonce: guarantees uniqueness within a process lifetime.
    The counter restarts at zero when the process restarts, so persist it or
    rotate the key on restart; otherwise nonces repeat under the same key.
    For distributed systems, prefix the nonce with a process/instance ID
    to partition the nonce space across replicas.
    """
    def __init__(self, key: bytes):
        self._key = key
        self._counter = 0
        self._lock = threading.Lock()

    def encrypt(self, plaintext: bytes, aad: bytes = b"") -> bytes:
        with self._lock:
            count = self._counter
            self._counter += 1
        # 12-byte nonce: 4 bytes zero pad + 8-byte big-endian counter
        nonce = b"\x00" * 4 + struct.pack(">Q", count)
        ct = AESGCM(self._key).encrypt(nonce, plaintext, aad)
        return nonce + ct

    def decrypt(self, ciphertext: bytes, aad: bytes = b"") -> bytes:
        nonce, ct = ciphertext[:12], ciphertext[12:]
        return AESGCM(self._key).decrypt(nonce, ct, aad)
For distributed systems encrypting with the same key across multiple nodes, use a nonce construction that partitions the nonce space: a fixed instance identifier in the high bits and a per-instance counter in the low bits. The cryptography library’s AESGCM class does not enforce nonce uniqueness — that is the application’s responsibility.
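One way to lay out such a partitioned nonce is sketched below. The class name, the 32/64-bit split, and the start parameter are assumptions made for illustration; the only requirements are that instance IDs are unique across everything sharing the key and that a counter never moves backwards.

```python
import struct
import threading

class PartitionedNonce:
    """96-bit nonce = 32-bit instance ID || 64-bit per-instance counter."""

    def __init__(self, instance_id: int, start: int = 0):
        if not 0 <= instance_id < 2 ** 32:
            raise ValueError("instance_id must fit in 32 bits")
        self._instance_id = instance_id
        # `start` lets a restarted process resume from a persisted value
        # instead of replaying nonces it has already used.
        self._counter = start
        self._lock = threading.Lock()

    def next(self) -> bytes:
        with self._lock:
            if self._counter >= 2 ** 64:
                raise OverflowError("nonce space exhausted; rotate the key")
            count = self._counter
            self._counter += 1
        # ">IQ" packs 4 + 8 bytes big-endian with no padding: 12 bytes total.
        return struct.pack(">IQ", self._instance_id, count)

node_a = PartitionedNonce(instance_id=1)
node_b = PartitionedNonce(instance_id=2)
print(node_a.next().hex())  # 000000010000000000000000
print(node_a.next().hex())  # 000000010000000000000001
print(node_b.next().hex())  # 000000020000000000000000
```

Two nodes can never emit the same nonce because their high 32 bits differ, and a single node cannot repeat itself as long as its counter is persisted or the key is rotated on restart.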
TLS State Machine Bugs
A correct TLS 1.3 handshake state machine enforces strict ordering: ClientHello → ServerHello → EncryptedExtensions → Certificate → CertificateVerify → Finished (server) → Finished (client). At each state, the implementation must reject any message that is not the expected next message type. An AI-generated TLS implementation handles the happy path — the sequence above — and fails to handle unexpected message injection.
The practical consequence: an attacker who can insert a packet into the TCP stream before the handshake completes can send a ChangeCipherSpec record (a TLS 1.2 construct) during a TLS 1.3 handshake. A compliant implementation ignores middlebox-compatibility ChangeCipherSpec records and continues. A naive implementation may process the record, alter its state, and proceed with an inconsistent cipher state — producing a session where the client and server have different views of the encryption state.
More dangerous: AI-generated implementations frequently omit the Finished message MAC verification. The Finished message in TLS contains a MAC over the entire handshake transcript up to that point, computed using the handshake traffic key. Verifying this MAC is what prevents MITM attacks on the handshake — without it, an attacker can rewrite the ServerHello to insert a weaker key exchange, and the client will complete the handshake without detecting the substitution.
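The ordering check that naive implementations skip is small, which is part of why its absence is easy to miss. The sketch below is not a TLS implementation and should not be used as one: it models only the client's view of the server flight for a full certificate-based handshake and ignores HelloRetryRequest, CertificateRequest, and PSK modes. All names are hypothetical.

```python
# Messages a TLS 1.3 client expects from the server, in order
# (simplified: certificate-based handshake, no HelloRetryRequest).
EXPECTED_ORDER = [
    "server_hello",
    "encrypted_extensions",
    "certificate",
    "certificate_verify",
    "finished",
]

class HandshakeOrderError(Exception):
    """Raised when a handshake message arrives out of sequence."""

class ClientHandshakeTracker:
    def __init__(self) -> None:
        self._position = 0

    @property
    def complete(self) -> bool:
        return self._position == len(EXPECTED_ORDER)

    def receive(self, message_type: str) -> None:
        # Reject anything that is not exactly the next expected message.
        if self.complete:
            raise HandshakeOrderError(f"unexpected {message_type} after handshake")
        expected = EXPECTED_ORDER[self._position]
        if message_type != expected:
            raise HandshakeOrderError(f"expected {expected}, got {message_type}")
        self._position += 1

tracker = ClientHandshakeTracker()
for msg in EXPECTED_ORDER:
    tracker.receive(msg)
print(tracker.complete)  # True

attacked = ClientHandshakeTracker()
attacked.receive("server_hello")
try:
    attacked.receive("finished")  # skips certificate verification entirely
except HandshakeOrderError as err:
    print(err)  # expected encrypted_extensions, got finished
```

A production library pairs this kind of check with transcript hashing and Finished verification, and has had each transition exercised by test suites and fuzzers, which is what a freshly generated implementation lacks.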
The rule for TLS in 2026 is absolute: do not implement TLS. Use the system’s TLS library.
# Python — always use stdlib ssl (wraps OpenSSL) or an OpenSSL-backed library:
import ssl
import socket
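def make_tls_connection(host: str, port: int) -> ssl.SSLSocket:
    # create_default_context() loads the system CA store; a bare
    # SSLContext(PROTOCOL_TLS_CLIENT) loads no CAs, so every verification fails.
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    context.verify_mode = ssl.CERT_REQUIRED  # already the default; kept explicit
    context.check_hostname = True            # already the default; kept explicit
    # Restrict TLS 1.2 to AEAD-only cipher suites: ECDH+AESGCM, ECDH+CHACHA20
    # Excludes CBC (Lucky Thirteen), RC4, export ciphers, MD5/SHA1 MACs
    # set_ciphers() does not affect TLS 1.3 suites, which are AEAD-only anyway.
    context.set_ciphers("ECDH+AESGCM:ECDH+CHACHA20:!DSS:!aNULL:!eNULL")
    sock = socket.create_connection((host, port))
    return context.wrap_socket(sock, server_hostname=host)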
// Go — use crypto/tls from the standard library:
import (
"crypto/tls"
)
func makeTLSConn(host string) (*tls.Conn, error) {
cfg := &tls.Config{
MinVersion: tls.VersionTLS12,
// Go's crypto/tls automatically prefers TLS 1.3 when supported.
// CipherSuites applies only to TLS 1.2; TLS 1.3 suites are fixed.
CipherSuites: []uint16{
tls.TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,
tls.TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,
tls.TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,
tls.TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256,
},
ServerName: host,
}
conn, err := tls.Dial("tcp", host+":443", cfg)
return conn, err
}
JWT Algorithm Confusion
JWT libraries exist because JWT is harder to implement correctly than it looks. The alg header field in a JWT specifies the algorithm used to sign the token. The correct behavior: the verifier ignores the alg field and verifies using the algorithm its configuration specifies. An AI-generated JWT parser trusts the alg field.
When an RSA-signed (RS256) JWT is presented to a verifier that accepts the alg field from the token header, an attacker who modifies alg from RS256 to HS256 causes the verifier to treat the RSA public key — which the attacker knows — as the HMAC secret. The attacker signs a forged payload with that secret. The verifier accepts it. Authentication is completely bypassed.
# AI-generated — broken (trusts alg from token header):
import base64, json, hmac, hashlib
def verify_jwt_broken(token: str, rsa_public_key_pem: bytes) -> dict:
    header_b64, payload_b64, sig_b64 = token.split(".")
    header = json.loads(base64.urlsafe_b64decode(header_b64 + "=="))
    alg = header["alg"]  # NEVER trust alg from the token itself
    if alg == "HS256":
        # Attacker sets alg=HS256, signs with the public key as secret
        expected = hmac.new(rsa_public_key_pem, ..., hashlib.sha256).digest()
        ...
# Correct — use PyJWT with explicit algorithm specification:
import jwt
def verify_jwt(token: str, rsa_public_key_pem: str) -> dict:
    # algorithms= is a required argument. PyJWT will raise if the token's
    # alg header is not in this list, regardless of what the token claims.
    return jwt.decode(
        token,
        rsa_public_key_pem,
        algorithms=["RS256"],  # Explicit, not derived from token header
        options={"verify_exp": True, "verify_aud": True},
        audience="your-service-name",
    )
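The attack itself fits in a few lines, which helps explain why it is found so reliably. The sketch below uses only the standard library; the PEM value is a placeholder rather than a real key, and the function names are hypothetical. It shows the forgery succeeding against a verifier that trusts the header and failing against an allow-list check that runs before any key is touched.

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

# Placeholder standing in for the service's published RSA public key.
PUBLIC_KEY_PEM = b"-----BEGIN PUBLIC KEY-----\n(placeholder)\n-----END PUBLIC KEY-----\n"

def forge_hs256(claims: dict, public_key_pem: bytes) -> str:
    """Attacker side: sign with HMAC, using the public key bytes as the secret."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = hmac.new(public_key_pem, signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url(sig)}"

def naive_verify(token: str, public_key_pem: bytes) -> bool:
    """Broken verifier: picks the algorithm from the token header."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    alg = json.loads(b64url_decode(header_b64))["alg"]
    if alg == "HS256":
        signing_input = f"{header_b64}.{payload_b64}".encode()
        expected = hmac.new(public_key_pem, signing_input, hashlib.sha256).digest()
        return hmac.compare_digest(expected, b64url_decode(sig_b64))
    return False  # RS256 path omitted in this sketch

def alg_is_allowed(token: str, allowed: tuple = ("RS256",)) -> bool:
    """Allow-list check that must pass before any signature verification."""
    header = json.loads(b64url_decode(token.split(".")[0]))
    return header.get("alg") in allowed

forged = forge_hs256({"sub": "admin"}, PUBLIC_KEY_PEM)
print(naive_verify(forged, PUBLIC_KEY_PEM))  # True: forgery accepted
print(alg_is_allowed(forged))                # False: rejected up front
```

Passing algorithms=["RS256"] to a maintained JWT library performs the same allow-list check and then does the actual RS256 verification, which this sketch deliberately leaves out.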
Weak Key Derivation
AI-generated password storage is consistently broken in the same way:
# AI-generated — broken (no salt, no stretching):
import hashlib
def store_password(password: str) -> str:
    return hashlib.sha256(password.encode()).hexdigest()
def check_password(password: str, stored_hash: str) -> bool:
    return hashlib.sha256(password.encode()).hexdigest() == stored_hash
# Also broken: variable-time comparison
This pattern appears in LLM output even when the prompt specifies “secure password storage” because the training data is dominated by tutorial code that demonstrates the hashlib API, not production password storage requirements. A SHA-256 hash with no salt is a lookup table. With a GPU producing 10 billion SHA-256 hashes per second, dictionary-derived passwords fall in seconds, and an exhaustive search of every 8-character alphanumeric password (62^8, about 2.2 × 10^14 candidates) takes roughly six hours. Identical passwords produce identical hashes, allowing rainbow table attacks and letting one cracked hash expose every account that shares the password.
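The cost of exhaustive search is simple arithmetic, sketched below under the same assumed rate of 10 billion unsalted SHA-256 hashes per second. The rate is an illustrative assumption, not a benchmark, and real attacks start with dictionaries and leaked-password lists, which finish far sooner than these worst-case figures.

```python
HASHES_PER_SECOND = 10_000_000_000  # assumed attacker throughput

def exhaustive_search_seconds(alphabet_size: int, length: int,
                              rate: int = HASHES_PER_SECOND) -> float:
    """Worst-case time to try every password of the given shape."""
    return alphabet_size ** length / rate

cases = [
    ("8 chars, lowercase + digits", 36, 8),
    ("8 chars, mixed case + digits", 62, 8),
    ("10 chars, mixed case + digits", 62, 10),
]
for label, alphabet, length in cases:
    seconds = exhaustive_search_seconds(alphabet, length)
    print(f"{label}: {seconds:,.0f} s ({seconds / 3600:,.1f} h)")
```

A deliberately slow, salted hash changes both factors: the per-guess cost rises by several orders of magnitude, and the salt forces the attacker to repeat the work for every account.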
# Correct — use bcrypt, scrypt, or argon2:
import bcrypt
def store_password(password: str) -> bytes:
    # bcrypt generates its own salt and embeds it in the output.
    # cost factor 12 = ~250ms per verification on modern hardware.
    return bcrypt.hashpw(password.encode("utf-8"), bcrypt.gensalt(rounds=12))
def check_password(password: str, stored_hash: bytes) -> bool:
    return bcrypt.checkpw(password.encode("utf-8"), stored_hash)
# bcrypt.checkpw uses constant-time comparison internally.
For Argon2 (recommended for new implementations):
from argon2 import PasswordHasher
from argon2.exceptions import VerifyMismatchError
ph = PasswordHasher(
    time_cost=3,        # Number of iterations
    memory_cost=65536,  # 64 MiB RAM required per hash
    parallelism=4,      # Parallel threads
    hash_len=32,
    salt_len=16,
)
def store_password(password: str) -> str:
    return ph.hash(password)
def check_password(password: str, stored_hash: str) -> bool:
    try:
        return ph.verify(stored_hash, password)
    except VerifyMismatchError:
        # Wrong password. Other errors (for example a corrupted stored
        # hash) propagate so they are not mistaken for a failed login.
        return False
What 25 Years of Adversarial Analysis Actually Represents
OpenSSL’s commit history contains 27,000+ commits. BoringSSL was forked from OpenSSL by Google in 2014 specifically because they needed a library that could sustain the operational requirements of Google’s TLS termination at scale: aggressive removal of legacy code paths, hardened defaults, continuous fuzzing. The BoringSSL repository has been subjected to libFuzzer, AFL, and OSS-Fuzz continuously since 2016. OSS-Fuzz has reported well over 10,000 bugs across the projects it covers, OpenSSL and BoringSSL among them, and the majority are memory safety issues in edge cases that no unit test would ever trigger.
WireGuard’s design document was formally analyzed using the Tamarin prover — a machine-checked symbolic verification of the Noise protocol framework underlying WireGuard’s handshake. The proof demonstrates that WireGuard’s key exchange achieves forward secrecy, mutual authentication, and identity hiding under the standard Dolev-Yao adversary model. This is not documentation. It is a mathematical proof that the protocol design is correct. It says nothing about the implementation, which is why WireGuard’s Linux kernel implementation is 4,000 lines of carefully reviewed C, not 4,000 lines of generated code.
Go’s crypto/tls package implements RFC 8446 (TLS 1.3) with a test suite that runs the BoGo test framework — Google’s TLS interoperability and compliance test suite, which covers hundreds of edge cases in the state machine, record layer, and handshake. These tests represent bugs that were found in deployed implementations through adversarial research.
An AI-generated TLS implementation starts from scratch. It has never been run against BoGo. It has never been fuzzed. It has never been subjected to Tamarin analysis. It handles the cases the developer thought of, which are the cases that appear in documentation and tutorials — not the cases that appear after a researcher spends six months looking for state machine violations.
Hardening Configuration
1. Static Analysis: Detect Custom Crypto Implementations
# semgrep-crypto-rules.yaml
rules:
  - id: custom-aes-implementation
    patterns:
      # Table lookup indexed by a shifted, masked byte, inside a function...
      - pattern-inside: |
          def $FUNC(...):
              ...
      - pattern: '$X[($Y >> $Z) & 0xFF]'
      # ...whose name looks like a cipher routine. Metavariables cannot be
      # embedded in identifiers, so the name is matched with a regex.
      - metavariable-regex:
          metavariable: $FUNC
          regex: '^(aes_.*|.*_encrypt|.*_decrypt)$'
    message: >
      Potential custom cipher implementation in $FUNC. Use cryptography.hazmat,
      ssl module, or the Go standard library crypto packages. Never implement
      block ciphers, stream ciphers, or MACs from scratch.
    severity: ERROR
    languages: [python]
  - id: variable-time-mac-comparison
    # Name-based heuristic: flags == where either operand looks like a MAC,
    # digest, token, signature, or tag.
    pattern-either:
      - patterns:
          - pattern: "$MAC1 == $MAC2"
          - metavariable-regex:
              metavariable: $MAC1
              regex: '(?i).*(mac|digest|token|signature|tag).*'
      - patterns:
          - pattern: "$MAC1 == $MAC2"
          - metavariable-regex:
              metavariable: $MAC2
              regex: '(?i).*(mac|digest|token|signature|tag).*'
    message: >
      Potential timing side-channel: == comparison is variable-time.
      Use hmac.compare_digest() for MAC/digest/token comparison.
    severity: ERROR
    languages: [python]
    metadata:
      cwe: "CWE-208"
  - id: jwt-algorithm-from-header
    patterns:
      - pattern: |
          $HEADER = json.loads(...)
          ...
          $ALG = $HEADER["alg"]
          ...
    message: >
      JWT algorithm derived from token header. This enables algorithm confusion
      attacks (RS256→HS256). Use a JWT library with explicit algorithms= parameter.
    severity: ERROR
    languages: [python]
  - id: sha256-password-hash
    pattern: hashlib.sha256($PASSWORD.encode()).hexdigest()
    message: >
      Raw SHA-256 for password storage has no salt and no stretching.
      Use bcrypt.hashpw() or argon2.PasswordHasher instead.
    severity: ERROR
    languages: [python]
  - id: hardcoded-iv-nonce
    # Either assignment is a finding, so this must be pattern-either.
    # Assumes the "..." string ellipsis also matches bytes literals;
    # confirm with a rule test before relying on it.
    pattern-either:
      - pattern: iv = b"..."
      - pattern: nonce = b"..."
    message: >
      Hardcoded IV or nonce. IVs must be randomly generated per encryption,
      or derived from a strictly monotonic counter for GCM. Hardcoded values
      break semantic security unconditionally.
    severity: ERROR
    languages: [python]
Run Semgrep against the entire codebase as part of CI:
semgrep --config semgrep-crypto-rules.yaml \
--config "p/owasp-top-ten" \
--config "p/jwt" \
--json \
--output semgrep-results.json \
src/
# Fail the build on any ERROR-severity finding:
jq -e '[.results[] | select(.extra.severity == "ERROR")] | length == 0' \
semgrep-results.json
2. bandit for Python Crypto Checks
# Run bandit with the crypto- and TLS-related test IDs
# (-t takes one comma-separated list; a second -t would replace the first):
bandit -r src/ \
  -t B303,B304,B305,B306,B307,B323,B324,B501,B502,B503,B504,B505,B506 \
  -f json \
  -o bandit-report.json
# Test ID reference:
# B303: use of MD2, MD4, MD5, or SHA1 hash functions
# B304: use of insecure ciphers (DES, RC4, Blowfish, and similar)
# B305: use of insecure cipher modes (ECB)
# B306: use of mktemp (unrelated but common companion finding)
# B307: use of eval (unrelated but common companion finding)
# B323: ssl._create_unverified_context
# B324: use of weak hashlib hashes (MD5, SHA1) for security purposes
# B501: HTTP request made with certificate verification disabled (verify=False)
# B502: ssl call with an insecure protocol version (SSLv2, SSLv3, TLS 1.0/1.1)
# B503: function default argument set to an insecure protocol version
# B504: ssl.wrap_socket without an explicit protocol version
# B505: weak cryptographic key size (RSA/DSA < 2048 bits)
# B506: use of yaml.load without a safe Loader
# Fail CI on MEDIUM or HIGH severity findings:
python3 -c "
import json, sys
report = json.load(open('bandit-report.json'))
high_medium = [r for r in report['results']
               if r['issue_severity'] in ('HIGH', 'MEDIUM')]
if high_medium:
    for r in high_medium:
        print(f\"{r['filename']}:{r['line_number']} [{r['issue_severity']}] {r['issue_text']}\")
    sys.exit(1)
"
3. gosec for Go Crypto Checks
# Install:
go install github.com/securego/gosec/v2/cmd/gosec@latest
# Run crypto-relevant checks:
gosec -include=G401,G402,G403,G404,G405,G406,G501,G502 \
-fmt json \
-out gosec-report.json \
./...
# G401: use of weak hash functions (MD5, SHA1)
# G402: bad TLS connection settings (InsecureSkipVerify, weak MinVersion)
# G403: RSA key < 2048 bits
# G404: use of weak random number generator (math/rand instead of crypto/rand)
# G405: use of DES or RC4
# G406: use of MD4 or RIPEMD160
# G501: import blocklist: crypto/md5
# G502: import blocklist: crypto/des
# Fail on HIGH confidence findings:
jq -e '[.Issues[] | select(.confidence == "HIGH")] | length == 0' \
gosec-report.json
4. CODEOWNERS for Crypto-Adjacent Paths
# .github/CODEOWNERS
# Any file touching cryptography, authentication, or network protocol
# implementation requires sign-off from the security team.
# This gate is specifically intended to catch AI-generated crypto.
/src/crypto/** @security-team
/src/auth/** @security-team
/src/tls/** @security-team
/src/network/** @security-team @platform-team
**/hmac*.py @security-team
**/encrypt*.py @security-team
**/decrypt*.py @security-team
**/jwt*.py @security-team
**/sign*.go @security-team
**/verify*.go @security-team
**/cipher*.go @security-team
Ensure that the CODEOWNERS gate cannot be bypassed:
# Branch protection rule (GitHub API / Terraform):
resource "github_branch_protection" "main" {
repository_id = github_repository.app.node_id
pattern = "main"
required_pull_request_reviews {
required_approving_review_count = 1
require_code_owner_reviews = true # CODEOWNERS review mandatory
dismiss_stale_reviews = true
}
required_status_checks {
strict = true
contexts = ["semgrep", "bandit", "gosec"]
}
}
5. Manual Audit Checklist for AI-Generated Code in Security-Adjacent Paths
When an AI-generated PR touches any authentication, encryption, or network protocol path, apply this checklist before merging:
# 1. Check for variable-time MAC/token comparison:
grep -rn "==" src/ \
| grep -iE "mac|hmac|token|digest|hash|signature|tag" \
| grep -v "compare_digest\|ConstantTimeCompare\|hmac.Equal"
# Any match is a timing side-channel candidate. Review each line.
# 2. Check for hardcoded or non-random IVs/nonces:
grep -rn -E "iv\s*=|nonce\s*=|IV\s*=" src/ \
| grep -vE "random|urandom|secrets|rand\.Read|crypto/rand"
# Any IV/nonce that is neither drawn from a cryptographic random source
# nor produced by a managed, never-repeating counter is broken.
# 3. Check for ECB mode — never correct for bulk encryption:
grep -rn -E "ECB|MODE_ECB|AES\.new.*mode=1|NewECBEncrypter" src/
# Any match is an unconditional break of semantic security.
# 4. Check for MD5/SHA1 in security contexts:
grep -rn -E "md5|sha1|SHA1|MD5|MD4|sha\.New\b" src/ \
| grep -vE "test|\.git|checksum|legacy|content-hash|etag|git-sha"
# MD5 and SHA1 are broken for authentication and integrity; SHA1 is deprecated
# for TLS certificate signatures. Review every security-context match.
# 5. Check for raw SHA-256 password hashing:
grep -rn -E "sha256.*password|sha512.*password|password.*sha" src/ \
| grep -v "pbkdf2\|bcrypt\|argon2\|scrypt"
# Raw SHA-N of a password is not a password hash. Requires salt + stretching.
# 6. Check that JWT parsing specifies algorithms explicitly:
grep -rn "jwt.decode\|ParseWithClaims\|verify_jwt" src/ \
| grep -v "algorithms="
# Any JWT decode that does not specify the accepted algorithm list is
# vulnerable to algorithm confusion (alg:none, RS256→HS256).
# 7. Check for TLS verification disabled:
grep -rn -E "verify=False|InsecureSkipVerify|CERT_NONE|check_hostname\s*=\s*False" src/
# Any match is a MITM invitation. No exceptions in production code.
6. WireGuard for Encrypted Network Tunnels
When the requirement is an encrypted network tunnel — service-to-service, cross-region, cross-cloud — the answer is WireGuard, not a custom implementation.
# Server node setup (set umask first so the private key is never world-readable):
umask 077
wg genkey | tee /etc/wireguard/server-private.key | wg pubkey > /etc/wireguard/server-public.key
chmod 600 /etc/wireguard/server-private.key
# /etc/wireguard/wg0.conf (server)
[Interface]
Address = 10.0.0.1/24
ListenPort = 51820
PrivateKey = <server-private-key>
# PostUp/PostDown omitted for brevity — add iptables forwarding rules
# if routing peer traffic through the server.
[Peer]
# Client node
PublicKey = <client-public-key>
AllowedIPs = 10.0.0.2/32
# Bring up the interface and confirm that the peer handshake completes:
wg-quick up wg0
wg show wg0
Expected output:
interface: wg0
public key: <server-public-key>
private key: (hidden)
listening port: 51820
peer: <client-public-key>
endpoint: <client-ip>:51820
allowed ips: 10.0.0.2/32
latest handshake: 23 seconds ago
transfer: 1.23 MiB received, 4.56 MiB sent
WireGuard’s cryptographic suite is fixed and non-negotiable: Curve25519 for key exchange (ECDH), ChaCha20-Poly1305 for data encryption and authentication, BLAKE2s for hashing and keying, and a Noise IKpsk2 handshake pattern. There are no cipher negotiation parameters, no downgrade attack surface, no InsecureSkipVerify equivalent. The algorithm choices were made by the protocol designers, formally analyzed, and cannot be changed by configuration or by a developer prompt to an LLM.
Expected Behaviour After Hardening
After deploying the Semgrep rules and bandit configuration, CI blocks on any pull request that introduces a variable-time comparison in a MAC verification path. The output looks like:
semgrep: src/auth/token.py:47: [ERROR] variable-time-mac-comparison
Pattern: $MAC1 == $MAC2
Fix: use hmac.compare_digest()
bandit: src/auth/token.py:52: [HIGH] B303 Use of MD5 or SHA1
More info: https://bandit.readthedocs.io/en/latest/blacklists/blacklist_calls.html#b303-md5
The build fails. The PR cannot be merged until the finding is resolved. A developer who regenerates the crypto code with an LLM will get the same finding on the next run — the tool enforces the constraint that the LLM cannot enforce on its own output.
After CODEOWNERS configuration, any modification to files matching /src/crypto/** or **/hmac*.py requires a review approval from @security-team before the PR can be merged. This creates a human gate that catches the cases static analysis misses — structural issues in a custom state machine, logic errors in a key derivation function, missing certificate pinning — that require human adversarial review rather than pattern matching.
WireGuard’s wg show output confirms that the handshake completed. It does not report the cryptographic suite, and it does not need to: there are no configuration options to audit. The suite is the suite.
Trade-offs
Static analysis false positives: The variable-time-mac-comparison Semgrep rule will flag == comparisons on digest objects in contexts that are not security-sensitive — for example, comparing two checksums for file deduplication. Each false positive requires a # nosemgrep annotation and a justification comment. This is acceptable operational overhead; the alternative is missing actual timing side-channels.
The cryptography library adds a C extension dependency: In environments that require pure-Python deployments (certain embedded systems, some restricted environments), cryptography.hazmat is not available. The fallback in CPython is hashlib (which also wraps OpenSSL for most operations) and the ssl module. There is no scenario in which the correct fallback is a custom implementation — the correct fallback is to restructure the deployment so that a C extension is available, or to use a different language.
AI can safely generate application-layer code that calls crypto functions: The rule is about implementations, not usage. An LLM generating AESGCM(key).encrypt(nonce, plaintext, aad) is generating correct usage of a correct implementation. An LLM generating the AES S-box lookup table and round function is generating an implementation. The distinction is clear in practice: if the code is calling a function from cryptography.hazmat, ssl, crypto/tls, or golang.org/x/crypto, it is usage. If the code is implementing a cipher, MAC, KDF, or protocol from algorithmic description, it is reimplementation.
WireGuard’s fixed cipher suite may conflict with compliance requirements: Some compliance regimes, FIPS 140-2/3 in particular, restrict deployments to approved algorithms, and ChaCha20-Poly1305 is not among them, whereas AES-GCM is. In those environments, WireGuard’s kernel implementation is not FIPS-validated, and IPsec built on a FIPS-validated cryptographic module (for example strongSwan or Libreswan on a validated platform) is the correct choice. IPsec with IKEv2 and AES-256-GCM provides comparable security properties with a path to FIPS validation; the trade-off is significantly higher operational complexity in exchange for compliance.
Failure Modes
Reviewing AI-generated crypto code that passes all tests: Timing side-channels, nonce domain exhaustion, and state machine bugs are invisible to functional tests. A MAC verifier with a == comparison will pass every unit test that checks correct verification and every test that checks rejection of invalid MACs. The test does not measure response latency as a function of MAC byte matches. Passing all tests is not evidence of cryptographic correctness.
Semgrep rules not applied to test files: Test code in tests/ is frequently excluded from static analysis. AI-generated test helpers that implement simplified crypto (“just for testing”) get copy-pasted into production code or serve as patterns for production code. Apply crypto static analysis rules to test files as well.
nosemgrep annotations accumulating without review: Engineers who encounter a Semgrep false positive add # nosemgrep: variable-time-mac-comparison and move on. Six months later, the annotation is on an actual timing side-channel because a refactor changed the context and nobody reviewed the suppression. Treat nosemgrep annotations as technical debt items; audit them quarterly.
CODEOWNERS bypassed via the GitHub UI: Repository administrators can merge PRs without required reviews. Ensure that no engineer outside the security team has administrator access to repositories containing cryptographic code, or that administrator merges generate an audit alert. GitHub’s push protection and branch protection rules are bypassable by administrators by design — the control is organizational, not purely technical.
Password hashing migration not applied to existing records: Switching from raw SHA-256 to bcrypt fixes new password storage but leaves existing hashed records vulnerable. The migration requires a rehash-on-login pattern: when a user authenticates successfully, check if their stored hash uses the old scheme, rehash with bcrypt, and update the record. Without this, the breach risk for existing records remains even after the code fix.
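The rehash-on-login pattern can be sketched as follows. To keep the example self-contained it uses hashlib.scrypt from the standard library as a stand-in for bcrypt or Argon2, and a plain dict in place of the user table; the scrypt$salt$digest storage format, the parameters, and the function names are all assumptions made for illustration.

```python
import hashlib
import hmac
import os

SCRYPT_PARAMS = dict(n=2 ** 14, r=8, p=1, dklen=32)  # illustrative parameters

def legacy_hash(password: str) -> str:
    """The old scheme being migrated away from: unsalted SHA-256."""
    return hashlib.sha256(password.encode()).hexdigest()

def new_hash(password: str) -> str:
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)
    return f"scrypt${salt.hex()}${digest.hex()}"

def verify_new(password: str, stored: str) -> bool:
    _, salt_hex, digest_hex = stored.split("$")
    digest = hashlib.scrypt(password.encode(), salt=bytes.fromhex(salt_hex),
                            **SCRYPT_PARAMS)
    return hmac.compare_digest(digest, bytes.fromhex(digest_hex))

def login(user_store: dict, username: str, password: str) -> bool:
    stored = user_store.get(username)
    if stored is None:
        return False
    if stored.startswith("scrypt$"):
        return verify_new(password, stored)
    # Legacy record: verify against the old scheme, then upgrade in place
    # while the plaintext password is available.
    if hmac.compare_digest(legacy_hash(password), stored):
        user_store[username] = new_hash(password)
        return True
    return False

users = {"alice": legacy_hash("correct horse")}
print(login(users, "alice", "correct horse"))  # True, record is upgraded
print(users["alice"].startswith("scrypt$"))    # True
print(login(users, "alice", "wrong"))          # False
```

Accounts that never log in again keep their legacy hashes, so a migration of this kind is usually paired with a deadline after which remaining legacy records are invalidated and those users are forced through a password reset.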
Confusing “uses TLS” with “implements TLS correctly”: A service that terminates TLS using the system OpenSSL can still be vulnerable if it accepts TLS 1.0, enables NULL cipher suites, sets verify=False on upstream connections, or skips hostname verification. The correct implementation uses the system library correctly. The Semgrep and bandit rules above catch the most common misconfigurations; supplement with testssl.sh or sslyze for external service scanning.
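# sslyze: scan a service for TLS misconfigurations
# (recent releases run the full scan by default; older ones needed --regular):
sslyze --json_out=sslyze-report.json target.example.com:443
# testssl.sh: comprehensive TLS configuration audit
# (--jsonfile takes a path; mount a volume so the report survives the container):
docker run --rm -v "$(pwd):/out" drwetter/testssl.sh --jsonfile /out/testssl-out.json \
target.example.com:443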