Software Supply Chain in the AI Coding Era: When Your Dependency Is a Prompt

The Problem

The log4shell vulnerability (CVE-2021-44228) was catastrophic partly because log4j was a transitive dependency — it appeared in the dependency trees of thousands of applications whose engineers had never heard of it. The response required three steps: find every application using log4j, update the dependency, redeploy. Dependency scanning made this tractable because log4j appeared in declared manifest files. Grype could scan an SBOM and find it. Dependabot could open a PR. The machinery worked because there was a named, versioned artifact with a known CVE to track.

AI-generated code creates a new variant of this problem that the SBOM machinery cannot address: inlined, undeclared dependencies. When an engineer asks an LLM to “implement JWT verification” or “write a function to parse YAML safely” and ships the result, that code has no entry in package.json, go.mod, or requirements.txt. Snyk does not track it as a dependency. Dependabot does not open a PR for it. It has no CVE feed, no upstream maintainer to issue patches, and no declared licence. If the AI-generated JWT parser contains a signature-verification bypass, a class of bug that appeared in python-jose (CVE-2024-33663), PyJWT (CVE-2022-29217), and node-jsonwebtoken (CVE-2022-23540), there is no CVE to alert on and no package to update.

This is not a theoretical scenario. GitHub’s 2025 research found that over 50% of code in active repositories is AI-assisted. A significant fraction of that code inlines functionality that would previously have been a named dependency. The US government’s executive order on software security (EO 14028) mandated SBOM for software sold to federal agencies. That mandate was written assuming a world where security-relevant code had declared provenance. That assumption is now incorrect for a substantial proportion of production codebases.

What Gets Inlined

The categories of functionality that AI coding tools routinely inline — because a developer asked for “quick implementation” rather than “which library should I use” — are precisely the categories with the most security-relevant CVE history:

JWT parsing and validation. LLMs frequently generate manual base64url-decode-and-verify implementations when asked to “add JWT auth.” The replaced libraries — PyJWT, python-jose, golang-jwt, node-jsonwebtoken — have a combined CVE history spanning algorithm confusion, timing attacks, and none algorithm acceptance. None of those CVEs apply to AI-generated inline code. That is not a safety property; it means there is no signal when the same class of vulnerability is present in the AI-generated version.

YAML parsing with custom deserialization logic. Asking an LLM to “parse this YAML config” often produces yaml.load(data, Loader=yaml.FullLoader) or equivalent unsafe patterns, bypassing the safe-load constraints that PyYAML specifically added in response to CVE-2017-18342 and related deserialisation vulnerabilities. The AI is not wrong that the code runs — it is wrong that it is safe.

UUID and token generation using random instead of secrets. Python’s random module is explicitly documented as unsuitable for cryptographic use. LLMs frequently use it for token generation because it is simpler. The replaced secrets module or os.urandom has no CVE feed either, but at least the correct choice is a named, reviewable standard library call.
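The difference is easy to demonstrate with the standard library alone. The sketch below is illustrative: the function names are hypothetical, and the fixed seed exists only to show that the weak generator is reproducible.

```python
import random
import secrets


def weak_token(length: int = 32) -> str:
    """Pattern often seen in generated code: Mersenne Twister output."""
    alphabet = "abcdefghijklmnopqrstuvwxyz0123456789"
    return "".join(random.choices(alphabet, k=length))


def session_token(num_bytes: int = 32) -> str:
    """Preferred: draws from the operating system's CSPRNG."""
    return secrets.token_urlsafe(num_bytes)


# Re-seeding reproduces the same "secret": the output is a pure function
# of the generator state.
random.seed(1234)
first = weak_token()
random.seed(1234)
second = weak_token()
print(first == second)        # True
print(len(session_token()))   # 43 URL-safe characters for 32 random bytes
```

The secrets call is no longer than the random one, which is why the review question is usually "why was random chosen here" rather than "is the replacement worth the effort".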

HTTP retry and circuit-breaking logic. Replacing tenacity or backoff with hand-written retry loops is cosmetically harmless but creates divergence from a maintained library’s bug history. When tenacity fixes a race condition in retry state management, the fix is not applied to the inlined version.

Cryptographic hashing for password storage. LLMs asked to “store passwords securely” sometimes generate hashlib.sha256(password.encode()).hexdigest(), a pattern that stands in for bcrypt or argon2-cffi. Neither library is particularly complex, but both embody the work-factor tuning and salt handling that a bare SHA-256 call lacks.
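To make "work factor and salt handling" concrete, here is a minimal sketch using only the standard library. It swaps in PBKDF2 (hashlib.pbkdf2_hmac) so the example stays self-contained; it is not a substitute for the bcrypt or argon2-cffi libraries named above. The iteration count, the storage format, and the function names are assumptions for illustration.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor; tune for your hardware


def hash_password(password: str, iterations: int = ITERATIONS) -> str:
    """Derive a salted, deliberately slow hash and encode its parameters."""
    salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    # Hypothetical storage format: scheme$iterations$salt$digest
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute with the stored salt and work factor, compare in constant time."""
    _scheme, iterations, salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
```

Every element here (per-password salt, stored work factor, constant-time comparison) is something the bare hashlib.sha256 call omits and a maintained library supplies by default.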

Database connection pooling. Replacing psycopg2.pool or SQLAlchemy’s pooling with a hand-written pool introduces potential connection-leak patterns that the library’s maintainers have fixed in response to production incidents over years. The inline version starts from zero on that error history.

Threat Model

Critical vulnerability in AI-generated inline JWT parser, no CVE exists. An attacker discovers that your application’s AI-generated JWT verification function accepts tokens signed with the none algorithm — a classic vulnerability class first described in 2015. No CVE entry exists for your inlined code. Grype scans your container image and finds no high-severity findings. Snyk monitors your declared dependencies and finds nothing. The vulnerability is present, exploitable, and invisible to every automated detection tool in your pipeline. The only discovery path is manual code review or penetration testing.
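The failure mode in this scenario can be reproduced in a few lines. The sketch below is a deliberately flawed verifier next to a stricter one, using only the standard library; all function names and the demo key are hypothetical, and neither function is a replacement for a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json


def b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _hs256(signing_input: str, key: bytes) -> bytes:
    return hmac.new(key, signing_input.encode(), hashlib.sha256).digest()


def sign_hs256(payload: dict, key: bytes) -> str:
    header_b64 = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload_b64 = b64url_encode(json.dumps(payload).encode())
    signature = _hs256(f"{header_b64}.{payload_b64}", key)
    return f"{header_b64}.{payload_b64}.{b64url_encode(signature)}"


def naive_verify(token: str, key: bytes) -> dict:
    """Flawed on purpose: the attacker-controlled header picks the algorithm."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    if header.get("alg", "").lower() != "none":
        expected = _hs256(f"{header_b64}.{payload_b64}", key)
        if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
            raise ValueError("bad signature")
    return json.loads(b64url_decode(payload_b64))


def strict_verify(token: str, key: bytes) -> dict:
    """The verifier pins HS256; anything else in the header is rejected."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = _hs256(f"{header_b64}.{payload_b64}", key)
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    return json.loads(b64url_decode(payload_b64))


KEY = b"demo-key-not-for-production"
forged = ".".join([
    b64url_encode(json.dumps({"alg": "none"}).encode()),
    b64url_encode(json.dumps({"sub": "admin"}).encode()),
    "",  # empty signature segment
])
print(naive_verify(forged, KEY))  # {'sub': 'admin'}: accepted with no signature
```

Both functions pass a happy-path test with a correctly signed token, which is why this class of bug survives review: only a test that presents a forged token distinguishes them.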

AI-generated YAML parser with unsafe deserialization, no dependency to update. Your application uses a hand-written YAML config parser generated by Copilot six months ago. A security researcher publishes a proof-of-concept demonstrating code execution through YAML deserialization against the pattern your code uses. The researcher files a CVE against PyYAML’s FullLoader, but your code does not use PyYAML’s FullLoader — it uses an LLM-generated equivalent. The CVE does not apply. Your scanner does not flag your code. The vulnerability is present in production for the duration of time it takes someone to manually connect the vulnerability class to your codebase.

AI-generated UUID generator with insufficient entropy, no supply chain alert. Your authentication service generates session tokens using random.choices(string.ascii_letters + string.digits, k=32), an LLM-generated implementation. The random module is a Mersenne Twister: it is seeded from OS entropy by default, but its output is fully determined by internal state that was never designed to resist recovery by an observer. A researcher publishes a practical attack on this pattern as used in your framework. No SBOM entry, no CVE, no scanner finding. The first signal is a breach investigation that traces back to predictable session tokens.

SBOM audit failure under regulatory obligation. A customer with contractual SBOM requirements requests your current SBOM. Your SBOM generation pipeline runs syft dir:. against your repository, producing a JSON document listing all declared packages with their SPDXIDs, licence identifiers, and version numbers. The SBOM is missing several hundred security-relevant functions that are implemented inline in AI-generated code, so it is materially incomplete. Under the EU Cyber Resilience Act, which requires manufacturers to draw up an SBOM for products placed on the EU market (Annex I, Part II; main obligations apply from December 2027), this is a compliance finding.

Licence exposure from unlicensed inlined functionality. Your product ships a compliance matrix showing it contains no GPL-licensed code. Your AI-generated inline utility functions include a fragment that closely matches GPL-licensed code, presumably reproduced from training data, and it surfaces in a licence audit. The functions have no licence header. The question of whether LLM output can carry licence obligations from training data is currently unresolved in law, but the operational reality is that you cannot state a licence for code with no declared provenance, which is itself a compliance problem for customers in regulated industries.

Hardening Configuration

1. Mandatory Dependency-First Engineering Policy

The upstream fix is cultural: require engineers to evaluate whether a maintained library exists before asking an LLM to generate an implementation. This policy must be codified, not assumed.

# Engineering Standards — AI Code Generation Policy v1.2

## Evaluating new functionality

Before prompting an AI coding tool to implement any of the following
functionality categories, you MUST check whether a maintained library
covers the use case:

### Security-critical categories (library required unless documented exception):
- Token/JWT generation, parsing, or validation
- Password hashing and verification
- Cryptographic operations (encryption, signing, key derivation)
- Random number generation for security purposes
- Deserialisation of untrusted input (YAML, XML, pickle, msgpack)
- Input sanitisation or escaping

### Infrastructure categories (library strongly preferred):
- HTTP client retry, timeout, circuit-breaking logic
- Database connection pooling
- Rate limiting
- Structured logging
- Metrics collection

## Decision order

1. Standard library (always prefer — no additional dependency surface)
2. Official SDK for the specific service (AWS SDK, GCP client libraries)
3. CNCF/OpenSSF ecosystem library for infrastructure patterns
4. Well-maintained open-source: >1000 GitHub stars, commits within 6 months,
   published security policy
5. AI-generated inline implementation — only if no library exists AND use case
   is narrowly scoped AND security team has reviewed

## Documentation requirement

Any PR containing AI-generated code that implements functionality in a
security-critical category MUST include in the PR description:

"Dependency evaluation: evaluated [library names]. Did not use because
[reason]. Inline implementation reviewed by [name] on [date]."

PRs missing this documentation for security-critical categories are
blocked from merge by the reviewer checklist requirement.

This policy has teeth only if it is enforced at review time, not as a suggestion.
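One way to enforce the documentation requirement mechanically is to check the PR description text in CI. The following is a minimal sketch, assuming the statement follows the policy wording exactly and that dates are written as YYYY-MM-DD (the policy does not specify a date format); the function name check_pr_description is hypothetical.

```python
import re

# Mirrors the wording required by the policy's documentation section.
DEP_EVAL = re.compile(
    r"Dependency evaluation: evaluated (?P<libs>.+?)\. "
    r"Did not use because (?P<reason>.+?)\. "
    r"Inline implementation reviewed by (?P<reviewer>.+?) "
    r"on (?P<date>\d{4}-\d{2}-\d{2})\."
)


def check_pr_description(body: str) -> list[str]:
    """Return a list of problems; an empty list means the statement is present."""
    text = " ".join(body.split())  # collapse line breaks and repeated spaces
    match = DEP_EVAL.search(text)
    if match is None:
        return ["missing or malformed 'Dependency evaluation' statement"]
    problems = []
    for name, value in match.groupdict().items():
        # Square brackets indicate a template placeholder left unfilled.
        if "[" in value or "]" in value:
            problems.append(f"placeholder left unfilled in '{name}'")
    return problems


example = """Adds token validation.

Dependency evaluation: evaluated PyJWT, python-jose. Did not use because
the target runtime forbids third-party packages. Inline implementation
reviewed by A. Reviewer on 2025-01-15."""
print(check_pr_description(example))  # []
```

A check like this only proves the statement exists; whether the stated reason is acceptable remains a reviewer decision.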

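2. Pattern-Based Scanner for Inlined Library Functionality

Static pattern matching against source code detects the most common classes of inlined security functionality. The following scanner targets Python and matches regular expressions against raw source text rather than walking the AST, which keeps it short at the cost of precision. The same approach applies to Go and JavaScript, and language-appropriate AST tooling can tighten individual rules.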

#!/usr/bin/env python3
"""
scan_ai_reimplementations.py

Detect source files that contain patterns associated with reimplementing
security-relevant library functionality. Intended to run in CI alongside
traditional dependency scanners.

Usage:
    python3 scan_ai_reimplementations.py src/ --fail-on high
    python3 scan_ai_reimplementations.py src/ --output json
"""
import re
import sys
import json
import argparse
from pathlib import Path
from dataclasses import dataclass

@dataclass
class Finding:
    file: str
    line: int
    category: str
    severity: str  # HIGH, MEDIUM, LOW
    description: str
    pattern: str
    remediation: str

# Regex patterns checked against raw source text.
# Each pattern is (regex, category, severity, description, remediation).
PATTERN_RULES: list[tuple[str, str, str, str, str]] = [
    # JWT manual parsing: base64-decoding header.payload.signature manually
    (
        r'base64[_\.]?(b64decode|urlsafe_b64decode).*\.split\(["\']\.["\']',
        "jwt-manual-parse",
        "HIGH",
        "Manual JWT base64 parsing detected. This pattern reimplements "
        "JWT decoding without the signature verification that PyJWT, "
        "python-jose, or golang-jwt enforce by default.",
        "Replace with PyJWT: `import jwt; payload = jwt.decode(token, key, algorithms=['RS256'])`",
    ),
    # JWT none-algorithm risk: checking for algorithm 'none' explicitly
    # suggests the code is trying to handle it rather than reject it
    (
        r'''["\']?\b(?:alg|algorithm)["\']?\]?\s*(?:==|[:=])\s*["\']none["\']''',
        "jwt-none-algorithm",
        "HIGH",
        "JWT 'none' algorithm reference. Code that handles the 'none' "
        "algorithm may accept unsigned tokens. CVE-2015-9235 (node-jsonwebtoken), "
        "CVE-2022-29217 (PyJWT) and similar.",
        "Use a library configured to reject the 'none' algorithm explicitly.",
    ),
    # YAML unsafe load: yaml.load with no SafeLoader/CSafeLoader on the same line
    # (a call split across several lines is still flagged and needs manual triage)
    (
        r'yaml\.load\s*\((?![^\n]*\bC?SafeLoader\b)',
        "yaml-unsafe-load",
        "HIGH",
        "yaml.load() called with a non-safe Loader. Equivalent to the "
        "deserialization vulnerability class in CVE-2017-18342 (PyYAML). "
        "Arbitrary Python object instantiation is possible with untrusted input.",
        "Use yaml.safe_load() or yaml.load(data, Loader=yaml.SafeLoader)",
    ),
    # Weak random for tokens/passwords/secrets: a security keyword anywhere on
    # the same line as the call, before or after it
    (
        r'^(?=[^\n]*(?:token|secret|password|key|nonce|salt|csrf|session))[^\n]*\brandom\.(?:choice|choices|randint|random|randbytes)\s*\(',
        "weak-random-security",
        "HIGH",
        "Python random module used in a security context. random is not "
        "cryptographically secure (Mersenne Twister PRNG). Predictable "
        "output enables session forgery and token prediction attacks.",
        "Use secrets.token_hex(), secrets.token_urlsafe(), or os.urandom()",
    ),
    # SHA for password hashing: hashlib.sha256/sha512 applied to a password
    (
        r'hashlib\.(sha256|sha512|sha1|md5)\s*\(\s*(?:password|passwd|pwd)',
        "password-hashing-weak",
        "HIGH",
        "SHA-2 or MD5 applied directly to a password. These are general-purpose "
        "hash functions with no work factor and no built-in salting. "
        "Vulnerable to offline brute-force and rainbow table attacks.",
        "Use bcrypt (passlib.hash.bcrypt) or argon2-cffi (argon2.PasswordHasher)",
    ),
    # Hardcoded HMAC secret inline with JWT-like structure
    (
        r'hmac\.new\s*\([^)]*b["\'][A-Za-z0-9+/]{16,}["\']',
        "hardcoded-hmac-key",
        "HIGH",
        "HMAC constructed with what appears to be a hardcoded key. "
        "Hardcoded keys cannot be rotated and are exposed in source code.",
        "Load signing keys from environment variables or a secret manager.",
    ),
    # Manual XML parsing without defusedxml
    (
        r'(?:xml\.etree|xml\.dom|xml\.sax).*(?:parse|fromstring|parseString)',
        "xml-unsafe-parse",
        "MEDIUM",
        "Standard library XML parser used without defusedxml. Depending on "
        "parser and version, this can expose entity-expansion DoS "
        "(billion laughs) and XXE (XML External Entity) issues.",
        "Use defusedxml.ElementTree instead of xml.etree.ElementTree",
    ),
    # Custom retry loop that catches all exceptions broadly
    (
        r'except\s+Exception.*:\s*\n\s+(?:time\.sleep|asyncio\.sleep)',
        "broad-exception-retry",
        "LOW",
        "Broad exception catch with sleep — likely a hand-written retry loop. "
        "Consider tenacity or backoff for maintained retry semantics.",
        "Use tenacity (@retry decorator) or backoff for retry logic",
    ),
]

def scan_file(filepath: Path) -> list[Finding]:
    try:
        content = filepath.read_text(encoding="utf-8", errors="replace")
    except OSError:
        return []

    findings: list[Finding] = []
    lines = content.splitlines()

    for pattern, category, severity, description, remediation in PATTERN_RULES:
        for match in re.finditer(pattern, content, re.IGNORECASE | re.MULTILINE):
            # Find line number for the match
            line_num = content[: match.start()].count("\n") + 1
            findings.append(Finding(
                file=str(filepath),
                line=line_num,
                category=category,
                severity=severity,
                description=description,
                pattern=pattern,
                remediation=remediation,
            ))

    return findings


def scan_directory(root: Path, extensions: tuple[str, ...] = (".py",)) -> list[Finding]:
    all_findings: list[Finding] = []
    for path in root.rglob("*"):
        if path.suffix in extensions and path.is_file():
            # Skip vendored code, virtualenvs, caches, and test directories
            parts = path.parts
            if any(p in parts for p in ("vendor", "node_modules", ".venv", "venv", "__pycache__", "test", "tests")):
                continue
            all_findings.extend(scan_file(path))
    return all_findings


def main() -> int:
    parser = argparse.ArgumentParser(description="Scan for inlined security library reimplementations")
    parser.add_argument("path", type=Path, help="Directory or file to scan")
    parser.add_argument("--fail-on", choices=["high", "medium", "low"], default="high",
                        help="Exit non-zero if findings of this severity or above are found")
    parser.add_argument("--output", choices=["text", "json"], default="text")
    args = parser.parse_args()

    if args.path.is_file():
        findings = scan_file(args.path)
    else:
        findings = scan_directory(args.path)

    severity_rank = {"HIGH": 3, "MEDIUM": 2, "LOW": 1}
    fail_rank = severity_rank[args.fail_on.upper()]

    if args.output == "json":
        print(json.dumps([
            {
                "file": f.file, "line": f.line, "category": f.category,
                "severity": f.severity, "description": f.description,
                "remediation": f.remediation,
            }
            for f in findings
        ], indent=2))
    else:
        if not findings:
            print("No AI reimplementation patterns detected.")
        for f in findings:
            print(f"[{f.severity}] {f.file}:{f.line} — {f.category}")
            print(f"  {f.description}")
            print(f"  Remediation: {f.remediation}")
            print()

    should_fail = any(severity_rank[f.severity] >= fail_rank for f in findings)
    return 1 if should_fail else 0


if __name__ == "__main__":
    sys.exit(main())

This scanner is not a replacement for manual code review — it catches the obvious patterns. A finding on yaml.load without SafeLoader is reliable. A finding on random.choices(...) near the word token has a false-positive rate of roughly 15–20% in test codebases (test code generating random test tokens, simulation code, etc.), which is why the --fail-on threshold is configurable and why test directories are excluded. The appropriate posture is to treat this as a triage tool: HIGH findings block the build, MEDIUM findings generate review comments, LOW findings produce a report for the next security review cycle.
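Where a regex rule proves too noisy, Python's ast module can check the same condition structurally. The following is a minimal sketch for the yaml.load rule only, assuming the module is imported under the name yaml; the helper names are hypothetical, and it does not follow aliases such as `from yaml import load`.

```python
import ast

SAFE_LOADERS = {"SafeLoader", "CSafeLoader"}


def _dotted_name(node: ast.AST) -> str:
    """Return e.g. 'yaml.load' for a Name/Attribute chain, or '' otherwise."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return ""


def find_unsafe_yaml_loads(source: str) -> list[int]:
    """Line numbers of yaml.load(...) calls that do not pass a safe Loader."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and _dotted_name(node.func) == "yaml.load"):
            continue
        # The Loader may be passed by keyword or as the second positional argument.
        loader = next((kw.value for kw in node.keywords if kw.arg == "Loader"), None)
        if loader is None and len(node.args) >= 2:
            loader = node.args[1]
        loader_name = _dotted_name(loader).rsplit(".", 1)[-1] if loader is not None else ""
        if loader_name not in SAFE_LOADERS:
            hits.append(node.lineno)
    return sorted(hits)


sample = '''
import yaml
a = yaml.load(open("c.yml"), Loader=yaml.FullLoader)
b = yaml.load(
    fh.read(),
    Loader=yaml.SafeLoader,
)
c = yaml.load(data)
d = yaml.safe_load(data)
'''
print(find_unsafe_yaml_loads(sample))  # [3, 8]
```

Unlike a line-oriented regex, this handles the multi-line call on lines 4 to 7 of the sample correctly, because the Loader argument is read from the parsed call rather than from the text on one line.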

3. CI Pipeline Integration

The scanner runs alongside Grype in CI. They are complementary, not redundant: Grype covers declared dependencies in manifests; the pattern scanner covers inline implementations that have no manifest entry.

# .github/workflows/supply-chain-security.yml
name: Supply Chain Security

on:
  push:
    branches: [main, "release/**"]
  pull_request:

permissions: {}

jobs:
  dependency-scan:
    name: Declared Dependencies (Grype)
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write
    steps:
      - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1

      - name: Grype scan (declared dependencies)
        id: grype
        uses: anchore/scan-action@3343887d815d7b07465f6fdcd395bd66508d486a # v3.6.4
        with:
          path: "."
          fail-build: true
          severity-cutoff: high
          output-format: sarif

      - name: Upload Grype SARIF
        # Run even when the scan step fails the build, so findings still reach code scanning
        if: always()
        uses: github/codeql-action/upload-sarif@1b1aada464948af03b950897e5eb522f92603cc2 # v3.24.9
        with:
          sarif_file: ${{ steps.grype.outputs.sarif }}
          category: grype-dependencies

  inline-implementation-scan:
    name: Inline Reimplementations (AI Code Scanner)
    runs-on: ubuntu-latest
    permissions:
      contents: read
    steps:
      - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1

      - name: Set up Python
        uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9f # v5.0.0
        with:
          python-version: "3.12"

      - name: Run AI reimplementation scanner
        run: |
          python3 scripts/scan_ai_reimplementations.py src/ \
            --fail-on high \
            --output json | tee scan-results.json

          # Summary for PR annotations
          python3 - <<'PYEOF'
          import json, sys
          results = json.load(open("scan-results.json"))
          high = [r for r in results if r["severity"] == "HIGH"]
          medium = [r for r in results if r["severity"] == "MEDIUM"]
          print(f"HIGH: {len(high)}, MEDIUM: {len(medium)}")
          for r in high:
              print(f"::error file={r['file']},line={r['line']}::[{r['category']}] {r['description']}")
          for r in medium:
              print(f"::warning file={r['file']},line={r['line']}::[{r['category']}] {r['description']}")
          sys.exit(1 if high else 0)
          PYEOF

      - name: Upload scan results
        if: always()
        uses: actions/upload-artifact@5d5d22a31266ced268874388b861e4b58bb5c2f3 # v4.3.1
        with:
          name: ai-reimplementation-scan
          path: scan-results.json

  sbom-completeness:
    name: SBOM Generation and AI Component Inventory
    runs-on: ubuntu-latest
    needs: [dependency-scan]
    permissions:
      contents: read
    steps:
      - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
        with:
          fetch-depth: 0  # full history for git-log attribution

      - name: Generate SBOM from declared dependencies
        uses: anchore/sbom-action@ab5d7b5f48981941c4c5d6bf33aeb98fe3bae38c # v0.15.10
        with:
          path: "."
          output-file: sbom-declared.spdx.json
          format: spdx-json

      - name: Identify AI-attributed files
        run: |
          # Find commits whose message carries an AI co-author trailer.
          # Covers: GitHub Copilot, Cursor, Claude, ChatGPT attribution markers.
          # --grep is matched against each message line, so the trailer is found
          # wherever it sits in the body; --format=%H prints only the commit hash.
          git log --all -i -E \
            --grep='co-authored-by:.*(copilot|cursor|claude|chatgpt|codewhisperer|tabnine)' \
            --format='%H' -- "*.py" "*.go" "*.ts" "*.js" | \
            sort -u > ai_attributed_commits.txt

          echo "Commits with AI attribution: $(wc -l < ai_attributed_commits.txt)"

          # For each attributed commit, list changed files
          while read sha; do
            git show --name-only --pretty="" "$sha"
          done < ai_attributed_commits.txt | sort -u > ai_attributed_files.txt

          echo "Files with AI attribution in commit history:"
          cat ai_attributed_files.txt

      - name: Annotate SBOM with AI component inventory
        run: |
          python3 - <<'PYEOF'
          import json
          from pathlib import Path

          sbom = json.loads(Path("sbom-declared.spdx.json").read_text())
          ai_files = Path("ai_attributed_files.txt").read_text().splitlines()

          # Add AI-attributed files as SBOM packages with AI origin annotation
          # SPDX 2.3 supports PackageComment and ExternalRef for this purpose
          ai_packages = []
          for i, filepath in enumerate(ai_files):
              if not Path(filepath).exists():
                  continue
              ai_packages.append({
                  "SPDXID": f"SPDXRef-AIGenerated-{i:04d}",
                  "name": filepath,
                  "versionInfo": "NOASSERTION",
                  "downloadLocation": "NOASSERTION",
                  "filesAnalyzed": False,
                  "comment": "AI-GENERATED: This file contains code generated by an AI coding assistant. "
                              "Security-relevant functions in this file are not covered by the declared "
                              "dependency CVE feed. Manual security review required.",
                  "primaryPackagePurpose": "SOURCE",
                  "supplier": "NOASSERTION",
              })

          sbom.setdefault("packages", []).extend(ai_packages)
          Path("sbom-with-ai.spdx.json").write_text(json.dumps(sbom, indent=2))
          print(f"Added {len(ai_packages)} AI-attributed file entries to SBOM")
          PYEOF

      - name: Upload enriched SBOM
        uses: actions/upload-artifact@5d5d22a31266ced268874388b861e4b58bb5c2f3 # v4.3.1
        with:
          name: sbom-with-ai-components
          path: sbom-with-ai.spdx.json

The SBOM enrichment step uses git commit metadata to find AI-attributed files. This approach has a gap: it only covers files where an AI attribution marker appears in the commit message. Code generated by AI tools that does not use a co-author attribution line — which includes most Copilot-generated code unless the team specifically adds it — will not appear. The git log attribution step is a starting point for tracking AI-generated files, not a complete inventory.
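The trailer match that the workflow relies on can also be run directly against commit message text, which makes the gap easy to see in a test. This is a minimal sketch: the tool list mirrors the workflow's pattern, and the function name ai_coauthors is hypothetical.

```python
import re

AI_TOOLS = ("copilot", "cursor", "claude", "chatgpt", "codewhisperer", "tabnine")
TRAILER = re.compile(r"^co-authored-by:\s*(?P<who>.+)$", re.IGNORECASE | re.MULTILINE)


def ai_coauthors(commit_message: str) -> list[str]:
    """Return Co-authored-by trailer values that mention a known AI tool."""
    found = []
    for match in TRAILER.finditer(commit_message):
        who = match.group("who").strip()
        # Substring match: crude, and may over-match names that merely
        # contain a tool name.
        if any(tool in who.lower() for tool in AI_TOOLS):
            found.append(who)
    return found


attributed = "Add parser\n\nCo-authored-by: GitHub Copilot <copilot@example.invalid>\n"
unattributed = "Add parser written with inline completions\n"
print(ai_coauthors(attributed))    # ['GitHub Copilot <copilot@example.invalid>']
print(ai_coauthors(unattributed))  # []
```

The second message describes AI-assisted work in prose but carries no trailer, so it is invisible to this approach.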

4. Code Review Checklist for AI-Generated Code

Checklists only function if they are enforced at merge time. The following is formatted for use as a required PR template section rather than as a suggestion.

<!-- .github/PULL_REQUEST_TEMPLATE.md — Security section -->

## Security Review (required for all PRs)

### AI-generated code check

Does this PR contain code generated by an AI coding tool
(Copilot, Cursor, Claude Code, ChatGPT, Codeium, etc.)? **[ ] Yes  [ ] No**

If yes, complete the following before requesting review:

**For each function that was AI-generated, check:**

- [ ] JWT/token parsing: uses PyJWT / golang-jwt / node-jsonwebtoken / jose
      (NOT manual base64 decode + HMAC verify)
- [ ] Password hashing: uses bcrypt / argon2-cffi / passlib
      (NOT hashlib.sha256/sha512 directly on password)
- [ ] Random generation for security use: uses secrets / crypto/rand / crypto.getRandomValues()
      (NOT random.random() / random.choice() / Math.random())
- [ ] YAML parsing: uses yaml.safe_load() / SafeLoader
      (NOT yaml.load() / yaml.FullLoader / yaml.UnsafeLoader)
- [ ] XML parsing: uses defusedxml
      (NOT xml.etree.ElementTree / xml.dom.minidom directly)
- [ ] Cryptographic operations: uses cryptography / OpenSSL / golang.org/x/crypto
      (NOT custom cipher/hash implementations)

**If AI-generated code in a security-critical category is approved for merge:**

- [ ] Added to `SECURITY-NOTES.md` with function name, location, date, reviewer
- [ ] Ticket created to replace with a library in the next quarter
- [ ] Security team sign-off obtained (tag @security-team in PR)

5. Retroactive Audit of Existing Codebases

The policy and CI controls address future code. Existing codebases accumulated AI-generated inline implementations before any policy existed. Running a retroactive audit before the next SBOM audit or security review is operationally important.

#!/bin/bash
# retroactive-ai-audit.sh
# Run against an existing codebase to surface likely AI-generated
# inline security implementations before a formal SBOM audit.
# Does not require AI attribution in git history.

set -euo pipefail
REPO_ROOT="${1:-.}"
REPORT_FILE="ai-audit-$(date +%Y%m%d).txt"

echo "=== AI Inline Implementation Audit: $(date) ===" | tee "$REPORT_FILE"
echo "Repository: $REPO_ROOT" | tee -a "$REPORT_FILE"
echo "" | tee -a "$REPORT_FILE"

# 1. JWT custom implementations: base64 splitting on dot separator
echo "--- JWT manual parsing patterns ---" | tee -a "$REPORT_FILE"
grep -rn --include="*.py" --include="*.go" --include="*.js" --include="*.ts" \
  -E 'b64decode.*split|split.*\..*b64decode|header.*payload.*signature|\.split\("\."\).*\[0\].*\[1\]' \
  "$REPO_ROOT" \
  --exclude-dir={vendor,node_modules,.venv,venv,test,tests,__pycache__} \
  2>/dev/null | tee -a "$REPORT_FILE" || true

# 2. Unsafe YAML loading
echo "" | tee -a "$REPORT_FILE"
echo "--- YAML unsafe load patterns ---" | tee -a "$REPORT_FILE"
grep -rn --include="*.py" \
  -E 'yaml\.load\s*\(' \
  "$REPO_ROOT" \
  --exclude-dir={vendor,.venv,venv,test,tests,__pycache__} \
  2>/dev/null | grep -v 'safe_load\|SafeLoader' | tee -a "$REPORT_FILE" || true

# 3. Weak random in security contexts
echo "" | tee -a "$REPORT_FILE"
echo "--- Weak PRNG in security context ---" | tee -a "$REPORT_FILE"
grep -rn --include="*.py" \
  -E 'random\.(choice|choices|randint|randbytes|random)\(' \
  "$REPO_ROOT" \
  --exclude-dir={vendor,.venv,venv,__pycache__} \
  2>/dev/null | grep -iE 'token|secret|password|key|nonce|salt|session|csrf' \
  | tee -a "$REPORT_FILE" || true

# 4. Direct password hashing with SHA/MD5
echo "" | tee -a "$REPORT_FILE"
echo "--- Weak password hashing ---" | tee -a "$REPORT_FILE"
grep -rn --include="*.py" \
  -E 'hashlib\.(sha256|sha512|sha1|md5)\(' \
  "$REPO_ROOT" \
  --exclude-dir={vendor,.venv,venv,test,tests,__pycache__} \
  2>/dev/null | grep -iE 'password|passwd|pwd' | tee -a "$REPORT_FILE" || true

# 5. Custom Encrypt/Decrypt functions (potential inlined crypto)
echo "" | tee -a "$REPORT_FILE"
echo "--- Custom cryptographic function definitions ---" | tee -a "$REPORT_FILE"
grep -rn --include="*.py" --include="*.go" \
  -E '(def|func)\s+(encrypt|decrypt|sign|verify_signature|hmac_verify)\s*\(' \
  "$REPO_ROOT" \
  --exclude-dir={vendor,.venv,venv,test,tests,__pycache__} \
  2>/dev/null | tee -a "$REPORT_FILE" || true

# 6. XML parsing without defusedxml
echo "" | tee -a "$REPORT_FILE"
echo "--- XML parsing without defusedxml ---" | tee -a "$REPORT_FILE"
grep -rn --include="*.py" \
  -E 'import xml\.(etree|dom|sax)|from xml\.(etree|dom|sax)' \
  "$REPO_ROOT" \
  --exclude-dir={vendor,.venv,venv,__pycache__} \
  2>/dev/null | tee -a "$REPORT_FILE" || true

echo "" | tee -a "$REPORT_FILE"
echo "=== Audit complete. Review $REPORT_FILE ===" | tee -a "$REPORT_FILE"
echo "For each finding: determine if it is a reimplemented library function." | tee -a "$REPORT_FILE"
echo "If yes: create a ticket, add to SECURITY-NOTES.md, plan library migration." | tee -a "$REPORT_FILE"

Run this once against the main branch before the next SBOM submission or security audit. The output requires human triage — the grep patterns produce false positives against legitimate uses of the standard library. The goal is to produce a prioritised list of locations to review manually, not a definitive finding.

Expected Behaviour

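After the CI pipeline is in place, a PR that adds a manual JWT verification function produces output along the following lines. The first block is the scanner’s text format; the ::error line is the annotation that the CI summary step emits from the JSON results: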

[HIGH] src/auth/token_verify.py:47 — jwt-manual-parse
  Manual JWT base64 parsing detected. This pattern reimplements JWT decoding
  without the signature verification that PyJWT, python-jose, or golang-jwt
  enforce by default.
  Remediation: Replace with PyJWT: `import jwt; payload = jwt.decode(token, key, algorithms=['RS256'])`

::error file=src/auth/token_verify.py,line=47::[jwt-manual-parse] Manual JWT base64 parsing detected...

The GitHub Actions annotation marks the specific line in the PR diff. The job exits non-zero. The PR cannot be merged until the finding is resolved — either by replacing the implementation with PyJWT, or by going through the documented exception process (security team sign-off, ticket for migration, SECURITY-NOTES.md entry).

The Grype scan running in parallel produces its own output against requirements.txt and finds nothing unusual, because the inline JWT implementation has no entry in requirements.txt. The two scans are genuinely complementary: Grype’s output is empty; the pattern scanner’s output is the actionable finding. Without both running together, the CI pipeline would have passed cleanly.

The enriched SBOM output includes entries like:

{
  "SPDXID": "SPDXRef-AIGenerated-0023",
  "name": "src/auth/token_verify.py",
  "versionInfo": "NOASSERTION",
  "downloadLocation": "NOASSERTION",
  "filesAnalyzed": false,
  "comment": "AI-GENERATED: This file contains code generated by an AI coding assistant. Security-relevant functions in this file are not covered by the declared dependency CVE feed. Manual security review required.",
  "primaryPackagePurpose": "SOURCE",
  "supplier": "NOASSERTION"
}

This SBOM entry does not resolve the gap — NOASSERTION throughout means there is nothing to scan — but it makes the gap visible. An SBOM consumer that receives this document knows that src/auth/token_verify.py contains AI-generated code with no CVE coverage. That is materially more information than the alternative, which is a clean SBOM with no indication that uninspected AI-generated security logic exists.
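On the consuming side, the annotation is only useful if something reads it. The sketch below shows how a recipient might list the flagged files, assuming the enriched document uses the SPDXRef-AIGenerated- prefix and the AI-GENERATED: comment marker shown above; the function name is hypothetical.

```python
def ai_generated_entries(sbom: dict) -> list[str]:
    """Names of SBOM packages marked as AI-generated by the enrichment step."""
    return sorted(
        pkg.get("name", "")
        for pkg in sbom.get("packages", [])
        if pkg.get("SPDXID", "").startswith("SPDXRef-AIGenerated-")
        or pkg.get("comment", "").startswith("AI-GENERATED:")
    )


# Trimmed-down stand-in for sbom-with-ai.spdx.json
sbom = {
    "packages": [
        {"SPDXID": "SPDXRef-Package-requests", "name": "requests"},
        {
            "SPDXID": "SPDXRef-AIGenerated-0023",
            "name": "src/auth/token_verify.py",
            "comment": "AI-GENERATED: This file contains code generated by an AI coding assistant.",
        },
    ]
}
print(ai_generated_entries(sbom))  # ['src/auth/token_verify.py']
```

In practice the input would come from json.load on the uploaded artifact, and the resulting list would feed a manual review queue rather than a scanner.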

Trade-offs

The dependency-first policy creates friction that is sometimes justified. The security case for using PyJWT instead of a hand-generated JWT parser is clear. The security case for using tenacity instead of a three-line retry loop around a simple API call is less clear: the inlined version is narrow enough that its failure modes are visible and auditable at review time. Applying the dependency-first rule uniformly will generate pushback from engineers who correctly point out that a 200-line library is inappropriate for a three-line use case. The policy needs an explicit exception path that is documented and reviewed, not a blanket prohibition that gets ignored.

The pattern scanner’s false positive rate is real and must be managed. In a test codebase of ~40,000 lines of Python, roughly 30% of the MEDIUM-severity findings produced by the patterns above are false positives. At HIGH severity the false positive rate drops to around 10%, primarily from test fixtures that use random.choices() to generate test data. The right response is to suppress documented false positives with inline comments (# nosec: test-data-not-security-context), treat unresolved HIGH findings as blocking, and treat unresolved MEDIUM findings as requiring a comment. Tuning the regex patterns for your specific codebase over the first few weeks of operation is expected.
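The suppression and severity policy in the paragraph above can be expressed as a small triage step. The following is a sketch under stated assumptions: the tuple shape of a finding, the in-memory sources mapping, and the function names are hypothetical, and the choice to honour only suppressions that carry a reason after the colon is a design assumption made here so that every suppression stays documented.

```python
import re

# Accept only suppressions that state a reason, for example:
#   # nosec: test-data-not-security-context
# A bare "# nosec" is deliberately not honoured.
NOSEC_WITH_REASON = re.compile(r"#\s*nosec:\s*\S+")


def is_suppressed(source_line: str) -> bool:
    """True when the flagged line carries a documented suppression comment."""
    return NOSEC_WITH_REASON.search(source_line) is not None


def triage(findings, sources):
    """Split unsuppressed findings into (blocking, needs_comment).

    findings: iterable of (severity, path, line_number) tuples, lines 1-indexed.
    sources:  dict mapping each path to its list of source lines.
    """
    blocking, needs_comment = [], []
    for severity, path, line_number in findings:
        if is_suppressed(sources[path][line_number - 1]):
            continue
        if severity == "HIGH":
            blocking.append((path, line_number))
        elif severity == "MEDIUM":
            needs_comment.append((path, line_number))
    return blocking, needs_comment
```

In CI, a non-empty blocking list would fail the job, while the needs_comment list would be surfaced on the PR for the author to address.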

SBOM enrichment using git log attribution is incomplete. Only code where an AI tool’s co-author attribution appears in the commit message is identified. Most Copilot-generated code, and most code written with inline completions from any tool, has no such attribution. The git log approach identifies the provable minimum; the actual volume of AI-generated code in a codebase that has used AI tools for 12+ months is substantially larger. The enriched SBOM is more accurate than the baseline, but it does not represent a complete inventory until engineering teams adopt consistent attribution practices.
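To make the limitation concrete, attribution-based detection amounts to something like the sketch below. It assumes commit messages and their changed paths have already been extracted (for example from git log --name-only), and it treats a Co-authored-by trailer naming an AI tool as the only signal. The marker list (copilot, cursor) and the function names are illustrative assumptions, not a complete inventory of tools.

```python
import re

# Hypothetical marker list: tool names expected in Co-authored-by trailers.
AI_TRAILER = re.compile(
    r"^Co-authored-by:.*\b(?:copilot|cursor)\b",
    re.IGNORECASE | re.MULTILINE,
)


def is_ai_attributed(commit_message: str) -> bool:
    """True when a Co-authored-by trailer line names a known AI tool."""
    return AI_TRAILER.search(commit_message) is not None


def ai_attributed_files(commits):
    """Collect paths touched by AI-attributed commits.

    commits: iterable of (message, changed_paths) pairs.
    Returns the provable minimum only: commits without a trailer
    contribute nothing, whatever tool actually produced the code.
    """
    paths = set()
    for message, changed_paths in commits:
        if is_ai_attributed(message):
            paths.update(changed_paths)
    return paths
```

Anything accepted from an inline completion and committed under the engineer's own name passes through this filter unseen, which is why the resulting set is a lower bound.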

LLM-generated code quality varies significantly by prompt and model. A prompt that asks for “a JWT parser” will receive different quality output from different models and different prompt formulations. Some of that output will be secure by coincidence (using hmac.compare_digest for timing-safe comparison), some will be subtly insecure (using == for signature comparison, enabling timing attacks). The scanner catches known-bad patterns; it does not certify that the absence of known-bad patterns means the implementation is correct. Security review of AI-generated security-sensitive code by an engineer who understands the threat model remains necessary.
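The == versus hmac.compare_digest distinction is easy to miss in review because both variants return the same boolean for every input; only their timing behaviour differs. The sketch below uses HMAC-SHA256 as an assumed signing scheme for illustration, and the function names are hypothetical.

```python
import hashlib
import hmac


def sign(message: bytes, key: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_insecure(message: bytes, signature: bytes, key: bytes) -> bool:
    # Subtly insecure: == on bytes can return as soon as a byte differs,
    # so response time may leak how much of the signature was correct.
    return sign(message, key) == signature


def verify(message: bytes, signature: bytes, key: bytes) -> bool:
    # hmac.compare_digest is designed so that its running time does not
    # depend on where the two values differ.
    return hmac.compare_digest(sign(message, key), signature)
```

Functional tests cannot tell these two apart, which is why this class of defect tends to survive until someone who knows the threat model reads the code.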

Failure Modes

Assuming Grype or Snyk coverage extends to AI-generated inline code. This is the most common misconception. Grype scans SBOM data derived from declared dependency manifests. An AI-generated inline JWT parser has no manifest entry. Grype’s output for that codebase will be clean with respect to JWT vulnerabilities regardless of what the inline implementation does. Teams that believe “we run Grype in CI, we’re covered” are operating with a systematic blind spot for the fastest-growing category of new code in their codebase.

Generating SBOMs from package manifests only and treating them as complete. syft dir:. applied to a Python project will find requirements.txt, pyproject.toml, and setup.py and generate accurate SBOM entries for every declared dependency. It will not find the JWT verification function in src/auth/token_verify.py that was generated by Cursor last month. An SBOM submitted to a customer or regulator that was generated this way is materially incomplete for any codebase with meaningful AI-assisted code. The completeness gap will not be visible in the SBOM document itself; it will look like a full SBOM.

Applying the new policy only to future code without retroactive audit. A team that implements the dependency-first policy and CI scanner in May 2026 will prevent new AI-generated inline implementations from accumulating. It will not surface the implementations that accumulated during the previous 18 months of AI-assisted development. The retroactive audit is operationally inconvenient — it takes several days of engineering time to run, triage, and create remediation tickets — but skipping it means the existing security debt is invisible. The next SBOM audit, penetration test, or regulatory inspection will surface it instead, under worse conditions.

Treating AI code attribution as equivalent to AI code review. Adding Co-authored-by: GitHub Copilot to a commit message creates an audit trail. It does not mean the code was reviewed for security implications. The audit trail is useful for identifying which files to prioritise in a retroactive review; it is not a substitute for that review. Teams that implement attribution without review have better inventory and the same vulnerability exposure.

Not tracking the exception backlog. The dependency-first policy creates a legitimate exception path for narrow, well-reviewed inline implementations. If that exception path is not tracked — no ticket, no scheduled migration, no quarterly review — exceptions accumulate into permanent technical debt. The SECURITY-NOTES.md file becomes a list of known-risk inline implementations that nobody has capacity to replace. Schedule a quarterly review of all open SECURITY-NOTES.md entries as a fixed team obligation, not a best-effort task.