Gating AI-Generated Security Fixes Before Merge

Problem

AI-powered security fix tools have become mainstream in the last two years. GitHub Copilot Autofix, CodeQL’s AI fix suggestions, Snyk’s DeepCode AI, and similar products can now generate pull requests that patch detected vulnerabilities automatically. The value proposition is compelling: a SAST scanner finds a SQL injection vulnerability and, instead of creating a ticket that sits in a backlog, it immediately opens a PR with the fix.

The problem is that AI-generated security fixes are correct often enough to build trust, but wrong often enough to be dangerous — and the failure modes are more subtle than simply not fixing the original vulnerability.

The fix doesn’t address the root cause. An AI-generated fix for a SQL injection that escapes a specific string input may not address an identical pattern in a related function, or may not identify that the root cause is the use of string concatenation in a shared utility. The scanner closes the finding; the vulnerability is still in production via a different code path.

The fix introduces a new vulnerability. Sanitising input in one place while adding an unchecked new code path is a documented failure mode for AI security fixes. Several published examples show AI-generated fixes that resolved an XSS in one context while introducing an open redirect or CSRF weakness in the redirect handling added by the fix.

The fix breaks application logic. Security fixes frequently require changes that affect business logic — adding validation that rejects previously valid inputs, changing encoding that affects downstream processing, adding authentication checks that break an integration. AI tools don’t have a model of the application’s semantic behaviour; they apply pattern-based fixes that may be syntactically correct but functionally wrong.

The fix depends on an introduced library or pattern with its own risk. An AI that fixes a cryptographic weakness by introducing a dependency on a new library has added a supply chain risk in the process of removing the original risk.

Auto-merge amplifies all of the above. Some CI/CD configurations automatically merge Dependabot or autofix PRs that pass CI checks. If the autofix is subtly wrong in a way that CI tests don’t catch, it reaches production without human review.

The correct posture is not to reject AI-generated security fixes — they are often correct and they accelerate remediation significantly. The correct posture is to treat them as requiring the same (or greater) review rigour as any other security-sensitive code change, with automated validation that goes beyond “does CI pass?”.
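The first failure mode above, a fix that patches the reported instance but leaves the root cause in place, can be sketched with Python's standard sqlite3 module. Everything here is illustrative: the `users` table and the functions `select_where`, `find_by_name_patched`, `find_by_email_unpatched`, and `find_by_column` are hypothetical names, not taken from any real autofix output.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with two users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "alice@example.com"), ("bob", "bob@example.com")],
    )
    return conn


def select_where(conn: sqlite3.Connection, clause: str) -> list:
    """Hypothetical shared utility. Concatenating the clause is the root cause."""
    return conn.execute("SELECT name FROM users WHERE " + clause).fetchall()


def find_by_name_patched(conn: sqlite3.Connection, name: str) -> list:
    """Instance-level fix: the reported caller now escapes its own input."""
    escaped = name.replace("'", "''")
    return select_where(conn, f"name = '{escaped}'")


def find_by_email_unpatched(conn: sqlite3.Connection, email: str) -> list:
    """Parallel caller of the same utility that the scanner never reported."""
    return select_where(conn, f"email = '{email}'")


def find_by_column(conn: sqlite3.Connection, column: str, value: str) -> list:
    """Root-cause fix: allow-list the column and bind the value as a parameter."""
    if column not in {"name", "email"}:
        raise ValueError(f"unexpected column: {column}")
    return conn.execute(
        f"SELECT name FROM users WHERE {column} = ?", (value,)
    ).fetchall()


conn = make_db()
payload = "x' OR '1'='1"
patched_rows = find_by_name_patched(conn, payload)       # payload neutralised
unpatched_rows = find_by_email_unpatched(conn, payload)  # payload still works
root_fix_rows = find_by_column(conn, "email", payload)   # payload treated as data
print(patched_rows, unpatched_rows, root_fix_rows)
```

A reviewer looking only at the diff for `find_by_name_patched` would see a plausible fix. The remaining exposure is visible only by asking which other callers reach `select_where`.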

Target systems: any repository using GitHub Advanced Security with Copilot Autofix, CodeQL fix suggestions, Snyk PR bot, or Dependabot with auto-merge; any CI/CD pipeline where security findings auto-generate PRs.


Threat Model

Adversary 1: Incomplete fix exploited. An AI autofix partially remediates a SQL injection, escaping one input but missing a parallel code path. The scanner finding is closed, so the team incorrectly believes the whole class of vulnerability has been remediated. A security researcher later discovers the remaining code path after the fix is merged.

Adversary 2: Fix-introduced supply chain risk. An AI autofix introduces a new npm or PyPI dependency to resolve a cryptographic weakness. The introduced dependency has a known vulnerability or is a typosquatted package. The autofix PR passes CI and the dependency is installed in production.

Adversary 3: Auto-merge creates silent regression. An autofix PR automatically merges because it passes CI. The fix breaks an undocumented API contract. A downstream service starts failing in production; the failure is not linked to the autofix because the regression is non-obvious.

Adversary 4: AI fix for injection creates second-order injection. An AI fixes a reflected XSS by HTML-encoding output in one template. The fix adds a URL parameter that is URL-decoded later in processing, creating a second-order injection at the new decode point.

Without gates: autofixes reach production with the above risks unexamined. With gates: mandatory validation steps catch incomplete fixes, introduced dependencies, and logic regressions before merge.
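Adversary 4 and the fix-introduced open redirect mentioned in the Problem section share a shape: the fix adds a new handling path whose input is not validated. As an illustration of what a reviewer might look for when a fix adds redirect handling, here is a minimal sketch of a redirect-target check using only the standard library. The host allow-list `ALLOWED_HOSTS` and the function name `is_safe_redirect` are assumptions for the example, and a real application may need stricter rules.

```python
from urllib.parse import urlparse

# Hypothetical allow-list; replace with the hosts your application really uses.
ALLOWED_HOSTS = {"app.example.com"}


def is_safe_redirect(target: str) -> bool:
    """Accept site-relative paths or absolute URLs to allow-listed hosts only."""
    # Backslashes and control characters are normalised inconsistently by
    # browsers, so reject them outright.
    if "\\" in target or any(ord(ch) < 0x20 for ch in target):
        return False
    try:
        parsed = urlparse(target)
    except ValueError:
        return False
    if parsed.scheme or parsed.netloc:
        # Absolute or scheme-relative URL: require http(s) and a known host.
        return parsed.scheme in {"http", "https"} and parsed.hostname in ALLOWED_HOSTS
    # Relative target: must be rooted at "/" and must not start with "//".
    return target.startswith("/") and not target.startswith("//")


checks = {
    "/account/settings": is_safe_redirect("/account/settings"),
    "https://app.example.com/home": is_safe_redirect("https://app.example.com/home"),
    "https://evil.example.net/": is_safe_redirect("https://evil.example.net/"),
    "//evil.example.net/path": is_safe_redirect("//evil.example.net/path"),
    "javascript:alert(1)": is_safe_redirect("javascript:alert(1)"),
    "https://app.example.com@evil.example.net/": is_safe_redirect(
        "https://app.example.com@evil.example.net/"
    ),
}
for target, ok in checks.items():
    print(f"{target!r}: {'allowed' if ok else 'rejected'}")
```

If an autofix diff adds a redirect and nothing resembling this check appears next to it, that is a reasonable trigger for the manual review the gates below enforce.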


Configuration / Implementation

Step 1 — Label and track autofix PRs distinctly

# .github/workflows/label-autofix-prs.yml
name: Label Autofix PRs

on:
  pull_request:
    types: [opened, synchronize]

permissions:
  pull-requests: write

jobs:
  label:
    runs-on: ubuntu-latest
    steps:
    - name: Label AI-generated security fix PRs
      uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea
      with:
        script: |
          const pr = context.payload.pull_request;
          const isAutofix = 
            pr.user.login === 'github-advanced-security[bot]' ||
            pr.user.login === 'snyk-bot' ||
            pr.user.login === 'copilot-swe-agent[bot]' ||
            pr.title.match(/\[Autofix\]|\[CodeQL\]|\[Snyk\]/i);
          
          if (isAutofix) {
            await github.rest.issues.addLabels({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: pr.number,
              labels: ['ai-security-fix', 'requires-security-review']
            });
            
            // Post the checklist once, when the PR is opened. Without this
            // guard every `synchronize` event adds a duplicate comment.
            if (context.payload.action === 'opened') {
              // Build the body from an array so every line stays inside the
              // YAML block scalar's indentation.
              const body = [
                '## AI-Generated Security Fix Review Required',
                '',
                'This PR was generated by an automated security fix tool. Before merging:',
                '- [ ] Verify the fix addresses the root cause, not just the reported instance',
                '- [ ] Check for any new dependencies introduced',
                '- [ ] Confirm the fix does not break related functionality',
                '- [ ] Run the security-specific test suite (if present)',
                '- [ ] Scan the changed files for the original vulnerability pattern to verify completeness',
                '',
                'See: [AI Autofix Review Checklist](https://docs.internal/security/ai-autofix-review)'
              ].join('\n');

              await github.rest.issues.createComment({
                owner: context.repo.owner,
                repo: context.repo.repo,
                issue_number: pr.number,
                body
              });
            }
          }

Step 2 — Block auto-merge for AI security fix PRs

# .github/CODEOWNERS
# Require security team review for every change, including autofix PRs.
# This is only enforced when the branch protection rule for the target branch
# has "Require review from Code Owners" enabled.
* @your-org/security-team

Alternatively, use a dedicated required status check:

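# .github/workflows/ai-fix-gate.yml
name: AI Security Fix Gate

on:
  pull_request:
    types: [opened, synchronize, labeled]
  # Re-evaluate when a review is submitted or dismissed. Without this trigger
  # the check never re-runs after the security team approves.
  pull_request_review:
    types: [submitted, dismissed]

permissions:
  contents: read
  pull-requests: read

jobs:
  require-human-review:
    # Match on author as well as label: a label added by another workflow with
    # GITHUB_TOKEN does not emit a `labeled` event that starts this workflow.
    if: >-
      contains(github.event.pull_request.labels.*.name, 'ai-security-fix') ||
      contains(fromJSON('["github-advanced-security[bot]","snyk-bot","copilot-swe-agent[bot]"]'), github.event.pull_request.user.login)
    runs-on: ubuntu-latest
    steps:
    - name: Check for human security review approval
      uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea
      with:
        script: |
          // listReviews is paginated and returns reviews in chronological order
          const reviews = await github.paginate(github.rest.pulls.listReviews, {
            owner: context.repo.owner,
            repo: context.repo.repo,
            pull_number: context.payload.pull_request.number
          });

          const securityTeamMembers = ['security-eng-1', 'security-eng-2', 'security-lead'];

          // Keep only each reviewer's most recent decisive review, so an
          // approval later replaced by "changes requested" does not count.
          const latest = new Map();
          for (const r of reviews) {
            if (r.user && ['APPROVED', 'CHANGES_REQUESTED', 'DISMISSED'].includes(r.state)) {
              latest.set(r.user.login, r.state);
            }
          }

          const hasSecurityApproval = securityTeamMembers.some(
            login => latest.get(login) === 'APPROVED'
          );

          if (!hasSecurityApproval) {
            core.setFailed(
              'AI-generated security fixes require approval from a security team member. ' +
              'The automated check is not sufficient for security fixes.'
            );
          }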
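One subtlety in any review-approval check is that a reviewer can approve and later request changes, so a test for "any approval ever" can pass on a withdrawn approval. The decision logic can be written as a pure function and unit tested outside Actions. The sketch below assumes review objects shaped like the GitHub REST API's review list (`user.login` and `state`, in chronological order); the function name `has_current_security_approval` is hypothetical, and the team list mirrors the placeholder logins used in the workflow.

```python
from __future__ import annotations

from typing import Iterable

# Placeholder logins, matching the illustrative list in the workflow.
SECURITY_TEAM = frozenset({"security-eng-1", "security-eng-2", "security-lead"})

# States that replace a reviewer's earlier decision. COMMENTED and PENDING
# reviews do not withdraw an approval, so they are ignored.
DECISIVE_STATES = {"APPROVED", "CHANGES_REQUESTED", "DISMISSED"}


def has_current_security_approval(
    reviews: Iterable[dict], team: frozenset[str] = SECURITY_TEAM
) -> bool:
    """Return True if at least one team member's latest decisive review approves."""
    latest: dict[str, str] = {}
    for review in reviews:  # assumed to be in chronological order
        user = review.get("user")
        if user and review.get("state") in DECISIVE_STATES:
            latest[user["login"]] = review["state"]
    return any(
        state == "APPROVED" and login in team for login, state in latest.items()
    )


def _review(login: str, state: str) -> dict:
    """Helper to build a minimal review object for the examples."""
    return {"user": {"login": login}, "state": state}


withdrawn = [_review("security-lead", "APPROVED"),
             _review("security-lead", "CHANGES_REQUESTED")]
still_valid = [_review("security-eng-1", "APPROVED"),
               _review("security-eng-1", "COMMENTED")]
outsider_only = [_review("some-developer", "APPROVED")]

print(has_current_security_approval(withdrawn))
print(has_current_security_approval(still_valid))
print(has_current_security_approval(outsider_only))
```

This does not address approvals that predate the latest commit; dismissing stale approvals is a branch protection setting and is outside the scope of the sketch.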

Step 3 — Scan the fix itself for new vulnerabilities

Run security scanning on the diff introduced by the autofix:

# .github/workflows/scan-autofix-changes.yml
name: Scan AI Fix for Introduced Issues

on:
  pull_request:
    types: [opened, synchronize, labeled]

permissions:
  contents: read

jobs:
  scan-introduced-deps:
    # Match on author as well as label, because the label may not be present
    # yet when the `opened` event fires.
    if: >-
      contains(github.event.pull_request.labels.*.name, 'ai-security-fix') ||
      contains(fromJSON('["github-advanced-security[bot]","snyk-bot","copilot-swe-agent[bot]"]'), github.event.pull_request.user.login)
    runs-on: ubuntu-latest
    env:
      BASE_REF: origin/${{ github.base_ref }}
    steps:
    - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
      with:
        fetch-depth: 0  # full history, so the three-dot diff can find a merge base

    - name: Check for newly introduced dependencies
      run: |
        # Collect added lines in dependency manifests. Lock files are noisy and
        # XML manifests such as pom.xml are not matched by the last grep, so
        # treat the output as a prompt for review, not an exact list.
        git diff "$BASE_REF"...HEAD -- package.json package-lock.json \
          requirements.txt Pipfile.lock pom.xml build.gradle Cargo.toml \
          go.sum go.mod | grep '^+' | grep -v '^+++' | \
          grep -E '^\+[[:space:]]*"?[A-Za-z@]' > /tmp/new-deps.txt || true

        if [[ -s /tmp/new-deps.txt ]]; then
          echo "::warning::AI fix introduced new dependencies; review for supply chain risk:"
          cat /tmp/new-deps.txt
          echo "new_deps=true" >> "$GITHUB_ENV"
        fi

    - name: Scan dependencies for known vulnerabilities
      if: env.new_deps == 'true'
      run: |
        # Run your SCA tool on the updated dependency files.
        # Example with npm audit, which exits non-zero when it finds a
        # vulnerability at or above --audit-level. It audits the whole
        # dependency tree, so a finding may predate the autofix.
        if [[ -f package-lock.json ]]; then
          if ! npm audit --audit-level=high; then
            echo "::error::Dependency tree has high/critical vulnerabilities after the AI autofix"
            exit 1
          fi
        fi

    - name: Re-scan changed files for original vulnerability pattern
      run: |
        # Files added or modified by the autofix (deleted files are excluded)
        git diff --name-only --diff-filter=d "$BASE_REF"...HEAD > /tmp/changed-files.txt
        echo "Files changed by autofix:"
        cat /tmp/changed-files.txt

        # Re-run CodeQL or your SAST tool on just the changed files to verify
        # that the fix resolved the finding completely. Example with semgrep
        # and a repository-local rule file; this assumes semgrep was installed
        # on the runner in an earlier step.
        if [[ -f .semgrep-rules.yml && -s /tmp/changed-files.txt ]] && command -v semgrep >/dev/null; then
          xargs -d '\n' semgrep --config .semgrep-rules.yml --json \
            --output /tmp/semgrep-results.json < /tmp/changed-files.txt || true

          if [[ -s /tmp/semgrep-results.json ]]; then
            FINDINGS=$(jq '.results | length' /tmp/semgrep-results.json)
            if [[ "$FINDINGS" -gt 0 ]]; then
              echo "::warning::Semgrep found $FINDINGS potential issues in the autofix changeset; human review required"
              jq '.results[] | {path: .path, rule: .check_id, line: .start.line}' /tmp/semgrep-results.json
            fi
          fi
        fi
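The grep-based dependency check above flags every added manifest line, including version bumps of packages that were already present. Where a more precise answer is useful, the diff can be parsed to list only packages that are new. The following is a sketch for pip-style `requirements.txt` diffs only; the function names `normalise` and `added_requirements` are hypothetical, and other ecosystems would need their own parsers.

```python
from __future__ import annotations

import re

# Leading package name of a requirement line, e.g. "requests" in "requests==2.32.0".
REQ_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalise(name: str) -> str:
    """Normalise a package name the way PyPI does (case and separators)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def added_requirements(diff_text: str) -> set[str]:
    """Return packages that appear on added lines but not on removed lines.

    A package that is only re-pinned shows up as both '-' and '+', so it is
    not reported as new.
    """
    added: set[str] = set()
    removed: set[str] = set()
    for line in diff_text.splitlines():
        # Skip file headers and context lines.
        if line.startswith(("+++", "---")) or not line.startswith(("+", "-")):
            continue
        body = line[1:].split("#", 1)[0]  # drop trailing comments
        match = REQ_NAME.match(body)
        if not match:
            continue  # options such as "-r other.txt", or blank lines
        target = added if line[0] == "+" else removed
        target.add(normalise(match.group(1)))
    return added - removed


example_diff = """\
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,2 +1,3 @@
-requests==2.31.0
+requests==2.32.0
 flask==3.0.0
+Cryptography>=42.0  # added by the fix
"""

new_packages = added_requirements(example_diff)
print(sorted(new_packages))
```

In CI the input would come from `git diff` on the manifest, and each reported name could then be checked against an internal allow-list before the PR is approved.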

Step 4 — Verify fix completeness with targeted re-scanning

# verify-autofix-completeness.py
# Verify that an AI security fix addresses the complete finding class,
# not just the reported instance

import re
import subprocess

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def verify_fix_completeness(
    finding_description: str,
    changed_files: list[str],
    repo_path: str
) -> dict:
    """
    Use AI to verify that the autofix addresses the complete vulnerability class,
    not just the reported instance.
    """

    # Get the diff for changed files
    diff_output = subprocess.run(
        ["git", "diff", "origin/main...HEAD", "--", *changed_files],
        capture_output=True, text=True, cwd=repo_path, check=True
    ).stdout

    # Also search for related patterns across the repository.
    # grep exits with status 1 when nothing matches, so check=True is not used.
    pattern_search = subprocess.run(
        ["grep", "-rnE", "--include=*.py", "--include=*.js", "--include=*.ts",
         # Pattern derived from the finding; adjust per vulnerability type
         "execute|query|cursor|raw|format", "."],
        capture_output=True, text=True, cwd=repo_path
    ).stdout

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2000,
        system="""You are a security code reviewer verifying that an AI-generated 
        security fix is complete. Your job is to find cases where:
        1. The fix addresses the reported instance but similar patterns remain elsewhere
        2. The fix introduces new code paths with similar vulnerabilities
        3. The fix patches symptoms but not the root cause
        
        Be specific about file paths and line numbers when you identify gaps.""",
        messages=[{
            "role": "user",
            "content": f"""
Original security finding:
{finding_description}

AI-generated fix (diff, truncated to 5000 characters):
{diff_output[:5000]}

Grep for related patterns in repository (sample, truncated to 3000 characters):
{pattern_search[:3000]}

Questions to answer:
1. Does the fix address the root cause or just the symptom?
2. Are there similar patterns in unchanged files that are also vulnerable?
3. Did the fix introduce any new code paths that have similar weaknesses?
4. Is the fix complete (nothing left vulnerable) or partial?

Begin your answer with a single line of the form "VERDICT: YES", "VERDICT: NO",
or "VERDICT: PARTIAL" answering "Is this fix complete?", then explain why.
"""
        }]
    ).content[0].text

    # Parse the explicit verdict line. A substring test such as `"NO" in text`
    # would also match words like "NOT" or "KNOWN" and flag almost every answer.
    verdict_match = re.search(
        r"^\s*VERDICT:\s*(YES|NO|PARTIAL)\b", response, re.IGNORECASE | re.MULTILINE
    )
    verdict = verdict_match.group(1).upper() if verdict_match else "UNKNOWN"

    return {
        "finding": finding_description,
        "changed_files": changed_files,
        "completeness_analysis": response,
        "verdict": verdict,
        # Anything other than an explicit YES goes to a human reviewer
        "requires_manual_review": verdict != "YES"
    }
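The grep pattern in the script is a single broad alternation, and its comment notes that it should be adjusted per vulnerability type. One way to do that is a small lookup from finding type to a narrower pattern, applied line by line. The sketch below is an assumption-heavy illustration: the `PATTERNS` table, its keys, and the regular expressions are hypothetical heuristics that will produce both false positives and false negatives, and `find_candidates` is not part of any tool named in this article.

```python
from __future__ import annotations

import re
import tempfile
from pathlib import Path

# Hypothetical heuristics keyed by finding type. Tune these for your codebase.
PATTERNS = {
    # .execute(...) called with an f-string, concatenation, %-formatting or .format()
    "sql-injection": re.compile(
        r"""\.execute\w*\(\s*(?:f["']|.*(?:["']\s*[+%]|\.format\())"""
    ),
    # Common ways of bypassing HTML auto-escaping
    "xss": re.compile(r"innerHTML\s*=|\|\s*safe\b|mark_safe\("),
}


def find_candidates(
    repo_root: str,
    finding_type: str,
    suffixes: tuple[str, ...] = (".py", ".js", ".ts"),
) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, line) for lines matching the pattern."""
    pattern = PATTERNS[finding_type]
    hits = []
    for path in sorted(Path(repo_root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path.relative_to(repo_root)), lineno, line.strip()))
    return hits


# Small self-contained demonstration on a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "db.py").write_text(
        'cur.execute("SELECT * FROM t WHERE id = " + user_id)\n'
        'cur.execute("SELECT * FROM t WHERE id = ?", (user_id,))\n',
        encoding="utf-8",
    )
    hits = find_candidates(tmp, "sql-injection")

for path, lineno, line in hits:
    print(f"{path}:{lineno}: {line}")
```

The resulting list could replace the raw grep output passed to the model, which keeps the prompt shorter and focused on lines that resemble the original finding.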

Step 5 — Security-specific test suite requirement

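# Require that AI security fix PRs either:
# a) Pass existing security tests, or
# b) Include new tests that cover the vulnerability class
#
# The step below is a fragment to append to an existing job's steps. It only
# checks that such tests exist or were changed; running them is left to the
# repository's regular test job. The git diff needs the base branch history
# (actions/checkout with fetch-depth: 0).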

    - name: Verify security tests exist or were added
      run: |
        # Check if there are security-focused tests in the repo
        SECURITY_TESTS=$(find . -name "test_security_*" -o -name "*_security_test*" \
          -o -name "test_*injection*" -o -name "test_*xss*" -o -name "test_*sqli*" \
          2>/dev/null | wc -l)
        
        # Check if the autofix PR added or modified any tests
        CHANGED_TESTS=$(git diff --name-only origin/main...HEAD | \
          grep -E "test_|_test\." | wc -l)
        
        if [[ "$SECURITY_TESTS" -eq 0 && "$CHANGED_TESTS" -eq 0 ]]; then
          echo "::warning::AI security fix does not include or reference security tests"
          echo "Consider adding a regression test that verifies the vulnerability is fixed"
          # Warning, not failure — not all fixes can be easily tested
        else
          echo "Security tests present: $SECURITY_TESTS existing, $CHANGED_TESTS modified"
        fi
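The warning above suggests adding a regression test that verifies the vulnerability is fixed. As a minimal sketch of what such a test could look like for an XSS fix, the example below uses the standard library's `unittest` and `html.escape`. The function `render_greeting` stands in for whatever code the autofix patched and is hypothetical, as is the file name `test_security_greeting.py`, chosen so that the `find` pattern in the step would pick it up.

```python
# test_security_greeting.py (hypothetical file name)
import html
import unittest


def render_greeting(name: str) -> str:
    """Stand-in for the patched code: user input is HTML-encoded before output."""
    return f"<p>Hello, {html.escape(name)}!</p>"


class TestGreetingXssRegression(unittest.TestCase):
    # Payloads resembling the original finding plus a few close variants, so
    # the test covers the vulnerability class and not only the reported input.
    PAYLOADS = [
        "<script>alert(1)</script>",
        '"><img src=x onerror=alert(1)>',
        "'><svg onload=alert(1)>",
    ]

    def test_payloads_are_encoded(self):
        for payload in self.PAYLOADS:
            with self.subTest(payload=payload):
                rendered = render_greeting(payload)
                for tag in ("<script", "<img", "<svg"):
                    self.assertNotIn(tag, rendered)

    def test_benign_input_is_unchanged(self):
        # Guards against the fix breaking application logic for valid input.
        self.assertEqual(render_greeting("Ada"), "<p>Hello, Ada!</p>")
```

The suite can be run with `python -m unittest test_security_greeting`. The second test matters as much as the first, since it is the one that would catch a fix that rejects or mangles previously valid input.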

Expected Behaviour

| Gate | Without gating | With gating |
|---|---|---|
| AI autofix PR merged automatically | Yes (if CI passes) | No; requires security team approval |
| New dependency introduced by fix | Not scanned | Scanned for CVEs; PR blocked if high/critical found |
| Fix addresses only one instance of a 5-instance pattern | Merged; 4 remain vulnerable | Completeness analysis flags partial fix |
| Fix introduces second-order vulnerability | Merges without detection | Re-scan of changed files flags new pattern |
| Autofix PR without security review | Merges like any PR | Blocked by required status check until security team approves |

Trade-offs

| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| Required security review for all autofixes | Human judgment on every AI fix | Slows remediation; security team bottleneck | Tier by severity: critical findings require security team; medium/low require dev team lead; automate triage |
| Completeness analysis via LLM | Catches partial fixes | Adds ~30s to CI; another LLM call that could hallucinate | Treat as advisory, not blocking; human reviewer makes final call |
| Re-scan of changed files | Catches introduced issues | An aggressive SAST configuration may flag the fix itself | Tune SAST rules to ignore known-fixed patterns; use rule suppressions with ticket references |
| Blocking auto-merge for AI fixes | Eliminates silent bad fix in production | Removes the zero-friction benefit of autofix | Keep the value: AI generates the fix; humans review it; merge is still faster than a manual fix |
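The severity tiering in the first row can be expressed as a small lookup that a triage script consults before requesting reviewers. This is a sketch only: the reviewer group names are hypothetical, and the table does not say where "high" findings belong, so routing them to the security team is an assumption made here.

```python
# Reviewer group required for each finding severity.
REVIEWER_TIERS = {
    "critical": "security-team",
    "high": "security-team",    # assumption: not specified in the table above
    "medium": "dev-team-lead",
    "low": "dev-team-lead",
}


def required_reviewer(severity: str) -> str:
    """Map a finding severity to the reviewer group that must approve the fix.

    Unknown or missing severities fall back to the strictest tier, so a
    mislabelled finding is never under-reviewed.
    """
    return REVIEWER_TIERS.get(severity.strip().lower(), "security-team")


for level in ("critical", "Medium", "low", "unknown"):
    print(level, "->", required_reviewer(level))
```

Falling back to the strictest tier is a deliberate choice: it trades some extra security team load for the guarantee that a parsing or labelling error cannot lower the review bar.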

Failure Modes

| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Security team bottleneck delays critical fix | Critical vulnerability unpatched for days waiting for review | SLA breach on security fix tickets; security team queue depth | Define time-bounded review SLA; escalate if not reviewed within 24h for critical |
| Completeness check hallucinates a false gap | Reviewer investigates non-existent vulnerability based on AI analysis | Manual code inspection finds no issue matching AI’s claim | Treat completeness analysis as a hint, not a verdict; human reviewer has final say |
| CI passes but fix regresses functionality | Post-merge functionality failure; user-facing bug | Post-deploy monitoring catches regression; user reports | Add functional test suite to required CI checks; never allow autofix to merge with CI failures |
| Autofix bot account used in supply chain attack | Attacker compromises autofix bot; opens malicious PRs | Malicious PR from bot account; code changes unrelated to finding | Apply same supply chain controls to bot accounts as to human accounts; review bot’s IAM scope |
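The recovery for the first failure mode, escalating critical fixes that have not been reviewed within 24 hours, is simple enough to automate in a scheduled job. The sketch below shows only the decision function. The 24-hour threshold for critical findings comes from the table; the function name `needs_escalation`, the `REVIEW_SLA` mapping, and the choice not to escalate other severities are assumptions.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

# Review SLA per severity. Only "critical" is taken from the table above;
# add further entries to match your own policy.
REVIEW_SLA = {"critical": timedelta(hours=24)}


def needs_escalation(
    opened_at: datetime,
    severity: str,
    reviewed: bool,
    now: datetime | None = None,
) -> bool:
    """Return True when an unreviewed autofix PR has exceeded its review SLA."""
    if reviewed:
        return False
    sla = REVIEW_SLA.get(severity.strip().lower())
    if sla is None:
        return False  # no SLA defined for this severity in this sketch
    current = now if now is not None else datetime.now(timezone.utc)
    return current - opened_at > sla


opened = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
late = opened + timedelta(hours=25)
early = opened + timedelta(hours=23)

print(needs_escalation(opened, "critical", reviewed=False, now=late))
print(needs_escalation(opened, "critical", reviewed=False, now=early))
print(needs_escalation(opened, "critical", reviewed=True, now=late))
```

Passing `now` explicitly keeps the function deterministic in tests; the scheduled job would omit it and rely on the current UTC time.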