Preventing Secret Exfiltration via AI Coding Tool Context Windows

Problem

AI coding assistants — GitHub Copilot, Cursor, Claude Code, Codeium, Amazon Q, and similar tools — provide value precisely because they understand the context of your codebase. To understand context, they read files from your working directory and include them in the LLM context window. This is the feature, not a bug. The security problem emerges when the working directory contains secrets.

What gets included in context. Modern AI coding tools read far more than just the file being edited. They typically include:

  • The currently open file and adjacent files
  • Files explicitly referenced or imported by the current file
  • Files matching patterns the tool considers relevant (configuration files, .env, *.yaml, *.json)
  • Files the developer explicitly asks the tool to read
  • In some tools (Claude Code, Cursor in agent mode), any file in the directory tree that the agent decides to open or search

What secrets are typically present. A typical developer working directory may contain:

  • .env files with database connection strings, API keys, and service credentials
  • *.pem, *.key files for TLS certificates and SSH private keys
  • config.yaml / settings.json with embedded secrets
  • terraform.tfvars with cloud provider credentials
  • .netrc with authentication tokens
  • kubeconfig with cluster credentials
  • AWS credentials file or .boto with access keys

The transmission question. When these files are included in the LLM context, their content is transmitted to the AI provider’s API. For cloud-hosted AI tools (Copilot, Claude Code, Cursor with cloud models), this means the secret is sent over HTTPS to the provider’s infrastructure. Depending on the provider’s retention policy, the secret may then persist for some period in request logs or, where the terms allow it, in data collected for model training.

The risk varies by threat model. The primary risk is not deliberate exfiltration by the AI provider — it is:

  1. Accidental inclusion: the secret appears in a context that gets transmitted, logged, or cached in ways the developer did not intend
  2. AI output disclosure: the AI includes the secret value in a code completion or explanation, pasting it into chat history, code files, or commit history
  3. Cross-session contamination: some tools cache context between sessions; a secret loaded in one session might appear in completions for an unrelated task

Target systems: any organisation where developers use AI coding assistants with access to production credentials, cloud provider keys, or secrets in their working directory; DevSecOps teams defining acceptable use policies for AI coding tools.


Threat Model

Adversary 1 — AI tool includes secret in code completion. A developer has a .env file with DATABASE_URL=postgres://admin:prod_password@db.internal/myapp. They ask the AI tool to “write a test that connects to the database.” The AI generates test code that hardcodes the connection string from context. The developer merges the test without reviewing it. The production password is now in the git history.

Adversary 2 — Cloud credentials transmitted to AI provider. A developer’s working directory includes ~/.aws/credentials (symlinked or copied) or a terraform.tfvars with AWS access keys. The AI tool reads the directory for context. The credentials are transmitted to the AI provider’s API in the request payload. The developer is unaware this occurred.

Adversary 3 — Secret surfaced in chat history. A developer asks the AI “what’s the API key being used in this project?” The AI, having read .env from context, returns the actual API key value in its response. The chat history (often stored by the tool provider) now contains the secret in plaintext.


Configuration / Implementation

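Step 1 — Configure .cursorignore / .aiderignore / tool-specific exclusions

Most AI coding tools support an ignore file that prevents specified paths from being read. The file name and the strength of the guarantee differ by tool, so confirm both in the tool's documentation: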

# .cursorignore — Cursor IDE
# .aiderignore — Aider
# Similar to .gitignore but for AI context

# NEVER include these in AI context
.env
.env.*
.env.local
.env.production
.env.staging

# Private keys and certificates
*.pem
*.key
*.p12
*.pfx
*.jks
id_rsa
id_ed25519
*.private

# Cloud credentials
.aws/credentials
.aws/config
credentials.json
service-account*.json
*.tfvars
*.tfvars.json
terraform.tfstate
terraform.tfstate.backup

# Kubernetes credentials
kubeconfig
*.kubeconfig
.kube/config

# Other credential files
.netrc
.npmrc
.pypirc
.gitconfig

# Secrets in config files (use more specific patterns for your stack)
config/secrets.yml
config/credentials.yml.enc
config/master.key
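
// .vscode/settings.json (GitHub Copilot)
// Note: the keys under "github.copilot.enable" are VS Code language identifiers,
// not file globs. The setting turns off inline completions for those languages;
// it does not by itself keep files out of context. For path-based exclusion, use
// Copilot content exclusion in the repository or organisation settings where
// your plan supports it. The identifiers below are assumptions; check the
// language mode VS Code shows for your secret-bearing files.
{
  "github.copilot.editor.enableCodeActions": true,
  "github.copilot.enable": {
    "*": true,
    "plaintext": false,
    "dotenv": false,
    "properties": false
  }
}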
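
Because each tool reads a differently named ignore file, the pattern lists can drift apart. The sketch below keeps one shared pattern file and copies it into each tool's ignore file. The function name sync_ai_ignore and the source file name .ai-ignore-patterns are hypothetical, and the target names are assumptions to be checked against the tools actually in use.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy one shared pattern list into each tool's ignore file.
# Usage: sync_ai_ignore SOURCE_FILE TARGET_FILE...

sync_ai_ignore() {
    local source_file="$1"; shift
    local target
    if [ ! -f "$source_file" ]; then
        echo "missing pattern file: $source_file" >&2
        return 1
    fi
    for target in "$@"; do
        {
            echo "# Generated from $source_file; edit that file, not this one"
            cat "$source_file"
        } > "$target"
        echo "wrote $target"
    done
}

# Example (file names are assumptions):
#   sync_ai_ignore .ai-ignore-patterns .cursorignore .aiderignore
```

Running this from a project template or a post-checkout hook keeps the lists identical, but it does not prove that any given tool honours the file.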

Step 2 — Pre-flight secret detection before AI tool sessions

#!/bin/bash
# scripts/ai-workspace-secret-check.sh
# Run before starting an AI coding session in a directory
# Warns about secrets that AI tools might include in context

WORKSPACE="${1:-.}"

echo "=== AI Workspace Secret Check ==="
echo "Scanning: $WORKSPACE"
echo ""

FINDINGS=0

# Check for .env files with actual values
# (process substitution keeps the loop in the current shell, so FINDINGS persists)
while read -r env_file; do
    # Check if file has KEY=value assignments (placeholder values also match)
    if grep -qE "^[A-Z_]+=.{3,}" "$env_file" 2>/dev/null; then
        echo "WARNING: $env_file contains values (may include secrets)"
        FINDINGS=$((FINDINGS + 1))
    fi
done < <(find "$WORKSPACE" -name ".env*" -type f -not -path "*/.git/*" 2>/dev/null)

# Check for private key files
# (parentheses group the -o tests so the .git exclusion applies to all of them)
while read -r key_file; do
    if grep -q "PRIVATE KEY" "$key_file" 2>/dev/null; then
        echo "WARNING: $key_file appears to be a private key"
        FINDINGS=$((FINDINGS + 1))
    fi
done < <(find "$WORKSPACE" \( -name "*.pem" -o -name "*.key" -o -name "id_rsa" \
    -o -name "id_ed25519" \) -type f -not -path "*/.git/*" 2>/dev/null)

echo "Secret-bearing files found: $FINDINGS"

# Check for hardcoded secrets using trufflehog or gitleaks
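if command -v gitleaks &>/dev/null; then
    echo "--- Running gitleaks secret scan ---"
    # gitleaks writes its JSON report to a file, so use a temporary report path
    REPORT="$(mktemp)"
    gitleaks detect --source="$WORKSPACE" --no-git --exit-code 0 \
        --report-format json --report-path "$REPORT" 2>/dev/null
    jq -r '.[] | "SECRET: \(.RuleID) in \(.File):\(.StartLine)"' "$REPORT" 2>/dev/null
    rm -f "$REPORT"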
elif command -v trufflehog &>/dev/null; then
    echo "--- Running trufflehog secret scan ---"
    trufflehog filesystem "$WORKSPACE" --no-verification 2>/dev/null | head -20
else
    echo "Note: Install gitleaks or trufflehog for deeper secret detection"
    # Basic pattern matching as fallback
    grep -r --include="*.env" --include="*.json" --include="*.yaml" \
        -E "(password|secret|key|token|credential)\s*[:=]\s*['\"]?[A-Za-z0-9/+=]{8,}" \
        "$WORKSPACE" 2>/dev/null | grep -v "template\|example\|sample\|test\|CHANGEME\|placeholder" | \
        grep -v ".git/" | head -20
fi

echo ""
echo "=== Recommendations ==="
echo "1. Move secrets to a password manager or secrets manager"
echo "2. Use environment variable references in config (not literal values)"
echo "3. Add secret files to .cursorignore / .aiderignore"
echo "4. Use 'direnv' with per-directory .envrc that loads from vault"
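
The assignment check in the script above also fires on template files whose values are placeholders. A minimal sketch of a stricter check follows, assuming a hypothetical function name env_has_real_values and an illustrative placeholder list (CHANGEME, placeholder, your_..., <...>) that would need extending for a real codebase.

```shell
#!/usr/bin/env bash
# Returns 0 if FILE has at least one KEY=value line whose value is not an
# obvious placeholder, 1 otherwise. The placeholder list is an assumption.

env_has_real_values() {
    local file="$1" line value
    while IFS= read -r line || [ -n "$line" ]; do
        # Only consider KEY=value assignments, optionally prefixed by "export"
        [[ "$line" =~ ^(export[[:space:]]+)?[A-Za-z_][A-Za-z0-9_]*=(.*)$ ]] || continue
        value="${BASH_REMATCH[2]}"
        # Strip one layer of surrounding quotes
        value="${value%\"}"; value="${value#\"}"
        value="${value%\'}"; value="${value#\'}"
        [ -n "$value" ] || continue
        case "$value" in
            *CHANGEME*|*changeme*|*placeholder*|*PLACEHOLDER*|*your_*|*YOUR_*|*'<'*'>'*)
                continue ;;
        esac
        return 0
    done < "$file"
    return 1
}

# Example:
#   env_has_real_values .env && echo "WARNING: .env holds non-placeholder values"
```

A placeholder filter reduces noise but can hide a real secret that happens to contain one of the listed words, so a dedicated scanner remains the stronger check.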

Step 3 — Use workspace isolation for sensitive projects

# Separate working directory structure to isolate secrets from AI tool access

# Pattern: keep secrets outside the project directory
PROJECT_DIR=~/projects/myapp
SECRETS_DIR=~/secrets/myapp  # Outside the project, so not part of the workspace a tool indexes (an agent with shell access could still read it)

mkdir -p "$SECRETS_DIR"
chmod 700 "$SECRETS_DIR"

# Store actual secrets outside the project
cat > "$SECRETS_DIR/.env.production" << 'EOF'
DATABASE_URL=postgres://user:actual_password@db.internal/myapp
API_KEY=actual_api_key_here
EOF
chmod 600 "$SECRETS_DIR/.env.production"

# In the project, keep only references (not values)
cat > "$PROJECT_DIR/.env.example" << 'EOF'
DATABASE_URL=postgres://user:CHANGEME@localhost/myapp
API_KEY=CHANGEME
EOF

# Use direnv to load secrets from outside the project on shell entry
cat > "$PROJECT_DIR/.envrc" << 'EOF'
# Load secrets from outside the project directory
# This file is safe to commit — it contains no actual secrets
dotenv "$HOME/secrets/myapp/.env.production"
EOF
# direnv allow .
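
The isolation above is easy to undo by accident, for example with a symlink from the project back to the real .env file. The following sketch checks for that. The function name check_isolation is hypothetical, and it assumes the realpath command is available (GNU coreutils, or a recent macOS).

```shell
#!/usr/bin/env bash
# Returns 0 if SECRETS_DIR is outside PROJECT_DIR and no symlink inside the
# project resolves into the secrets directory; 1 on a violation; 2 on bad input.
# Usage: check_isolation PROJECT_DIR SECRETS_DIR

check_isolation() {
    local project secrets link target rc=0
    project="$(realpath "$1")" || return 2
    secrets="$(realpath "$2")" || return 2

    # The secrets directory must not live inside (or be) the project tree
    case "$secrets/" in
        "$project"/*)
            echo "FAIL: secrets directory is inside the project: $secrets"
            return 1 ;;
    esac

    # No symlink in the project may resolve into the secrets directory
    while IFS= read -r link; do
        target="$(realpath "$link" 2>/dev/null)" || continue
        case "$target/" in
            "$secrets"/*)
                echo "FAIL: $link points into $secrets"
                rc=1 ;;
        esac
    done < <(find "$project" -type l -not -path "*/.git/*")

    return "$rc"
}

# Example:
#   check_isolation ~/projects/myapp ~/secrets/myapp && echo "isolation OK"
```

Note that direnv exports the loaded values into the shell environment, so a tool that can run commands in that shell can still read them; the check only covers files on disk.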

Step 4 — Claude Code specific controls

# Claude Code reads files via explicit tool calls; permission rules control what it can access

# .claude/settings.json (project-level Claude Code settings)
# Note: do not treat .gitignore as an access control. Deny secret paths
# explicitly, and check the rule syntax against the Claude Code version in use:

mkdir -p .claude
cat > .claude/settings.json << 'EOF'
{
  "permissions": {
    "allow": [
      "Read(**)",
      "Write(src/**)",
      "Bash(git *)",
      "Bash(npm *)"
    ],
    "deny": [
      "Read(.env*)",
      "Read(*.pem)",
      "Read(*.key)",
      "Read(*.tfvars)",
      "Read(terraform.tfstate*)",
      "Read(.aws/*)",
      "Read(.kube/*)",
      "Bash(curl *)",
      "Bash(wget *)"
    ]
  }
}
EOF

# Verify Claude Code respects the settings by checking which files it reads
# (Claude Code logs tool calls during a session)
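
A settings file can lose deny rules during later edits, so it may be worth checking it in CI. The sketch below reports expected rules that are absent from permissions.deny. The function name missing_deny_rules is hypothetical, it requires jq, and it inspects the file's contents only; it cannot show that the tool enforces the rules.

```shell
#!/usr/bin/env bash
# Prints each expected rule that is absent from .permissions.deny in SETTINGS.
# Returns 0 if all rules are present, 1 otherwise. Requires jq.
# Usage: missing_deny_rules SETTINGS_JSON RULE...

missing_deny_rules() {
    local settings="$1"; shift
    local rule rc=0
    for rule in "$@"; do
        if ! jq -e --arg r "$rule" \
            '(.permissions.deny // []) | any(. == $r)' "$settings" >/dev/null 2>&1; then
            echo "missing deny rule: $rule"
            rc=1
        fi
    done
    return "$rc"
}

# Example:
#   missing_deny_rules .claude/settings.json 'Read(.env*)' 'Read(*.pem)' 'Read(*.key)'
```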

Step 5 — Organisation policy for AI coding tools

# ai-coding-tools-policy.yaml
# Organisation policy for safe use of AI coding tools

ai_coding_tool_policy:
  version: "1.0"
  
  prohibited_in_ai_context:
    - "Production secrets, credentials, and API keys"
    - "Private TLS/SSH keys"
    - "Cloud provider access keys and service account credentials"
    - "Database connection strings with passwords"
    - "Kubernetes cluster credentials"
    - "Third-party API keys for production services"
  
  required_controls:
    developers:
      - "Maintain .cursorignore / .aiderignore with secret file patterns"
      - "Run workspace secret check before starting AI tool session"
      - "Use direnv or equivalent to load secrets from outside project directory"
      - "Never paste secrets directly into AI chat interfaces"
      - "Review all AI-generated code for accidental secret inclusion before committing"
    
    platform_team:
      - "Configure gitleaks pre-commit hooks to catch AI-generated secrets in commits"
      - "Enable GitHub secret scanning on all repositories"
      - "Provide a secrets manager workflow that keeps secrets out of project directories"
      - "Publish the .aiderignore template as part of project templates"
  
  approved_patterns:
    - name: "Environment variable references in code"
      example: "os.environ.get('DATABASE_URL')"
      safe: true
    
    - name: "Secret manager lookups in code"
      example: "aws_secrets_manager.get_secret_value(SecretId='prod/db/password')"
      safe: true
    
    - name: "Placeholder values in .env.example"
      example: "DATABASE_URL=postgres://user:CHANGEME@localhost/myapp"
      safe: true
  
  prohibited_patterns:
    - name: "Literal secrets in source files"
      example: "DATABASE_URL=postgres://user:actualpassword@db.internal/myapp"
      safe: false
    
    - name: "Asking AI to 'complete' code that references a secret variable"
      description: "AI may include the actual value from context in its completion"
      safe: false
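
The policy asks the platform team for a gitleaks pre-commit hook. A minimal sketch of such a hook follows. It assumes gitleaks v8, where `protect --staged` scans staged changes (newer releases may prefer a different subcommand), and falls back to a coarse regex when gitleaks is not installed. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of .git/hooks/pre-commit (make it executable with chmod +x).

# Fallback check: reads a unified diff on stdin and succeeds (exit 0) if an
# added line looks like a literal credential assignment.
added_lines_look_secret() {
    grep -E '^\+[^+]' |
        grep -qEi "(password|secret|api.?key|access.?key|token)[\"']?[[:space:]]*[:=][[:space:]]*[\"']?[A-Za-z0-9/+=]{8,}"
}

run_hook() {
    if command -v gitleaks >/dev/null 2>&1; then
        # Assumes the gitleaks v8 "protect" subcommand
        gitleaks protect --staged
    elif git diff --cached -U0 | added_lines_look_secret; then
        echo "Possible secret in staged changes; use an environment variable reference." >&2
        return 1
    fi
}

# In the real hook file, the last line would be:
#   run_hook || exit 1
```

Local hooks can be skipped with `git commit --no-verify`, which is why the same scan should also run in CI.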

Step 6 — Post-session audit for secret disclosure in output

#!/bin/bash
# scripts/ai-output-secret-scan.sh
# Scan recently modified files for secrets that AI may have inserted

MODIFIED_SINCE="${1:-1 hour ago}"

echo "=== Scanning AI session output for secrets ==="
echo "Checking files modified since: $MODIFIED_SINCE"

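# Find recently modified source files
# (-newer compares against a file's mtime, so create a reference file stamped with the cutoff)
REF_FILE="$(mktemp)"
trap 'rm -f "$REF_FILE"' EXIT
touch -d "$MODIFIED_SINCE" "$REF_FILE" 2>/dev/null || \
    touch -t "$(date -v-1H +%Y%m%d%H%M.%S)" "$REF_FILE"  # BSD/macOS fallback: fixed 1 hour
find . -newer "$REF_FILE" \
    -type f \
    -not -path "./.git/*" \
    -not -name "*.pyc" \
    2>/dev/null | \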
while read -r file; do
    # Check for patterns that look like secrets in recently modified files
    if grep -qEi "(password|secret|api.?key|access.?key|private.?key)\s*[:=]\s*['\"]?[A-Za-z0-9/+=]{8,}" "$file" 2>/dev/null; then
        echo "POTENTIAL SECRET in recently modified file: $file"
        grep -nEi "(password|secret|api.?key|access.?key|private.?key)\s*[:=]\s*['\"]?[A-Za-z0-9/+=]{8,}" "$file" 2>/dev/null | head -5
    fi
done

echo ""
echo "Run 'git diff' to review all changes before committing"
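
Pattern matching misses secrets assigned to unexpected names. When the real values live in a known file outside the project (as in Step 3), an exact-match search is also possible. The sketch below assumes a hypothetical function name find_leaked_values and a simple KEY=value secrets file with unquoted values.

```shell
#!/usr/bin/env bash
# Prints "FILE: contains value of KEY" for each file holding the literal value
# of a KEY from SECRETS_FILE. Returns 0 if any leak is found, 1 if none.
# Usage: find_leaked_values SECRETS_FILE FILE...

find_leaked_values() {
    local secrets_file="$1"; shift
    local min_len=8 line key value file rc=1
    while IFS= read -r line || [ -n "$line" ]; do
        [[ "$line" =~ ^[A-Za-z_][A-Za-z0-9_]*= ]] || continue
        key="${line%%=*}"
        value="${line#*=}"
        # Skip short values; they match too much unrelated text
        [ "${#value}" -ge "$min_len" ] || continue
        for file in "$@"; do
            if grep -qF -- "$value" "$file" 2>/dev/null; then
                echo "$file: contains value of $key"  # never print the value itself
                rc=0
            fi
        done
    done < "$secrets_file"
    return "$rc"
}

# Example:
#   find_leaked_values ~/secrets/myapp/.env.production $(git diff --name-only)
```

This only finds whole values. A password embedded in a longer connection string would need its own entry in the secrets file to be detected on its own.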

Expected Behaviour

| Scenario | Without controls | With controls |
|---|---|---|
| AI reads .env file with production DB password | Password transmitted to AI provider | .env excluded via .aiderignore; AI cannot read it |
| AI generates code completion referencing a secret | Actual secret value appears in completion | Secret not in context; AI uses placeholder or environment variable reference |
| Developer asks AI “what’s the API key?” | AI returns actual key from context | AI has no access to key file; cannot return the value |
| Secret appears in AI-generated test code | Secret hardcoded in test; committed to git | Pre-commit gitleaks hook blocks commit; developer must replace with env var reference |
| Workspace scan before AI session | No pre-flight check; developer unaware of secrets present | Scan lists all secret files in workspace; developer remediates before session |

Trade-offs

| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| Broad ignore patterns | Easy to apply; covers most cases | May exclude files the AI legitimately needs | Start broad; whitelist specific non-secret config files as needed |
| Workspace isolation (secrets outside project) | Physical separation prevents accidental inclusion | Developers must manage two directories; friction | Use direnv to make secrets transparently available via env vars; project directory stays clean |
| Pre-commit secret scanning | Catches AI-generated secrets before they reach git | Adds latency to commits | gitleaks is fast; accept the 1-2 second overhead |
| Local-only AI tools (Ollama, llama.cpp) | No transmission to external provider | Model capability may be lower than hosted tools | Evaluate model quality for your use case; may be acceptable for routine tasks |

Failure Modes

| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| .aiderignore not respected by tool | Secrets still included in context | Tool reads secret file (visible in tool’s file list) | Check tool-specific documentation; may require different config file name or format |
| Secret committed via AI-generated code | Secret in git history | GitHub secret scanning alert; post-commit gitleaks | Rotate the secret immediately; use git-filter-repo to remove from history |
| Developer bypasses pre-commit hook | Unscanned commit reaches remote | GitHub-side secret scanning catches it | Rotate secret; investigate why hook was bypassed; enforce hook with CI check |
| AI tool updates and ignores .aiderignore | New version reads previously excluded files | Session audit scan finds secrets in AI output | Pin tool version; test new versions before updating across org |
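
For the history-removal recovery step, git-filter-repo accepts a replacement list through its --replace-text option. The sketch below builds that list. The function name write_replacements and the file paths are hypothetical, and rotation of the secret should come first, since clones and forks may already hold the old history.

```shell
#!/usr/bin/env bash
# Writes one "literal:VALUE==>***REMOVED***" line per leaked value, the format
# read by `git filter-repo --replace-text`.
# Usage: write_replacements OUT_FILE VALUE...

write_replacements() {
    local out_file="$1"; shift
    local value
    : > "$out_file"
    for value in "$@"; do
        printf 'literal:%s==>***REMOVED***\n' "$value" >> "$out_file"
    done
}

# Example flow, run in a fresh clone after rotating the secret
# (LEAKED_VALUE is a placeholder for the exposed value):
#   git log --all --oneline -S"$LEAKED_VALUE"      # find commits that touched it
#   write_replacements /tmp/replacements.txt "$LEAKED_VALUE"
#   git filter-repo --replace-text /tmp/replacements.txt
```

Keep the replacements file outside the repository and delete it afterwards, because it contains the leaked values in plaintext.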