API Key Lifecycle at Scale: Issuance, Rotation, Scoping, and Audit Across Cloud and SaaS

Problem

API keys leak. The 2024 GitGuardian “State of Secrets Sprawl” report found more than 23 million secrets exposed in public GitHub commits over the year, a substantial fraction of them API keys for production systems. The 2025 figure was higher.

Most API-key-leak incidents share a shape: a key was issued for a one-off purpose, never rotated, never reviewed, and eventually committed to a public repo or exfiltrated through some other vector. The leak isn’t surprising; the cleanup is hard because nobody knows what the key was for or what to revoke.

By 2026 mature programs treat API keys as tracked, lifecycle-managed objects:

  • Issued with explicit purpose, owner, and scope.
  • Used with auditable per-call attribution.
  • Rotated on schedule and on event (employee departure, suspected compromise).
  • Revoked automatically at end-of-life or on detection.
  • Discoverable — every key has a system-of-record entry showing where it’s used.

The scale problem: a large organization can hold 10,000 to 100,000 API keys across cloud providers (AWS, GCP, Azure), SaaS (Stripe, Twilio, Datadog, GitHub, OpenAI), internal services, and legacy systems. At that volume, managing each key’s lifecycle by hand is impossible.

The specific gaps in unmanaged programs:

  • New keys minted via UI clicks; no record of purpose.
  • Keys live in .env files, committed to repos, screenshotted in Slack.
  • Rotation never happens because “this key is in production, can’t risk breaking it.”
  • Departed employees’ keys persist for months / years.
  • No audit trail of which key did what — provider logs show “key X did Y” but X has no description.
  • Scope often defaults to admin / cluster-wide / unrestricted.

This article covers the lifecycle stages, secret-scanning prevention, programmatic issuance with a system-of-record, automated rotation, and the operational integration with employee onboarding / offboarding.

Target systems: Vault, AWS IAM Identity Center, GCP IAM, Microsoft Entra ID, GitGuardian / TruffleHog / gitleaks / detect-secrets for scanning, Akeyless / Doppler / Infisical / 1Password for secret management.

Threat Model

  • Adversary 1 (public-repo leak): an engineer commits an API key to a public repo; an attacker scanning GitHub finds it and uses it.
  • Adversary 2 (insider threat): an employee with key access uses it for an unauthorized purpose; the key has no per-employee attribution.
  • Adversary 3 (endpoint compromise): malware exfiltrates .env files or developer credentials.
  • Adversary 4 (SaaS provider compromise): a SaaS provider is breached, exposing the provider’s copy of the API key you use with it.
  • Adversary 5 (departed employee): a former employee retains keys they personally created and uses them after leaving.
  • Access level: Adversary 1 needs only read access to public repos; Adversary 3 runs code on a developer laptop; Adversary 5 holds keys copied before departure. Adversary 2 is an insider and Adversary 4 sits on the vendor side.
  • Objective: use the API key to impersonate the legitimate owner, read or modify data, and pivot to other systems.
  • Blast radius: an unmanaged key with broad scope grants full access to the resource indefinitely. With lifecycle management, scope and lifetime are bounded and every use is attributed.

Configuration

Step 1: System of Record

Every key in your environment has an entry in a central registry.

# api-key-registry/keys/payments-stripe-live.yaml
key_id: payments-stripe-live-001
description: "Stripe live API key for payments service"
owner_team: payments
owner_individual: alice@example.com
created_at: 2026-04-01T10:00:00Z
expiration: 2027-04-01T10:00:00Z
last_rotated: 2026-04-01T10:00:00Z
scope:
  provider: stripe
  account_id: acct_xxx
  permissions: ["charges:write", "customers:read"]
  webhook_endpoints: ["https://api.example.com/stripe-webhook"]
storage:
  vault_path: "secret/payments/stripe-live"
  consumed_by:
    - kubernetes_secret: "payments-stripe-credentials"
      namespace: payments
audit:
  rotation_owner: payments-team
  rotation_frequency_days: 365
  last_audit: 2026-04-01

Every API key has a YAML file in this registry. CI validates that:

  • Every key has an owner (team + individual).
  • Every key has an expiration no more than one year after its creation date.
  • Every key has a documented purpose and scope.
  • The Vault path actually exists.
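The CI checks listed above can be sketched in a few lines of Python. This is a minimal sketch, assuming entries are loaded with yaml.safe_load (which may return timestamps as datetime objects or strings); validate_entry and the vault_path_exists callback are hypothetical names, and the actual Vault lookup is left to the caller because it needs credentials.

```python
import datetime

REQUIRED_FIELDS = ("key_id", "description", "owner_team", "owner_individual", "scope")
MAX_LIFETIME = datetime.timedelta(days=365)

def _ts(value):
    """Accept datetimes (as yaml.safe_load may return them) or ISO-8601 strings."""
    if isinstance(value, str):
        # fromisoformat rejects a trailing "Z" before Python 3.11.
        value = datetime.datetime.fromisoformat(value.replace("Z", "+00:00"))
    if value.tzinfo is None:
        value = value.replace(tzinfo=datetime.timezone.utc)
    return value

def validate_entry(entry, vault_path_exists=lambda path: True):
    """Return a list of policy violations for one registry entry (empty = valid)."""
    errors = [f"missing {field}" for field in REQUIRED_FIELDS if not entry.get(field)]
    try:
        lifetime = _ts(entry["expiration"]) - _ts(entry["created_at"])
        if lifetime > MAX_LIFETIME:
            errors.append(f"lifetime {lifetime.days} days exceeds {MAX_LIFETIME.days}")
    except (KeyError, ValueError) as exc:
        errors.append(f"bad or missing timestamps: {exc}")
    vault_path = entry.get("storage", {}).get("vault_path")
    if not vault_path:
        errors.append("missing storage.vault_path")
    elif not vault_path_exists(vault_path):
        # Hypothetical hook: in CI this would query Vault for the path.
        errors.append(f"vault path not found: {vault_path}")
    return errors
```

In CI, a job would run validate_entry over every file in api-key-registry/keys/ and fail the build if any list comes back non-empty.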

Without a registry entry, the key shouldn’t exist. Onboarding a new key without registry creation is a policy violation.

Step 2: Programmatic Issuance

When teams need a new API key, the workflow goes through automation, not the SaaS UI:

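# issue_key.py
import argparse, datetime, os
import requests

STRIPE_PROVISIONING_KEY = os.environ["STRIPE_PROVISIONING_KEY"]
VAULT_ADDR = os.environ["VAULT_ADDR"]
VAULT_TOKEN = os.environ["VAULT_TOKEN"]

def issue_stripe_key(team, owner, purpose, scope):
    # Generate key in Stripe via API. The endpoint and response fields are
    # illustrative: confirm your provider actually exposes key creation over
    # its API before relying on this.
    response = requests.post(
        "https://api.stripe.com/v1/api_keys",
        auth=(STRIPE_PROVISIONING_KEY, ""),
        data={
            "name": f"{team}-{purpose}-{datetime.date.today()}",
            "role": scope,   # restricted role
        },
        timeout=30,
    )
    response.raise_for_status()
    new_key = response.json()
    key_id = new_key["id"]
    secret = new_key["secret"]

    # Write to Vault. With a KV v2 engine mounted at secret/, the HTTP API
    # inserts "data/" after the mount: secret/<path> -> /v1/secret/data/<path>.
    vault_path = f"secret/{team}/stripe-{purpose}"
    requests.post(
        f"{VAULT_ADDR}/v1/secret/data/{team}/stripe-{purpose}",
        headers={"X-Vault-Token": VAULT_TOKEN},
        json={"data": {"key": secret, "key_id": key_id}},
        timeout=30,
    ).raise_for_status()

    # Register in system-of-record with the fields CI and the rotation
    # controller read. One clock reading keeps all timestamps UTC and consistent.
    now = datetime.datetime.now(datetime.timezone.utc)
    registry_entry = {
        "key_id": key_id,
        "description": purpose,
        "owner_team": team,
        "owner_individual": owner,
        "created_at": now.isoformat(),
        "expiration": (now + datetime.timedelta(days=365)).isoformat(),
        "last_rotated": now.isoformat(),
        "scope": {"provider": "stripe", "permissions": [scope]},
        "storage": {"vault_path": vault_path},
        "audit": {"rotation_owner": team, "rotation_frequency_days": 365},
    }
    register_in_git(registry_entry)   # helper (not shown) that commits the entry to the registry repo

if __name__ == "__main__":
    # CLI for engineers to request a new key.
    ...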

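The flow: an engineer runs ./issue_key.py --team payments --purpose webhooks --scope charges:write, and issuance, Vault storage, and the registry update happen in one scripted step, with no clicking in the Stripe UI. The steps are not atomic, though: if the Vault write or registry update fails after the provider has minted the key, a production version should revoke the new key rather than leave an untracked credential behind.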
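One way to close that gap is a compensating step: mint first, and if storing or registering fails, revoke what was just minted. The sketch below assumes four caller-supplied callables (mint, store, register, revoke are hypothetical names standing in for the provider, Vault, and registry calls); it is a pattern illustration, not part of any provider SDK.

```python
def issue_with_rollback(mint, store, register, revoke):
    """Mint a key, then store and register it; revoke it if a later step fails.

    mint() -> (key_id, secret); store(key_id, secret); register(key_id);
    revoke(key_id). All four are provider- or org-specific callables.
    """
    key_id, secret = mint()
    try:
        store(key_id, secret)
        register(key_id)
    except Exception:
        # Compensate: never leave a live key that the registry doesn't know about.
        revoke(key_id)
        raise
    return key_id
```

If the registry step fails after the Vault write, Vault is left holding a revoked secret; that is harmless but worth cleaning up on the next run.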

Step 3: Rotation Automation

Every key rotates on schedule. A controller monitors the registry:

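# rotation_controller.py
import datetime, glob, logging
import yaml

class RotationError(Exception):
    pass

def as_utc(value):
    # yaml.safe_load turns unquoted ISO timestamps into datetime objects, while
    # update_registry() writes strings back; accept both. ("Z" must be replaced
    # for fromisoformat before Python 3.11.)
    if isinstance(value, str):
        value = datetime.datetime.fromisoformat(value.replace("Z", "+00:00"))
    if value.tzinfo is None:
        value = value.replace(tzinfo=datetime.timezone.utc)
    return value

def find_due_for_rotation():
    now = datetime.datetime.now(datetime.timezone.utc)
    due = []
    for path in glob.glob("api-key-registry/keys/*.yaml"):
        with open(path) as f:
            entry = yaml.safe_load(f)
        last_rot = as_utc(entry["last_rotated"])
        freq = entry["audit"]["rotation_frequency_days"]
        if (now - last_rot).days >= freq:
            due.append(entry)
    return due

def rotate_key(entry):
    # read_from_vault, write_to_vault, provision_new_key, wait_for_consumers,
    # synthetic_test_passes, revoke_key and update_registry are provider- and
    # org-specific helpers (not shown). Vault data is {"key": ..., "key_id": ...}.
    path = entry["storage"]["vault_path"]
    old = read_from_vault(path)
    # Mint new key and write it to Vault.
    new = provision_new_key(entry)
    write_to_vault(path, new)
    # Dual-write window: the old key stays valid while consumers reload
    # (via External Secrets Operator).
    wait_for_consumers(entry, timeout_minutes=15)
    # Verify new key works (via a synthetic test).
    if not synthetic_test_passes(new["key"], entry):
        # Roll back: restore the old secret, let consumers reload, discard the new key.
        write_to_vault(path, old)
        wait_for_consumers(entry, timeout_minutes=15)
        revoke_key(entry, new["key_id"])
        raise RotationError(f"synthetic test failed for {entry['key_id']}; rolled back")
    # Revoke the old key only after the new one is verified, then update registry.
    revoke_key(entry, old["key_id"])
    entry["last_rotated"] = datetime.datetime.now(datetime.timezone.utc).isoformat()
    update_registry(entry)

if __name__ == "__main__":
    # Run nightly via cron; one failed rotation must not block the rest.
    for entry in find_due_for_rotation():
        try:
            rotate_key(entry)
        except Exception:
            logging.exception("rotation failed for %s", entry["key_id"])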

The flow handles dual-write windows: the new key is written, consumers refresh (via External Secrets Operator + reload), and the old key is revoked only once consumers are confirmed to be using the new one.

For SaaS providers without an issuance API, fall back to manual rotation with structured tickets:

# rotation-ticket-template.md (Jira / Linear / GitHub Issue).
title: "Rotate {key_id} (due {due_date})"
body: |
  Key {key_id} is due for rotation by {due_date}.
  Owner: {owner_individual} / {owner_team}
  Provider: {provider}
  Vault path: {vault_path}

  Steps:
  - [ ] Mint new key in {provider} UI
  - [ ] Update Vault path
  - [ ] Confirm consumers using new key (synthetic test)
  - [ ] Revoke old key in {provider} UI
  - [ ] Update registry's last_rotated timestamp

Manual rotation tickets are tracked in your normal ticketing system; SLA enforced by engineering management.
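Filling the template from the registry keeps manual tickets consistent with automated rotation. A minimal sketch, assuming registry entries shaped like the YAML in Step 1; render_rotation_ticket is a hypothetical name, and filing the result in Jira, Linear, or GitHub Issues is left to whichever client your team already uses.

```python
TICKET_TITLE = "Rotate {key_id} (due {due_date})"
TICKET_BODY = """Key {key_id} is due for rotation by {due_date}.
Owner: {owner_individual} / {owner_team}
Provider: {provider}
Vault path: {vault_path}

Steps:
- [ ] Mint new key in {provider} UI
- [ ] Update Vault path
- [ ] Confirm consumers using new key (synthetic test)
- [ ] Revoke old key in {provider} UI
- [ ] Update registry's last_rotated timestamp
"""

def render_rotation_ticket(entry, due_date):
    """Fill the ticket template from a registry entry; returns (title, body)."""
    fields = {
        "key_id": entry["key_id"],
        "due_date": due_date.isoformat(),
        "owner_individual": entry["owner_individual"],
        "owner_team": entry["owner_team"],
        "provider": entry["scope"]["provider"],
        "vault_path": entry["storage"]["vault_path"],
    }
    return TICKET_TITLE.format(**fields), TICKET_BODY.format(**fields)
```

The rotation controller could call this for providers flagged as manual-only instead of calling rotate_key.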

Step 4: Pre-Commit Secret Scanning

Prevent leaks at commit time.

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.20.0
    hooks:
      - id: gitleaks
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.5.0
    hooks:
      - id: detect-secrets

gitleaks and detect-secrets ship rules for common provider key formats (AWS, Stripe, GitHub, and others). For your org’s own key formats, write a custom rule:

# .gitleaks.toml
# Keep the built-in ruleset; without [extend], custom rules replace the defaults.
[extend]
useDefault = true

[allowlist]
description = "Allowlist for known test fixtures"
paths = [
  "tests/fixtures/.*",
]

[[rules]]
id = "myorg-api-key"
description = "MyOrg internal API key"
regex = '''myorg_(?:live|test)_[a-z0-9]{32}'''
keywords = ["myorg_"]

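CI runs the same scanner as well, because local hooks are skipped by anyone who commits with git commit --no-verify or never installs pre-commit; PRs containing secrets are blocked.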
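Before rolling out a custom rule, it helps to check the pattern against samples it should and should not catch. A minimal sketch: gitleaks evaluates rules with Go’s RE2 engine, and this simple pattern behaves the same under Python’s re, but patterns using lookarounds or backreferences would not transfer. The sample strings and check_rule are illustrative, not part of gitleaks.

```python
import re

# Same pattern as the custom rule in .gitleaks.toml.
MYORG_KEY = re.compile(r"myorg_(?:live|test)_[a-z0-9]{32}")

# Samples are assembled at runtime so this file doesn't trip the rule itself.
SHOULD_MATCH = [
    "MYORG_KEY=myorg_live_" + "a1" * 16,
    'client = Client("myorg_test_' + "0" * 32 + '")',
]
SHOULD_NOT_MATCH = [
    "myorg_live_" + "A1" * 16,   # uppercase is outside [a-z0-9]
    "myorg_prod_" + "a" * 32,    # unknown environment prefix
    "myorg_live_" + "a" * 31,    # one character short
]

def check_rule(pattern, positives, negatives):
    """Return the samples the pattern gets wrong (empty list = rule behaves)."""
    missed = [s for s in positives if not pattern.search(s)]
    false_hits = [s for s in negatives if pattern.search(s)]
    return missed + false_hits
```

Running this as an ordinary unit test catches a regex typo before it silently stops matching in CI.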

Step 5: Secret Scanning Across Existing Repos

For repos that may already contain leaked secrets, run a one-time and ongoing scan:

# One-time scan of historical commits (gitleaks v8.19+; older releases use "gitleaks detect --source").
gitleaks git --report-format json --report-path /tmp/secrets.json /path/to/repo

# Ongoing: GitHub's secret scanning (free for public repos; commercial Advanced Security for private).
# Or use TruffleHog OSS:
trufflehog git https://github.com/myorg/payments --json

For every finding: rotate, attribute, audit. Route scanner notifications into your ticketing system.
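Turning a report into tickets is mostly deduplication and routing. A minimal sketch, assuming gitleaks v8 JSON field names (RuleID, File, StartLine, Commit, Fingerprint; verify against the version you run) and a hypothetical owners_by_path_prefix mapping similar to CODEOWNERS. The secret value is deliberately left out so the ticket doesn’t become a second leak.

```python
import json

def findings_to_tickets(report_json, owners_by_path_prefix):
    """Turn a gitleaks JSON report into ticket dicts, one per unique finding."""
    tickets, seen = [], set()
    for finding in json.loads(report_json):
        # The same commit/file/rule/line can appear more than once; report it once.
        if finding["Fingerprint"] in seen:
            continue
        seen.add(finding["Fingerprint"])
        # First matching path prefix wins; unowned paths go to security.
        owner = next((team for prefix, team in owners_by_path_prefix.items()
                      if finding["File"].startswith(prefix)), "security")
        tickets.append({
            "title": f"Leaked secret ({finding['RuleID']}) in "
                     f"{finding['File']}:{finding['StartLine']}",
            "assignee_team": owner,
            "commit": finding["Commit"],
            "steps": ["rotate", "attribute", "audit"],
        })
    return tickets
```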

Step 6: Per-Use Attribution Where Possible

Some providers attribute API calls to a specific key in their audit logs. Where possible, every team / service has its own key — never shared.

# Bad: one OpenAI key for the whole org.
key_id: openai-shared
owner_team: shared
consumed_by: [payments, recommendations, search, chat]   # all teams

# Good: per-team keys (one registry file each; shown here as separate YAML documents).
---
key_id: openai-payments
owner_team: payments
consumed_by: [payments]
---
key_id: openai-recommendations
owner_team: recommendations
consumed_by: [recommendations]

When OpenAI’s audit shows “key abc123 made a request,” you immediately know it’s the payments team’s key. Cross-team attribution requires per-team keys.
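The registry makes shared keys easy to catch mechanically. A minimal lint sketch, assuming each entry carries a flat consumed_by list of team names as in the per-team example above; find_shared_keys is a hypothetical name.

```python
def find_shared_keys(entries):
    """Return key_ids whose consumed_by lists more than one team.

    Entries without a consumed_by list are skipped.
    """
    return sorted(
        e["key_id"] for e in entries
        if len(set(e.get("consumed_by") or [])) > 1
    )
```

Wired into the same CI job as the registry checks, this blocks new shared keys and produces a migration list for existing ones.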

Step 7: Offboarding Flow

When an employee leaves, the keys they personally issued must be enumerated and rotated. Hook into your IDP:

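# onboarding_offboarding.py
import logging

def employee_offboarded(user_email):
    # registry, directory, identity_provider and rotate_key are the org's own
    # integrations (not shown); directory.team_lead is a hypothetical lookup.
    # Find all keys in the registry where this employee was the individual owner.
    for k in registry.find(owner_individual=user_email):
        # Reassign to a current person on the owning team (a pseudo-user such as
        # "payments-shared" would fail the quarterly owner-exists check), then rotate.
        k["owner_individual"] = directory.team_lead(k["owner_team"])
        try:
            rotate_key(k)   # generates new credential the departing employee never sees
        except Exception:
            # Keep going: one failure must not leave the remaining keys unrotated.
            logging.exception("offboarding rotation failed for %s", k["key_id"])
    # Also revoke any session tokens the user had with internal SaaS.
    for sess in identity_provider.list_sessions(user_email):
        identity_provider.revoke(sess)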

Triggered automatically when HR’s offboarding signal reaches the IDP.
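Webhooks get dropped, so a periodic comparison between the IdP and the registry is a useful backstop. A minimal sketch, assuming active_emails is a user list exported from the IdP (for example nightly); orphaned_keys is a hypothetical name.

```python
def orphaned_keys(entries, active_emails):
    """Keys whose individual owner is no longer an active account in the IdP."""
    active = {email.lower() for email in active_emails}
    return [
        (e["key_id"], e.get("owner_individual"))
        for e in entries
        # Missing owners count as orphaned too.
        if (e.get("owner_individual") or "").lower() not in active
    ]
```

Each result is a key the offboarding flow missed; feeding it back into employee_offboarded (or a ticket) closes the gap.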

Step 8: Telemetry and Audit

api_key_total{provider, owner_team}
api_key_age_days{key_id}
api_key_due_for_rotation_total
api_key_rotation_success_total
api_key_rotation_failure_total{reason}
api_key_secret_scanner_findings_total{repo, scanner}
api_key_unauthorized_use_detected_total{key_id}

Alert on:

  • api_key_due_for_rotation_total > 0, or any key past its registry expiration: overdue rotation; escalate to the owning team.
  • api_key_secret_scanner_findings_total increasing: leaks pending response.
  • api_key_unauthorized_use_detected_total increasing: stolen-key incident.
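The registry-derived gauges can be computed straight from the YAML files. A minimal sketch that renders two of them in the Prometheus text exposition format, assuming age is measured from last_rotated and that key_ids need no label escaping; render_registry_metrics is a hypothetical name. The output could be served on /metrics or written to a .prom file for node_exporter’s textfile collector.

```python
import datetime

def _ts(value):
    # Accept datetimes or ISO-8601 strings (with "Z" or an explicit offset).
    if isinstance(value, str):
        value = datetime.datetime.fromisoformat(value.replace("Z", "+00:00"))
    return value if value.tzinfo else value.replace(tzinfo=datetime.timezone.utc)

def render_registry_metrics(entries, now):
    """Emit api_key_age_days and api_key_due_for_rotation_total as text."""
    lines = ["# TYPE api_key_age_days gauge"]
    due = 0
    for e in sorted(entries, key=lambda e: e["key_id"]):
        age = (now - _ts(e["last_rotated"])).days
        lines.append(f'api_key_age_days{{key_id="{e["key_id"]}"}} {age}')
        if age >= e["audit"]["rotation_frequency_days"]:
            due += 1
    lines += ["# TYPE api_key_due_for_rotation_total gauge",
              f"api_key_due_for_rotation_total {due}"]
    return "\n".join(lines) + "\n"
```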

Step 9: Enrichment for Provider Audit

Cross-correlate provider audit logs with your registry:

-- Stripe audit log shows key acct_xxx_yyy made an API call.
-- Look up registry to find who owns that key. LEFT JOIN keeps calls from keys
-- with no registry row (their registry columns come back NULL).
SELECT
    audit.timestamp,
    audit.key_id,
    audit.api_called,
    audit.client_ip,
    registry.owner_team,
    registry.owner_individual,
    registry.description AS purpose,
    registry.key_id IS NULL AS unregistered
FROM stripe_audit AS audit
LEFT JOIN api_key_registry AS registry ON audit.key_id = registry.key_id
WHERE audit.timestamp > now() - interval '7 days'
ORDER BY audit.timestamp DESC;

Investigate any provider audit entry with no matching registry row, and any entry whose source IP doesn’t match the service expected to use that key.
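Both checks are simple to automate once the registry knows where each key should be used from. A minimal sketch with the standard ipaddress module; the per-key CIDR list is a hypothetical addition (the Step 1 schema doesn’t carry it, so it would have to be added or sourced from network inventory), and flag_audit_rows is a hypothetical name.

```python
import ipaddress

def flag_audit_rows(rows, registry):
    """Return (row, reason) pairs that need investigation.

    rows: provider audit entries as dicts with key_id and client_ip.
    registry: key_id -> list of CIDR strings the owning service calls from.
    """
    flagged = []
    for row in rows:
        cidrs = registry.get(row["key_id"])
        if cidrs is None:
            flagged.append((row, "no registry entry"))
            continue
        ip = ipaddress.ip_address(row["client_ip"])
        # Membership is False (not an error) when IPv4 and IPv6 are mixed.
        if not any(ip in ipaddress.ip_network(c) for c in cidrs):
            flagged.append((row, "unexpected source IP"))
    return flagged
```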

Step 10: Quarterly Audit

Run quarterly:

  • All keys have fresh registry entries (last_audit within the past 90 days).
  • All keys have rotation within their declared frequency.
  • All keys have a current owner (team and individual exist in HR).
  • Any keys without observable use in 30+ days — candidate for revocation.
  • Any keys with broad scope (admin, cluster-wide) — review necessity.
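Most of this checklist can be generated rather than assembled by hand. A minimal sketch covering the staleness, unused-key, and broad-scope checks, assuming last_audit is a date (yaml.safe_load parses 2026-04-01 into datetime.date), a hypothetical last_used map built from provider audit logs, and an assumed set of permission names that count as broad. Owner and rotation checks are left out to keep it short.

```python
BROAD_SCOPES = {"admin", "*", "cluster-wide"}  # assumed markers of broad scope

def quarterly_findings(entries, last_used, today):
    """Yield (key_id, finding) pairs for the quarterly review.

    last_used maps key_id -> date of last observed use; keys missing from it
    count as unused.
    """
    for e in entries:
        kid = e["key_id"]
        last_audit = e.get("audit", {}).get("last_audit")
        if last_audit is None or (today - last_audit).days > 90:
            yield kid, "registry entry stale (last_audit over 90 days)"
        used = last_used.get(kid)
        if used is None or (today - used).days >= 30:
            yield kid, "no observed use in 30+ days: revocation candidate"
        perms = set(e.get("scope", {}).get("permissions", []))
        if perms & BROAD_SCOPES:
            yield kid, "broad scope: review necessity"
```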

Expected Behaviour

Signal | Without lifecycle management | With lifecycle management
--- | --- | ---
Time to detect key leak | Often months | Within hours (secret scanning + audit-log anomaly)
Time to rotate after employee leaves | Indefinite | Hours (offboarding flow)
Number of unaccounted-for keys | Unknown; likely many | Approximately zero
Per-key attribution | Often “shared org” key | Per-team / per-service
Rotation frequency | “When something breaks” | Annual (or shorter) on schedule
Audit trail | Provider-side only | Provider + registry + per-use

Trade-offs

Aspect | Benefit | Cost | Mitigation
--- | --- | --- | ---
Programmatic issuance | Tracked from creation | Engineering effort to build | One-time investment; reuse for many providers.
Per-team keys | Attribution + isolation | Many keys to manage | Lifecycle automation handles bulk; cost is registry storage.
Auto-rotation | No stale credentials | Some provider limitations on automation | Hybrid: automate where APIs allow; ticket-driven for the rest.
Pre-commit scanning | Catches at source | False positives | Tune patterns; allowlist test fixtures.
Quarterly audit | Catches drift | Engineering time | Automate the report; review takes 1-2 hours.
Offboarding integration | Departed employees’ keys revoked | Identity-provider integration | Standard with IDP webhooks; same flow as user-account offboarding.

Failure Modes

Failure | Symptom | Detection | Recovery
--- | --- | --- | ---
Rotation breaks production | App can’t authenticate after rotation | Synthetic monitor or app errors | Automated rotation should catch this with its synthetic test and revert to the old key if verification fails. Fix and re-rotate.
Registry drift | Keys exist in providers but not in registry | Quarterly audit reconciliation | Add to registry; investigate why the key was created out-of-band.
Secret scanner false negative | Leaked key not detected | External notification (e.g. a provider alert triggered by GitHub’s secret scanning partner program) | Treat as incident; rotate immediately; investigate detection gap.
Provider rate-limits issuance | Bulk rotation fails | Provider returns HTTP 429 | Spread rotation across hours; respect rate limits.
Offboarding integration miss | Departed employee’s key persists | Periodic IDP-vs-registry comparison | Manual cleanup; fix integration.
Synthetic test gives false pass | Rotation completes but new key doesn’t actually work | App errors once the old key is revoked | Improve the synthetic test to exercise the actual production code path.