API Key Lifecycle at Scale: Issuance, Rotation, Scoping, and Audit Across Cloud and SaaS
Problem
API keys leak. GitGuardian’s “State of Secrets Sprawl” report found more than 23 million secrets exposed in public GitHub commits during 2024, a substantial fraction of them API keys for production systems. The 2025 figure was higher.
Most API-key leaks share a shape: a key was issued for a one-off purpose, never rotated, never reviewed, and eventually committed to a public repo or exposed to an attacker some other way. The leak itself isn’t surprising; the cleanup is hard because nobody knows what the key was for or what to revoke.
By 2026 mature programs treat API keys as tracked, lifecycle-managed objects:
- Issued with explicit purpose, owner, and scope.
- Used with auditable per-call attribution.
- Rotated on schedule and on event (employee departure, suspected compromise).
- Revoked automatically at end-of-life or on detection.
- Discoverable — every key has a system-of-record entry showing where it’s used.
The scale problem: a large org can hold 10,000-100,000 API keys across cloud providers (AWS, GCP, Azure), SaaS (Stripe, Twilio, Datadog, GitHub, OpenAI), internal services, and legacy systems. Managing each key’s lifecycle by hand is impossible.
The specific gaps in unmanaged programs:
- New keys minted via UI clicks; no record of purpose.
- Keys live in .env files, committed to repos, screenshotted in Slack.
- Rotation never happens because “this key is in production, can’t risk breaking it.”
- Departed employees’ keys persist for months / years.
- No audit trail of which key did what — provider logs show “key X did Y” but X has no description.
- Scope often defaults to admin / cluster-wide / unrestricted.
This article covers the lifecycle stages, secret-scanning prevention, programmatic issuance with a system-of-record, automated rotation, and the operational integration with employee onboarding / offboarding.
Target systems: Vault, AWS IAM Identity Center, GCP IAM, Microsoft Entra ID, GitGuardian / TruffleHog for scanning, Akeyless / Doppler / Infisical / 1Password for secret management.
Threat Model
- Adversary 1 — Public-repo leak: an engineer commits an API key to a public repo; an attacker scans GitHub and uses it.
- Adversary 2 — Insider threat: an employee with key access uses it for unauthorized purpose; key has no per-employee attribution.
- Adversary 3 — Endpoint compromise: malware exfiltrates .env files or developer credentials.
- Adversary 4 — SaaS provider compromise: a SaaS provider is breached; their copy of your API key for them is exposed.
- Adversary 5 — Departed employee: former employee retains keys they personally created; uses them after leaving.
- Access level: Adversaries 1, 3, 5 have laptop / repo access. Adversaries 2, 4 are insider / vendor-side.
- Objective: Use the API key to impersonate the legitimate owner; read or modify data; pivot to other systems.
- Blast radius: an unmanaged key with broad scope = full access to the resource indefinitely. With lifecycle management: bounded scope, bounded lifetime, attribution per use.
Configuration
Step 1: System of Record
Every key in your environment has an entry in a central registry.
# api-key-registry/keys/payments-stripe-live.yaml
key_id: payments-stripe-live-001
description: "Stripe live API key for payments service"
owner_team: payments
owner_individual: alice@example.com
created_at: 2026-04-01T10:00:00Z
expiration: 2027-04-01T10:00:00Z
last_rotated: 2026-04-01T10:00:00Z
scope:
  provider: stripe
  account_id: acct_xxx
  permissions: ["charges:write", "customers:read"]
  webhook_endpoints: ["https://api.example.com/stripe-webhook"]
storage:
  vault_path: "secret/payments/stripe-live"
consumed_by:
  - kubernetes_secret: "payments-stripe-credentials"
    namespace: payments
audit:
  rotation_owner: payments-team
  rotation_frequency_days: 365
  last_audit: 2026-04-01
Every API key has a YAML in this registry. CI validates that:
- Every key has an owner (team + individual).
- Every key has an expiration no more than one year after its creation date.
- Every key has a documented purpose and scope.
- The Vault path actually exists.
Without a registry entry, the key shouldn’t exist. Onboarding a new key without registry creation is a policy violation.
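The CI checks listed above could be sketched as a small validator. This is a minimal sketch under stated assumptions, not the article's actual CI job: `validate_entry` and the `vault_path_exists` callback are hypothetical names, and the function expects a registry entry already parsed into a dict with the fields from the example YAML.

```python
from datetime import datetime, timedelta

MAX_LIFETIME = timedelta(days=366)  # one year, with room for a leap day


def _parse(ts):
    """Accept datetime objects or ISO 8601 strings with a trailing 'Z'."""
    if isinstance(ts, datetime):
        return ts
    return datetime.fromisoformat(str(ts).replace("Z", "+00:00"))


def validate_entry(entry, vault_path_exists=lambda path: True):
    """Return a list of policy violations; an empty list means the entry passes."""
    errors = []
    # Owner (team + individual) and documented purpose.
    for field in ("owner_team", "owner_individual", "description"):
        if not entry.get(field):
            errors.append(f"missing {field}")
    # Documented scope.
    scope = entry.get("scope")
    if not isinstance(scope, dict) or not scope.get("permissions"):
        errors.append("missing scope.permissions")
    # Expiration no more than one year after creation.
    try:
        lifetime = _parse(entry["expiration"]) - _parse(entry["created_at"])
        if lifetime > MAX_LIFETIME:
            errors.append("expiration is more than one year after created_at")
    except (KeyError, ValueError, TypeError):
        errors.append("missing or unparseable created_at / expiration")
    # The Vault path must be declared and must exist.
    path = (entry.get("storage") or {}).get("vault_path")
    if not path:
        errors.append("missing storage.vault_path")
    elif not vault_path_exists(path):
        errors.append(f"vault path {path} not found")
    return errors
```

In CI, the job would fail the build whenever any entry returns a non-empty list. The Vault lookup is passed in as a callback so the rules can be unit-tested without a live Vault.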
Step 2: Programmatic Issuance
When teams need a new API key, the workflow goes through automation, not the SaaS UI:
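# issue_key.py
import argparse, datetime, os
import requests, yaml

# Provisioning credentials come from the environment, never from source.
STRIPE_PROVISIONING_KEY = os.environ["STRIPE_PROVISIONING_KEY"]
VAULT_ADDR = os.environ["VAULT_ADDR"]
VAULT_TOKEN = os.environ["VAULT_TOKEN"]

def issue_stripe_key(team, purpose, scope):
    # Generate key in Stripe via API.
    # NOTE: the endpoint and fields are illustrative. Confirm that your provider
    # exposes a key-provisioning API and adapt this call to its documentation.
    response = requests.post(
        "https://api.stripe.com/v1/api_keys",
        auth=(STRIPE_PROVISIONING_KEY, ""),
        data={
            "name": f"{team}-{purpose}-{datetime.date.today()}",
            "role": scope,  # restricted role
        },
        timeout=30,
    )
    response.raise_for_status()
    new_key = response.json()
    key_id = new_key["id"]
    secret = new_key["secret"]
    # Write to Vault (assumes a KV v2 engine mounted at "secret",
    # whose HTTP API path is <mount>/data/<path>).
    vault_path = f"secret/{team}/stripe-{purpose}"
    mount, _, subpath = vault_path.partition("/")
    requests.post(
        f"{VAULT_ADDR}/v1/{mount}/data/{subpath}",
        headers={"X-Vault-Token": VAULT_TOKEN},
        json={"data": {"key": secret, "key_id": key_id}},
        timeout=30,
    ).raise_for_status()
    # Register in system-of-record. Use one timezone-aware timestamp throughout.
    now = datetime.datetime.now(datetime.timezone.utc)
    registry_entry = {
        "key_id": key_id,
        "description": purpose,
        "owner_team": team,
        "created_at": now.isoformat(),
        "expiration": (now + datetime.timedelta(days=365)).isoformat(),
        "scope": scope,
        "storage": {"vault_path": vault_path},
    }
    register_in_git(registry_entry)  # helper defined elsewhere: commits the YAML entry

if __name__ == "__main__":
    # CLI for engineers to request a new key.
    ...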
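The flow: engineer runs ./issue_key.py --team payments --purpose webhooks --scope charges:write. Issuance, storage, and registry update happen in one scripted run, so no step gets skipped. The three writes are not transactional: if a later step fails, the script must revoke the newly minted key rather than leave it untracked. No clicking in the Stripe UI.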
Step 3: Rotation Automation
Every key rotates on schedule. A controller monitors the registry:
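# rotation_controller.py
import datetime, glob, logging
import yaml

class RotationError(Exception):
    """Raised when a rotation cannot be completed safely."""

def _as_utc(value):
    # PyYAML may already return a datetime for ISO timestamps; this controller writes strings.
    if not isinstance(value, datetime.datetime):
        value = datetime.datetime.fromisoformat(str(value).replace("Z", "+00:00"))
    if value.tzinfo is None:
        value = value.replace(tzinfo=datetime.timezone.utc)
    return value

def find_due_for_rotation():
    due = []
    now = datetime.datetime.now(datetime.timezone.utc)
    for path in glob.glob("api-key-registry/keys/*.yaml"):
        with open(path) as f:
            entry = yaml.safe_load(f)
        last_rot = _as_utc(entry["last_rotated"])
        freq = entry["audit"]["rotation_frequency_days"]
        if (now - last_rot).days >= freq:
            due.append(entry)
    return due

def rotate_key(entry):
    # The helpers called below are provider-specific and defined elsewhere.
    # Mint new key.
    new_secret = provision_new_key(entry)
    # Write to Vault.
    write_to_vault(entry["storage"]["vault_path"], new_secret)
    # Wait for consumers to pick up (via External Secrets Operator).
    wait_for_consumers(entry, timeout_minutes=15)
    # Verify new key works (via a synthetic test).
    # The old key is still valid at this point, so a failure here is recoverable.
    if not synthetic_test_passes(new_secret, entry):
        raise RotationError("Synthetic test failed; aborting rotation")
    # Revoke old key.
    revoke_old_key(entry["key_id"])
    # Update registry.
    entry["last_rotated"] = datetime.datetime.now(datetime.timezone.utc).isoformat()
    update_registry(entry)

# Run nightly via cron. One failed rotation must not block the rest.
if __name__ == "__main__":
    for entry in find_due_for_rotation():
        try:
            rotate_key(entry)
        except Exception:
            logging.exception("Rotation failed for %s", entry["key_id"])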
The flow keeps an overlap window during which both keys are valid: the new key is written, consumers refresh (via External Secrets Operator + reload), and the old key is revoked only once consumers are confirmed to be using the new one.
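The overlap window also makes rollback possible when verification fails. The following is a minimal sketch of that idea, with the provider and Vault operations passed in as callables; `rotate_with_rollback` and its parameter names are hypothetical and only illustrate the ordering of steps.

```python
def rotate_with_rollback(old_secret, mint, write, verify, revoke):
    """Rotate one key and restore the old secret if verification fails.

    mint()         -> returns a newly issued secret
    write(secret)  -> stores the secret where consumers read it
    verify(secret) -> True if a synthetic request with the secret succeeds
    revoke(secret) -> disables the secret at the provider

    Returns the secret that is live when the function exits.
    """
    new_secret = mint()
    write(new_secret)        # from here on, both keys are valid
    if verify(new_secret):
        revoke(old_secret)   # close the overlap window
        return new_secret
    write(old_secret)        # point consumers back at the known-good key
    revoke(new_secret)       # do not leave the unverified key active
    return old_secret
```

The old key is revoked only on the success path, so a failed rotation leaves production on a credential that is known to work.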
For SaaS providers without an issuance API, fall back to manual rotation with structured tickets:
# rotation-ticket-template.md (Jira / Linear / GitHub Issue).
title: "Rotate {key_id} (due {due_date})"
body: |
  Key {key_id} is due for rotation by {due_date}.
  Owner: {owner_individual} / {owner_team}
  Provider: {provider}
  Vault path: {vault_path}
  Steps:
  - [ ] Mint new key in {provider} UI
  - [ ] Update Vault path
  - [ ] Confirm consumers using new key (synthetic test)
  - [ ] Revoke old key in {provider} UI
  - [ ] Update registry's last_rotated timestamp
Manual rotation tickets are tracked in your normal ticketing system; SLA enforced by engineering management.
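Filling the ticket template from a registry entry might look like the sketch below. `render_ticket` is a hypothetical helper, the body is shortened to the header fields, and the due date is assumed to be `last_rotated` plus `rotation_frequency_days`; creating the ticket itself depends on your ticketing system's API and is left out.

```python
from datetime import datetime, timedelta

TITLE = "Rotate {key_id} (due {due_date})"
BODY = (
    "Key {key_id} is due for rotation by {due_date}.\n"
    "Owner: {owner_individual} / {owner_team}\n"
    "Provider: {provider}\n"
    "Vault path: {vault_path}\n"
)


def render_ticket(entry):
    """Fill the ticket title and body from a parsed registry entry."""
    last = entry["last_rotated"]
    if not isinstance(last, datetime):
        # Registry timestamps are ISO 8601 strings with a trailing 'Z'.
        last = datetime.fromisoformat(str(last).replace("Z", "+00:00"))
    due = last + timedelta(days=entry["audit"]["rotation_frequency_days"])
    fields = {
        "key_id": entry["key_id"],
        "due_date": due.date().isoformat(),
        "owner_individual": entry["owner_individual"],
        "owner_team": entry["owner_team"],
        "provider": entry["scope"]["provider"],
        "vault_path": entry["storage"]["vault_path"],
    }
    return {"title": TITLE.format(**fields), "body": BODY.format(**fields)}
```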
Step 4: Pre-Commit Secret Scanning
Prevent leaks at commit time.
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.20.0
    hooks:
      - id: gitleaks
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.5.0
    hooks:
      - id: detect-secrets
gitleaks and detect-secrets catch most patterns. For your org’s specific key formats, write custom regex:
# .gitleaks.toml
# Keep the built-in rules; without this, a custom config replaces them.
[extend]
useDefault = true
[allowlist]
description = "Allowlist for known test fixtures"
paths = [
  "tests/fixtures/.*",
]
[[rules]]
id = "myorg-api-key"
description = "MyOrg internal API key"
regex = '''myorg_(?:live|test)_[a-z0-9]{32}'''
keywords = ["myorg_"]
CI also runs the scanner; PRs containing secrets are blocked.
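Before shipping a custom rule, it helps to check the pattern against strings that should and should not match. The sketch below uses Python's `re` module with the same expression as the `myorg-api-key` rule; gitleaks itself uses Go's regex engine, which accepts this pattern as well. The fixture lists and `find_keys` are illustrative names.

```python
import re

# Same pattern as the custom gitleaks rule.
MYORG_KEY = re.compile(r"myorg_(?:live|test)_[a-z0-9]{32}")


def find_keys(text):
    """Return every substring of text that matches the key format."""
    return MYORG_KEY.findall(text)


# Fixtures are assembled at runtime so this file does not trip the scanner.
SHOULD_MATCH = [
    "myorg_live_" + "a1" * 16,
    "myorg_test_" + "0" * 32,
]
SHOULD_NOT_MATCH = [
    "myorg_prod_" + "a1" * 16,  # unknown environment prefix
    "myorg_live_" + "a" * 31,   # one character too short
    "myorg_live_" + "A" * 32,   # uppercase is outside the character class
]
```

If your real keys can contain uppercase characters, the last fixture shows the rule would miss them, which is the kind of gap this check is meant to surface.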
Step 5: Secret Scanning Across Existing Repos
For repos that may already contain leaked secrets, run a one-time and ongoing scan:
# One-time scan of the full commit history (gitleaks 8.19+; older releases use `gitleaks detect --source`).
gitleaks git /path/to/repo --report-path=/tmp/secrets.json
# Ongoing: GitHub's secret scanning (free for public repos; commercial Advanced Security for private).
# Or use TruffleHog OSS:
trufflehog git https://github.com/myorg/payments --json
For every finding: rotate, attribute, audit. Route scanner findings into your ticketing system so that each one gets an owner and a deadline.
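A historical scan often reports the same secret in many commits, while the response (one rotation) is per secret. The sketch below collapses a gitleaks JSON report into one work item per distinct secret. It assumes the report is a JSON array whose objects carry `RuleID`, `Secret`, `File`, and `Commit` fields; check those names against the output of your gitleaks version. `group_findings` is a hypothetical helper.

```python
import json


def group_findings(report_json):
    """Collapse a gitleaks JSON report into one work item per distinct secret.

    The secret value is used only as a grouping key and is never returned,
    so the work items are safe to paste into a ticket.
    """
    grouped = {}
    for finding in json.loads(report_json):
        key = (finding["RuleID"], finding["Secret"])
        item = grouped.setdefault(key, {"rule": finding["RuleID"], "locations": []})
        item["locations"].append(f'{finding["File"]}@{finding["Commit"][:8]}')
    for item in grouped.values():
        item["locations"].sort()
    return list(grouped.values())
```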
Step 6: Per-Use Attribution Where Possible
Some providers attribute API calls to a specific key in their audit logs. Where possible, every team / service has its own key — never shared.
# Bad: one OpenAI key for the whole org.
key_id: openai-shared
owner_team: shared
consumed_by: [payments, recommendations, search, chat] # all teams
---
# Good: per-team keys.
key_id: openai-payments
owner_team: payments
consumed_by: [payments]
---
key_id: openai-recommendations
owner_team: recommendations
consumed_by: [recommendations]
When OpenAI’s audit shows “key abc123 made a request,” you immediately know it’s the payments team’s key. Cross-team attribution requires per-team keys.
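The registry can both resolve a key seen in a provider log and flag keys that are still shared. This is a minimal sketch over parsed registry entries; `owner_of` and `shared_keys` are hypothetical helper names.

```python
def owner_of(entries, key_id):
    """Owning team for a key ID seen in a provider audit log, or None if unknown."""
    for entry in entries:
        if entry["key_id"] == key_id:
            return entry["owner_team"]
    return None


def shared_keys(entries):
    """Key IDs consumed by more than one team or service."""
    return sorted(
        entry["key_id"]
        for entry in entries
        if len(entry.get("consumed_by", [])) > 1
    )
```

Running `shared_keys` in CI gives a simple list of keys that still need to be split per team.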
Step 7: Offboarding Flow
When an employee leaves, every key they personally issued must be enumerated and rotated. Hook into your IDP:
# onboarding_offboarding.py
# `registry` and `identity_provider` are client objects defined elsewhere;
# rotate_key is the function from rotation_controller.py.
def employee_offboarded(user_email):
    # Find all keys in the registry where this employee was the individual owner.
    keys = registry.find(owner_individual=user_email)
    for k in keys:
        # Reassign to team, then rotate.
        k["owner_individual"] = k["owner_team"] + "-shared"
        rotate_key(k)  # generates new credential the departing employee never sees
    # Also revoke any session tokens the user had with internal SaaS.
    for sess in identity_provider.list_sessions(user_email):
        identity_provider.revoke(sess)
Triggered automatically when HR’s offboarding signal reaches the IDP.
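Because a webhook can be missed, a periodic comparison of registry owners against the IDP's active users is a useful safety net. The sketch below assumes you can export the set of active email addresses from the IDP; `orphaned_keys` is a hypothetical name. Owners without an `@` are treated as team aliases such as `payments-shared` and skipped.

```python
def orphaned_keys(entries, active_emails):
    """Key IDs whose individual owner is no longer an active user in the IDP."""
    active = {email.lower() for email in active_emails}
    orphaned = []
    for entry in entries:
        owner = (entry.get("owner_individual") or "").lower()
        if "@" not in owner:
            continue  # team alias or missing owner; handled by registry validation
        if owner not in active:
            orphaned.append(entry["key_id"])
    return sorted(orphaned)
```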
Step 8: Telemetry and Audit
api_key_total{provider, owner_team}
api_key_age_days{key_id}
api_key_due_for_rotation_total
api_key_rotation_success_total
api_key_rotation_failure_total{reason}
api_key_secret_scanner_findings_total{repo, scanner}
api_key_unauthorized_use_detected_total{key_id}
Alert on:
- api_key_age_days{...} above the key's rotation_frequency_days: overdue rotation; team escalation.
- api_key_secret_scanner_findings_total non-zero: leaks pending response.
- api_key_unauthorized_use_detected_total non-zero: stolen-key incident.
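The age metric and the overdue condition can be derived directly from the registry. This is a minimal sketch under the assumption that age is measured from `last_rotated` and compared with `audit.rotation_frequency_days`; the function names are hypothetical, and exporting the values to your metrics system is left out.

```python
from datetime import datetime, timezone


def age_days(entry, now):
    """Whole days since the key was last rotated."""
    last = entry["last_rotated"]
    if not isinstance(last, datetime):
        last = datetime.fromisoformat(str(last).replace("Z", "+00:00"))
    if last.tzinfo is None:
        last = last.replace(tzinfo=timezone.utc)  # treat naive timestamps as UTC
    return (now - last).days


def overdue_keys(entries, now=None):
    """(key_id, age_days) pairs for keys older than their rotation frequency."""
    now = now or datetime.now(timezone.utc)
    result = []
    for entry in entries:
        age = age_days(entry, now)
        if age > entry["audit"]["rotation_frequency_days"]:
            result.append((entry["key_id"], age))
    return result
```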
Step 9: Enrichment for Provider Audit
Cross-correlate provider audit logs with your registry:
-- Stripe audit log shows key acct_xxx_yyy made an API call.
-- Look up registry to find who owns that key.
-- LEFT JOIN keeps audit rows with no registry match (owner columns are NULL).
SELECT
    audit.timestamp,
    audit.key_id,
    audit.api_called,
    audit.client_ip,
    registry.owner_team,
    registry.owner_individual,
    registry.description AS purpose
FROM stripe_audit AS audit
LEFT JOIN api_key_registry AS registry ON audit.key_id = registry.key_id
WHERE audit.timestamp > now() - interval '7 days'
ORDER BY audit.timestamp DESC;
A provider audit entry without a corresponding registry row is unusual — investigate. An entry whose source IP doesn’t match the expected service is unusual — investigate.
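Both checks can be automated once audit events are loaded. In this sketch, `events` is a list of dicts with `key_id` and `client_ip`, `registry` maps key IDs to entries, and `expected_networks` is a hypothetical mapping from key ID to the CIDR ranges its service should call from; none of these structures come from a specific provider.

```python
import ipaddress


def flag_events(events, registry, expected_networks):
    """Return (key_id, reason) pairs for audit events that need investigation."""
    findings = []
    for event in events:
        key_id = event["key_id"]
        if key_id not in registry:
            findings.append((key_id, "no registry entry"))
            continue
        networks = expected_networks.get(key_id, [])
        ip = ipaddress.ip_address(event["client_ip"])
        # Only check source IPs for keys that have declared networks.
        if networks and not any(ip in ipaddress.ip_network(n) for n in networks):
            findings.append((key_id, f"unexpected source {event['client_ip']}"))
    return findings
```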
Step 10: Quarterly Audit
Run quarterly:
- All keys have non-stale registry entries (last_audit within the past 90 days).
- All keys have rotation within their declared frequency.
- All keys have a current owner (team and individual exist in HR).
- Any keys without observable use in 30+ days — candidate for revocation.
- Any keys with broad scope (admin, cluster-wide) — review necessity.
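Parts of this checklist lend themselves to an automated report. The sketch below covers the staleness, unused-key, and broad-scope checks. It assumes `last_audit` is parsed as a `date`, that `last_used` is a mapping from key ID to the date of the most recent call (built from provider logs), and that the `BROAD` markers match how your registry records wide permissions; all three are assumptions, not part of the registry schema above.

```python
from datetime import date, timedelta

# Hypothetical markers for overly broad permissions.
BROAD = {"admin", "*", "cluster-wide"}


def quarterly_findings(entries, last_used, today):
    """Return (key_id, finding) pairs for the quarterly review."""
    findings = []
    for entry in entries:
        key_id = entry["key_id"]
        if today - entry["audit"]["last_audit"] > timedelta(days=90):
            findings.append((key_id, "registry entry stale"))
        used = last_used.get(key_id)
        if used is None or today - used >= timedelta(days=30):
            findings.append((key_id, "no observed use in 30+ days"))
        if BROAD & set(entry["scope"]["permissions"]):
            findings.append((key_id, "broad scope"))
    return findings
```

The output is a worklist for human review; revoking an unused key or narrowing a scope still needs the owner's sign-off.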
Expected Behaviour
| Signal | Without lifecycle management | With |
|---|---|---|
| Time to detect key leak | Often months | Within hours (secret-scanning + audit-log anomaly) |
| Time to rotate after employee leaves | Indefinite | Hours (offboarding flow) |
| Number of unaccounted-for keys | Unknown; likely many | Approximately zero |
| Per-key attribution | Often “shared org” key | Per-team / per-service |
| Rotation frequency | “When something breaks” | Annual (or shorter) on schedule |
| Audit trail | Provider-side only | Provider + registry + per-use |
Trade-offs
| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| Programmatic issuance | Tracked from creation | Engineering effort to build | One-time investment; reuse for many providers. |
| Per-team keys | Attribution + isolation | Many keys to manage | Lifecycle automation handles bulk; cost is registry storage. |
| Auto-rotation | No stale credentials | Some provider limitations on automation | Hybrid: automate where APIs allow; ticket-driven for the rest. |
| Pre-commit scanning | Catches at source | False positives | Tune patterns; allowlist test fixtures. |
| Quarterly audit | Catches drift | Engineering time | Automate the report; review takes 1-2 hours. |
| Offboarding integration | Departed employees’ keys revoked | Identity-provider integration | Standard with IDP webhooks; same flow as user-account offboarding. |
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Rotation breaks production | App can’t authenticate after rotation | Synthetic monitor or app errors | Automated rotation should detect via synthetic; revert to old key if verification fails. Fix and re-rotate. |
| Registry drift | Keys exist in providers but not in registry | Quarterly audit reconciliation | Add to registry; investigate why created out-of-band. |
| Secret scanner false negative | Leaked key not detected | External notification (GitHub, Have I Been Pwned) | Treat as incident; rotate immediately; investigate detection gap. |
| Provider rate-limits issuance | Bulk rotation fails | Provider returns 429 | Spread rotation across hours; respect rate limits. |
| Offboarding integration miss | Departed employee’s key persists | Periodic IDP-vs-registry comparison | Manual cleanup; fix integration. |
| Synthetic test gives false-pass | Rotation completes but new key doesn’t actually work | Apps fail despite rotation | Improve synthetic to test the actual production code path. |
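For the rate-limit failure mode, one simple approach is to spread due rotations evenly across a time window instead of running them back to back. This is a minimal sketch; `spread_rotations` is a hypothetical helper and the window length is something to tune against each provider's documented limits.

```python
from datetime import datetime, timedelta


def spread_rotations(key_ids, start, window):
    """Assign each key an evenly spaced start time within the window."""
    if not key_ids:
        return []
    step = window / len(key_ids)
    return [(start + i * step, key_id) for i, key_id in enumerate(key_ids)]
```

A scheduler can then run each rotation at its assigned time, retrying with backoff if the provider still returns 429.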