Prometheus Alertmanager Security: Receiver Credentials, Silencing Controls, and Inhibition Rules
Problem
Alertmanager is the routing and notification layer for Prometheus alerts. It receives firing alerts, matches them to routing rules, and delivers notifications via receivers — Slack webhooks, PagerDuty integration keys, email SMTP credentials, and OpsGenie API keys. These credentials are stored in the Alertmanager configuration file and are accessible to anyone who can read it.
The security implications of a compromised Alertmanager go beyond credential theft: an attacker who can modify Alertmanager configuration can silence all security alerts indefinitely. This is not hypothetical — attacker tooling explicitly targets monitoring and alerting infrastructure to create a blind spot for subsequent activity.
Common weaknesses:
- Receiver credentials in plaintext configuration. Alertmanager’s `alertmanager.yml` contains Slack webhook URLs, PagerDuty routing keys, SMTP passwords, and OpsGenie API keys in plaintext. Any process or user with read access to the file, or to the Kubernetes Secret containing it, has all notification credentials.
- Unauthenticated silence API. Alertmanager’s HTTP API (`/api/v2/silences`) accepts POST requests to create silences without authentication in default deployments. An attacker who can reach the Alertmanager endpoint can silence all alerts indefinitely with a single API call.
- Overly broad inhibition rules. Inhibition rules suppress child alerts when a parent alert fires. A poorly designed inhibition rule that fires on a broad condition can suppress security alerts when infrastructure is under load, which is precisely the correlation an attacker exploiting a high-load condition would want.
- No alerting on alerting system health. Nobody monitors the monitor. If Alertmanager fails, stops processing, or its receivers are unreachable, security alerts are silently dropped. Without health monitoring for the alerting pipeline itself, failures go undetected.
- Shared Alertmanager across environments. Development and production alerts route through the same Alertmanager instance. A noisy development environment generates alert storms that desensitise on-call engineers to production alerts.
Target systems: Alertmanager 0.27+ (kube-prometheus-stack, standalone); Kubernetes Secret-based credential management; Alertmanager HA with gossip protocol; webhook receivers.
Threat Model
- Adversary 1 (receiver credential exfiltration): An attacker reads the Alertmanager configuration (from a ConfigMap, Secret, or configuration file) and extracts PagerDuty routing keys, Slack webhook URLs, and email credentials. They use these to send fake alerts or to understand the on-call rotation.
- Adversary 2 (silence API abuse): An attacker who reaches the Alertmanager HTTP API creates a silence matching `alertname=~".+"` (all alerts) for 30 days. All security alerts are silently suppressed while the attacker operates.
- Adversary 3 (inhibition rule exploitation): An attacker triggers a high-load condition (resource exhaustion, synthetic DDoS) that fires an inhibition parent alert. The inhibition rule suppresses security-relevant child alerts (intrusion detection, unusual login, lateral movement) while the attacker operates.
- Adversary 4 (webhook receiver SSRF): An Alertmanager webhook receiver is configured to POST alert data to an internal URL. An attacker who can modify the receiver URL (via configuration access) points it to an internal service endpoint, using Alertmanager as an SSRF proxy.
- Adversary 5 (HA gossip poisoning): In a multi-instance Alertmanager HA deployment, instances share silence and notification state via a gossip protocol. An attacker who can reach the gossip port injects a silence entry that propagates to all instances.
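- Access level: Adversary 1 needs read access to the configuration. Adversary 2 needs network access to the Alertmanager HTTP API. Adversary 3 needs only the ability to induce the parent condition, such as load on a service it can reach. Adversary 4 needs configuration write access. Adversary 5 needs network access to the gossip port (9094).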
- Objective: Silence security alerts; steal notification credentials; blind the security operations team.
- Blast radius: A silence matching all alerts takes the entire Prometheus-based alerting pipeline dark. Attacks proceed undetected for the silence duration.
Configuration
Step 1: Secrets Management for Receiver Credentials
Never store receiver credentials in the Alertmanager config file:
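```yaml
# BAD: plaintext credentials in alertmanager.yml.
receivers:
  - name: pagerduty-production
    pagerduty_configs:
      - routing_key: "abc123def456..."  # Plaintext secret.
  - name: slack-security
    slack_configs:
      - api_url: "https://hooks.slack.com/services/T.../B.../..."  # Plaintext webhook.
```

```yaml
# GOOD: keep only placeholders in the committed file.
# Alertmanager does not expand environment variables in its config file itself:
# the $VAR placeholders must be rendered (for example by envsubst in an init
# container) before Alertmanager loads the file. Where available, the native
# *_file fields (routing_key_file, api_url_file) read a mounted Secret directly.
receivers:
  - name: pagerduty-production
    pagerduty_configs:
      - routing_key: "$PAGERDUTY_ROUTING_KEY"  # Rendered from env variable before startup.
  - name: slack-security
    slack_configs:
      - api_url: "$SLACK_WEBHOOK_URL"
```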
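Because Alertmanager reads its configuration file verbatim, the `$VAR` placeholders have to be rendered by something else before startup. A minimal sketch of strict rendering with Python's `string.Template`, assuming placeholders use `$NAME` syntax and that any literal dollar sign in the file (for example in Go template variables) is written as `$$`; the function name `render_config` is illustrative:

```python
import os
from string import Template
from typing import Mapping, Optional


def render_config(template_text: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace $NAME placeholders, raising KeyError if any variable is missing.

    Template.substitute fails on an unknown placeholder, so an unset Secret
    stops the rollout instead of yielding an empty routing_key. A literal
    "$" must be written as "$$" in the template.
    """
    values = os.environ if env is None else env
    return Template(template_text).substitute(values)
```

An init container could run this (or `envsubst`) to write the rendered file to a shared in-memory `emptyDir`; the rendered copy holds plaintext credentials, so keep it off persistent volumes. Unlike plain `envsubst`, which replaces an unset variable with an empty string, `Template.substitute` refuses to produce a config with a missing credential.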
```yaml
# Kubernetes deployment: inject secrets as environment variables.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alertmanager
  namespace: monitoring
spec:
  template:
    spec:
      containers:
        - name: alertmanager
          env:
            - name: PAGERDUTY_ROUTING_KEY
              valueFrom:
                secretKeyRef:
                  name: alertmanager-credentials
                  key: pagerduty-routing-key
            - name: SLACK_WEBHOOK_URL
              valueFrom:
                secretKeyRef:
                  name: alertmanager-credentials
                  key: slack-webhook-url
          # Mount config without credentials (uses env variable references).
          volumeMounts:
            - name: config
              mountPath: /etc/alertmanager
```
```bash
# Store credentials in Vault; sync to Kubernetes Secret via External Secrets Operator.
vault kv put secret/alertmanager/receivers \
  pagerduty-routing-key="$PAGERDUTY_KEY" \
  slack-webhook-url="$SLACK_WEBHOOK"
```
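A CI check can keep literal credentials out of the committed file. The sketch below walks a parsed `alertmanager.yml` (for example the output of `yaml.safe_load`) and reports secret-bearing fields that hold anything other than a `$VAR` placeholder. The field lists are a partial, assumed inventory covering the integrations named in this section; `*_file` variants are not flagged because they hold paths rather than secrets. `find_plaintext_credentials` is a hypothetical helper name.

```python
from typing import Any, Dict, List

# Partial inventory of secret-bearing fields (assumed; extend for other integrations).
GLOBAL_SECRET_FIELDS = ("slack_api_url", "smtp_auth_password", "opsgenie_api_key")
RECEIVER_SECRET_FIELDS = {
    "pagerduty_configs": ("routing_key", "service_key"),
    "slack_configs": ("api_url",),
    "email_configs": ("auth_password",),
    "opsgenie_configs": ("api_key",),
}


def _is_literal(value: Any) -> bool:
    # Anything non-empty that is not a $VAR placeholder counts as a literal secret.
    return bool(value) and not str(value).startswith("$")


def find_plaintext_credentials(config: Dict[str, Any]) -> List[str]:
    """Return paths of secret fields holding literal values in a parsed config."""
    findings = []
    global_cfg = config.get("global") or {}
    for field in GLOBAL_SECRET_FIELDS:
        if _is_literal(global_cfg.get(field)):
            findings.append(f"global/{field}")
    for receiver in config.get("receivers") or []:
        name = receiver.get("name", "<unnamed>")
        for integration, fields in RECEIVER_SECRET_FIELDS.items():
            for entry in receiver.get(integration) or []:
                for field in fields:
                    if _is_literal(entry.get(field)):
                        findings.append(f"{name}/{integration}/{field}")
    return findings
```

A non-empty result fails the pipeline before the file reaches a ConfigMap or Git history.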
Step 2: Authenticate the Alertmanager API
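Alertmanager performs no authentication by default. Its optional `--web.config.file` adds TLS and basic auth, but there is no authorisation model: any authenticated user can create silences. Enforce authentication at the ingress layer: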
```yaml
# Kubernetes Ingress with basic auth or OAuth2 Proxy.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: alertmanager
  namespace: monitoring
  annotations:
    nginx.ingress.kubernetes.io/auth-type: basic
    nginx.ingress.kubernetes.io/auth-secret: alertmanager-basic-auth
    nginx.ingress.kubernetes.io/auth-realm: "Alertmanager"
    # Or: use OAuth2 Proxy for SSO.
    # nginx.ingress.kubernetes.io/auth-url: "https://oauth2.internal/oauth2/auth"
    nginx.ingress.kubernetes.io/whitelist-source-range: "10.0.0.0/8"
spec:
  rules:
    - host: alertmanager.internal.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: alertmanager
                port:
                  number: 9093
```
```bash
# Create basic auth credentials for Alertmanager UI/API access.
htpasswd -c /tmp/htpasswd alertmanager-admin
kubectl create secret generic alertmanager-basic-auth \
  --from-file=auth=/tmp/htpasswd \
  --namespace monitoring
rm -f /tmp/htpasswd  # Remove the local copy once the Secret exists.
```
Restrict network access to Alertmanager:
```yaml
# NetworkPolicy: Alertmanager only reachable from Prometheus and ingress.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: alertmanager-access
  namespace: monitoring
spec:
  podSelector:
    matchLabels:
      app: alertmanager
  policyTypes:
    - Ingress
  ingress:
    # Prometheus pushes alerts.
    - from:
        - podSelector:
            matchLabels:
              app: prometheus
      ports:
        - port: 9093
    # Ingress controller for authenticated UI/API access.
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
      ports:
        - port: 9093
    # HA gossip between Alertmanager instances.
    - from:
        - podSelector:
            matchLabels:
              app: alertmanager
      ports:
        - port: 9094
          protocol: UDP
        - port: 9094
          protocol: TCP
# Everything else, including direct access from other pods, is denied:
# once a NetworkPolicy selects a pod, ingress it does not list is blocked.
```
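To confirm both controls from the caller's side, a probe can classify what an unauthenticated request to the silence API sees. A minimal standard-library sketch; `probe_silence_api` is a hypothetical name, and the classification assumes the ingress answers unauthenticated requests with 401 or 403. An auth proxy that redirects to a login page would be followed by `urlopen` and misreported as "open".

```python
import urllib.error
import urllib.request


def probe_silence_api(base_url: str, timeout: float = 5.0) -> str:
    """Classify the silence API as seen from the caller's network position.

    Returns "open" if an unauthenticated GET succeeds, "auth-required" on
    401/403, and "unreachable" if the connection is refused or times out
    (the expected result from a pod the NetworkPolicy does not admit).
    """
    url = f"{base_url.rstrip('/')}/api/v2/silences"
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return "open"
    except urllib.error.HTTPError as err:
        if err.code in (401, 403):
            return "auth-required"
        raise  # Other HTTP errors are unexpected; surface them.
    except OSError:
        # URLError, connection refused and socket timeouts all derive from OSError.
        return "unreachable"
```

Run it from a pod outside the allowed selectors (expect "unreachable") and against the ingress hostname (expect "auth-required"); "open" from either position means the silence API is exposed.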
Step 3: Silence Governance
Implement controls around silence creation:
Alertmanager does not natively limit silence duration. Two compensating controls: create silences only through custom tooling that enforces limits via the API, and alert on long-duration silences.
```yaml
# Prometheus rules: alert on long-duration and blanket silences.
groups:
  - name: alertmanager-governance
    rules:
      - alert: AlertmanagerSilenceTooLong
        # Alertmanager exports silence counts per state, not per-silence expiry
        # times, so PromQL cannot measure an individual silence's duration.
        # Coarse proxy: some silence has been continuously active for 4 hours.
        # The API-based auditor checks exact durations.
        expr: max(alertmanager_silences{state="active"}) > 0
        for: 4h
        labels:
          severity: warning
        annotations:
          summary: "Alertmanager silences active for more than 4 hours"
          description: "At least one silence has been active for over 4 hours. Review required."
      - alert: AlertmanagerAllAlertsSilenced
        # Every alert Alertmanager holds is suppressed (silenced or inhibited)
        # and none is active; max() avoids triple counting across HA replicas.
        # A match-all silence also suppresses this alert, so pair it with an
        # externally monitored heartbeat.
        expr: |
          max(alertmanager_alerts{state="suppressed"}) > 0
          and on()
          max(alertmanager_alerts{state="active"}) == 0
        for: 5m
        labels:
          severity: critical
          team: security
        annotations:
          summary: "ALL alerts appear silenced: possible attacker silence"
          description: "Alertmanager holds suppressed alerts but no active ones. Verify Alertmanager silences."
```
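```python
# silence_auditor.py: audit all active silences via the Alertmanager v2 API.
from datetime import datetime, timezone

import requests

MAX_SILENCE_HOURS = 4
MATCH_ANYTHING = {".+", ".*"}


def _parse_ts(ts: str) -> datetime:
    # Alertmanager returns RFC 3339 timestamps with a trailing "Z".
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def audit_silences(alertmanager_url: str, auth: tuple) -> list:
    """Return active silences that warrant review, with the reasons."""
    resp = requests.get(
        f"{alertmanager_url}/api/v2/silences",
        auth=auth,
        verify=True,
        timeout=10,
    )
    resp.raise_for_status()
    now = datetime.now(timezone.utc)
    suspicious = []
    for silence in resp.json():
        if silence["status"]["state"] != "active":
            continue
        ends_at = _parse_ts(silence["endsAt"])
        total_hours = (ends_at - _parse_ts(silence["startsAt"])).total_seconds() / 3600
        reasons = []
        # Flag: total silence duration longer than the 4-hour limit.
        if total_hours > MAX_SILENCE_HOURS:
            reasons.append(f"duration {total_hours:.1f}h exceeds {MAX_SILENCE_HOURS}h")
        # Flag: every matcher is a positive match-anything regex, e.g. alertname=~".+".
        matchers = silence["matchers"]
        if matchers and all(
            m["isRegex"] and m.get("isEqual", True) and m["value"] in MATCH_ANYTHING
            for m in matchers
        ):
            reasons.append("matches all alerts")
        if reasons:
            suspicious.append({
                "id": silence["id"],
                "created_by": silence["createdBy"],
                "comment": silence["comment"],
                "hours_remaining": round((ends_at - now).total_seconds() / 3600, 1),
                "matchers": matchers,
                "reasons": reasons,
            })
    return suspicious
```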
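The other compensating control, tooling that refuses non-compliant silences, can wrap the same v2 API on the write side. A minimal sketch, assuming a 4-hour cap and a mandatory comment as local policy; `build_silence` and `create_silence` are hypothetical helpers, and the caller supplies an `Authorization` header for the authenticated ingress:

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone
from typing import Dict, List

MAX_SILENCE = timedelta(hours=4)  # Assumed local policy limit.
MATCH_ANYTHING = {".+", ".*"}


def build_silence(matchers: List[Dict], hours: float, created_by: str, comment: str) -> Dict:
    """Build a v2 silence body, refusing ones the governance policy forbids."""
    duration = timedelta(hours=hours)
    if not timedelta(0) < duration <= MAX_SILENCE:
        raise ValueError(f"duration must be greater than 0 and at most {MAX_SILENCE}")
    if not comment.strip():
        raise ValueError("a comment (ticket or reason) is required")
    if not matchers or all(
        m.get("isRegex") and m.get("isEqual", True) and m["value"] in MATCH_ANYTHING
        for m in matchers
    ):
        raise ValueError("silence would match all alerts")
    now = datetime.now(timezone.utc)
    return {
        "matchers": matchers,
        "startsAt": now.isoformat(),
        "endsAt": (now + duration).isoformat(),
        "createdBy": created_by,
        "comment": comment,
    }


def create_silence(alertmanager_url: str, body: Dict, auth_header: str) -> str:
    """POST the silence through the authenticated endpoint and return its ID."""
    req = urllib.request.Request(
        f"{alertmanager_url}/api/v2/silences",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json", "Authorization": auth_header},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["silenceID"]
```

Tooling only helps if it is the sole write path, which is what the ingress authentication and NetworkPolicy in Step 2 are for.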
Step 4: Inhibition Rule Hardening
Review inhibition rules to prevent them from suppressing security alerts:
```yaml
# alertmanager.yml: safe inhibition configuration.
# source_matchers/target_matchers replace the deprecated source_match/target_match maps.
inhibit_rules:
  # GOOD: Inhibit disk-full warning when disk-full critical fires.
  # Both must match the same instance. Security alerts are NOT inhibited.
  - source_matchers:
      - alertname = DiskFull
      - severity = critical
    target_matchers:
      - alertname = DiskFull
      - severity = warning
    equal: [instance]
  # GOOD: Inhibit dependent service alerts when the root cause fires.
  - source_matchers:
      - alertname = DatabaseDown
    target_matchers:
      - alertname = ApplicationHighErrorRate
      - team != security  # Explicit exclusion; add to every rule.
    equal: [environment]
  # BAD: do NOT use.
  # This inhibits ALL alerts when any node is down, including security alerts.
  # - source_matchers:
  #     - alertname = NodeDown
  #   target_matchers:
  #     - severity =~ ".*"  # Suppresses everything.
# NEVER inhibit security-labelled alerts.
# Add team=security label to all security alerts.
# Ensure no inhibition rule targets team=security: every rule's
# target_matchers should include team != security.
```
Label security alerts explicitly so that every inhibition rule has a label to exclude:
```yaml
# Prometheus alerting rules: label security alerts explicitly.
groups:
  - name: security
    rules:
      - alert: UnauthorisedAPIAccess
        expr: sum(rate(http_requests_total{status=~"401|403"}[5m])) > 10
        labels:
          severity: high
          team: security
          # Custom label; it only prevents inhibition if every inhibit rule's
          # target_matchers include inhibit_protected != "true".
          inhibit_protected: "true"
        annotations:
          summary: "Elevated authentication failures"
```
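Labels only help if the rules are checked against them. A sketch of an offline audit, assuming inhibit rules use the `target_matchers` string form (`name op value`): it reports which rules' target matchers would match a given alert's label set. It ignores the source side and `equal`, so a hit means a rule *could* suppress the alert, which is the conservative question to ask for security alerts. `matcher_matches` and `rules_that_could_inhibit` are illustrative names.

```python
import re
from typing import Dict, List

# 'name op value' as written in target_matchers; the value may be quoted.
_MATCHER = re.compile(r'^\s*([A-Za-z_][A-Za-z0-9_]*)\s*(=~|!~|!=|=)\s*"?(.*?)"?\s*$')


def matcher_matches(matcher: str, labels: Dict[str, str]) -> bool:
    parsed = _MATCHER.match(matcher)
    if parsed is None:
        raise ValueError(f"unparseable matcher: {matcher!r}")
    name, op, value = parsed.groups()
    actual = labels.get(name, "")  # A missing label compares as the empty string.
    if op == "=":
        return actual == value
    if op == "!=":
        return actual != value
    hit = re.fullmatch(value, actual) is not None  # Alertmanager anchors regexes.
    return hit if op == "=~" else not hit


def rules_that_could_inhibit(inhibit_rules: List[Dict], alert_labels: Dict[str, str]) -> List[int]:
    """Indexes of inhibit rules whose target_matchers all match the alert."""
    return [
        i
        for i, rule in enumerate(inhibit_rules)
        if all(matcher_matches(m, alert_labels) for m in rule.get("target_matchers", []))
    ]
```

Feeding it the label set of every `team: security` rule and failing on a non-empty result turns "ensure no inhibition rule targets team=security" into a CI gate.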
Step 5: Receiver Endpoint Validation
Prevent SSRF via webhook receivers:
```yaml
# alertmanager.yml: restrict webhook URLs to approved destinations.
# Alertmanager does not natively validate URLs, but you can enforce via:
# 1. Policy-as-code review of alertmanager.yml changes.
# 2. Egress NetworkPolicy on the Alertmanager pod.
receivers:
  - name: slack-security
    slack_configs:
      - api_url: "$SLACK_WEBHOOK_URL"  # Must be hooks.slack.com.
        # Never allow internal destinations: 10.0.0.0/8, 172.16.0.0/12,
        # 192.168.0.0/16, or link-local 169.254.0.0/16 (cloud metadata).
```
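```yaml
# NetworkPolicy: Alertmanager egress restricted to approved notification endpoints.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: alertmanager-egress
  namespace: monitoring
spec:
  podSelector:
    matchLabels:
      app: alertmanager
  policyTypes:
    - Egress
  egress:
    # Public HTTPS notification APIs (PagerDuty, Slack); private and
    # link-local ranges excluded.
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 10.0.0.0/8
              - 172.16.0.0/12
              - 192.168.0.0/16
              - 169.254.0.0/16
      ports:
        - port: 443
    # Email receivers also need the SMTP relay's port (for example 587).
    # HA gossip to peer instances; without this rule the private-range
    # exclusion above would cut the cluster mesh.
    - to:
        - podSelector:
            matchLabels:
              app: alertmanager
      ports:
        - port: 9094
          protocol: TCP
        - port: 9094
          protocol: UDP
    # DNS.
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
```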
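Policy-as-code review can apply an explicit host allowlist to resolved receiver URLs. A minimal sketch; the `APPROVED_HOSTS` set is a hypothetical example, and the check is meant for the real URL after placeholder substitution, not for `$SLACK_WEBHOOK_URL` itself. Rejecting every IP literal covers private ranges, loopback and the metadata address without maintaining CIDR lists:

```python
import ipaddress
from typing import Iterable
from urllib.parse import urlsplit

# Hypothetical allowlist; replace with the services actually in use.
APPROVED_HOSTS = frozenset({"hooks.slack.com", "events.pagerduty.com"})


def receiver_url_allowed(url: str, approved: Iterable[str] = APPROVED_HOSTS) -> bool:
    """Accept only https URLs whose hostname is on the allowlist."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # Lowercased; excludes userinfo and port.
    if parts.scheme != "https" or not host:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return host in set(approved)  # A DNS name: check it against the allowlist.
    return False  # IP literals (private, loopback, metadata) are never approved.
```

Using `hostname` rather than string prefix checks defeats tricks such as `https://hooks.slack.com@10.0.0.5/`. Host allowlisting does not stop an approved name from resolving to an internal address; the egress NetworkPolicy remains the enforcement layer.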
Step 6: Alert on Alertmanager Health
```yaml
# Prometheus rules: monitor the monitoring system itself.
groups:
  - name: alertmanager-health
    rules:
      - alert: AlertmanagerDown
        # Fires when no Alertmanager target is up. If every replica is down,
        # nothing can deliver this alert, so pair it with an external check.
        expr: absent(up{job="alertmanager"} == 1)
        for: 2m
        labels:
          severity: critical
          team: platform
        annotations:
          summary: "Alertmanager is down: alerts not being delivered"
      - alert: AlertmanagerReceiverFailure
        expr: rate(alertmanager_notifications_failed_total[5m]) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Alertmanager receiver failing: alerts being dropped"
          description: "Integration {{ $labels.integration }} is failing. Check credentials."
      - alert: AlertmanagerNoActiveAlerts
        # count() over an empty vector returns nothing rather than 0, hence
        # "or vector(0)"; "and on()" ignores labels when combining conditions.
        expr: |
          (count(ALERTS{alertstate="firing", team="security"}) or vector(0)) == 0
          and on()
          count(up{job="prometheus"}) > 0
          and on()
          max(alertmanager_silences{state="active"}) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "No security alerts firing with active silences: verify"
```
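These rules travel through Alertmanager, so they cannot report Alertmanager's own outage or a match-all silence. A common complement is an always-firing heartbeat alert routed to a webhook outside the cluster, where the absence of notifications is the signal. A minimal sketch of the receiving side, assuming the heartbeat alert is named `Watchdog` (as in kube-prometheus-stack) and its route has a short `repeat_interval` so Alertmanager keeps re-sending it; `HeartbeatMonitor` is a hypothetical class:

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional

HEARTBEAT_ALERT = "Watchdog"  # Assumed name of the always-firing heartbeat alert.


class HeartbeatMonitor:
    """Records when the heartbeat alert last arrived via Alertmanager's webhook."""

    def __init__(self, max_gap: timedelta = timedelta(minutes=10)) -> None:
        self.max_gap = max_gap
        self.last_seen: Optional[datetime] = None

    def record(self, payload: Dict[str, Any], now: Optional[datetime] = None) -> None:
        # Webhook payloads carry an "alerts" list; each alert has "status" and "labels".
        now = now or datetime.now(timezone.utc)
        for alert in payload.get("alerts", []):
            labels = alert.get("labels", {})
            if labels.get("alertname") == HEARTBEAT_ALERT and alert.get("status") == "firing":
                self.last_seen = now

    def pipeline_dark(self, now: Optional[datetime] = None) -> bool:
        """True if no heartbeat arrived within max_gap (Alertmanager down or silenced)."""
        now = now or datetime.now(timezone.utc)
        return self.last_seen is None or now - self.last_seen > self.max_gap
```

A match-all silence (Adversary 2) also suppresses the heartbeat, so the same check that catches an outage catches the silence.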
Step 7: HA Deployment Security
```yaml
# Alertmanager HA: restrict gossip port.
# Deploy 3 instances; mesh via gossip on port 9094.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: alertmanager
  namespace: monitoring
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: alertmanager
          args:
            - --config.file=/etc/alertmanager/alertmanager.yml
            - --storage.path=/alertmanager
            - --cluster.peer=alertmanager-0.alertmanager:9094
            - --cluster.peer=alertmanager-1.alertmanager:9094
            - --cluster.peer=alertmanager-2.alertmanager:9094
            - --cluster.listen-address=0.0.0.0:9094
            # TLS for gossip (experimental): the flag takes a YAML file with
            # tls_server_config (cert_file, key_file, client_ca_file,
            # client_auth_type: RequireAndVerifyClientCert) and
            # tls_client_config (cert_file, key_file, ca_file) sections.
            - --cluster.tls-config=/etc/alertmanager/tls/cluster-tls.yml
```
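After the mesh forms, each replica's `/api/v2/status` endpoint reports its view of the cluster. A sketch that checks the `cluster` object of that response, assuming the peer list includes the replica itself (three entries for this StatefulSet); more peers than expected can indicate an unexpected member on the gossip port. `cluster_problems` is a hypothetical helper:

```python
from typing import Any, Dict, List


def cluster_problems(status: Dict[str, Any], expected_peers: int) -> List[str]:
    """Check the "cluster" object of an Alertmanager /api/v2/status response."""
    cluster = status.get("cluster") or {}
    problems = []
    state = cluster.get("status")
    if state != "ready":
        # "settling" right after startup is normal; a persistent non-ready state is not.
        problems.append(f"cluster status is {state!r}, expected 'ready'")
    peers = cluster.get("peers") or []
    if len(peers) != expected_peers:
        # Fewer: a replica dropped out. More: an unexpected gossip member joined.
        problems.append(f"{len(peers)} peers visible, expected {expected_peers}")
    return problems
```

Fetch the status JSON from every replica (for example with the same authenticated `requests.get` call the silence auditor uses) and compare; replicas that disagree about membership point to a split mesh.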
Step 8: Telemetry
```text
alertmanager_alerts_received_total{receiver, status} counter
alertmanager_notifications_total{receiver, integration} counter
alertmanager_notifications_failed_total{receiver, integration} counter
alertmanager_silences{state} gauge
alertmanager_inhibitions_muted_alerts_total{} counter
alertmanager_receivers{name} gauge
alertmanager_config_hash{} gauge
```
Alert on:
- `alertmanager_notifications_failed_total` non-zero: receiver failing; alerts being dropped; investigate credentials.
- `alertmanager_silences{state="active"}` above the expected baseline: unexpected silences active; possible attacker intervention.
- `alertmanager_config_hash` changes unexpectedly: configuration was modified outside of change management.
- No `alertmanager_alerts_received_total` increment despite a known firing alert: the Prometheus → Alertmanager pipeline is broken.
- `alertmanager_inhibitions_muted_alerts_total` spike: many alerts being inhibited; review inhibition rules for overly broad matches.
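The config-hash signal is easiest to act on when compared against the hash recorded at the last approved deploy. A sketch that reads `alertmanager_config_hash` from each replica's `/metrics` text and reports mismatches; the approved value is assumed to be captured by the deployment pipeline, the parser handles only unlabelled samples, and both function names are illustrative:

```python
from typing import Dict, Optional


def read_unlabelled_sample(exposition: str, name: str) -> Optional[float]:
    """Value of an unlabelled metric in Prometheus text exposition format."""
    for line in exposition.splitlines():
        if line.startswith("#"):
            continue  # HELP and TYPE lines.
        fields = line.split()
        if len(fields) >= 2 and fields[0] == name:
            return float(fields[1])
    return None


def config_hash_drift(metrics_by_replica: Dict[str, str], approved_hash: float) -> Dict[str, Optional[float]]:
    """Replicas whose alertmanager_config_hash differs from the approved value."""
    drift: Dict[str, Optional[float]] = {}
    for replica, text in metrics_by_replica.items():
        value = read_unlabelled_sample(text, "alertmanager_config_hash")
        if value != approved_hash:
            drift[replica] = value  # None means the metric was missing.
    return drift
```

An empty result means every replica runs the reviewed configuration; any entry is either an out-of-band change or a replica that failed to reload.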
Expected Behaviour
| Signal | Default Alertmanager | Hardened Alertmanager |
|---|---|---|
| Receiver credential exposure | Plaintext in config file | Environment variable injection from Kubernetes Secret |
| Unauthenticated silence API | Any pod can silence all alerts | API authentication required; NetworkPolicy blocks direct pod access |
| Broad inhibition silences security alerts | Security alerts suppressed during infra incident | Security alerts labelled; excluded from inhibition rules |
| Alertmanager receiver failure | Alerts silently dropped | AlertmanagerReceiverFailure fires to backup channel |
| Webhook SSRF via receiver URL | Internal URL reachable | NetworkPolicy blocks internal destinations from Alertmanager egress |
Trade-offs
| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| Env variable credentials | No plaintext in config | Requires Secret injection in deployment | External Secrets Operator automates credential sync from Vault |
| Ingress-level authentication | Protects silence API | Ingress dependency for Alertmanager access | Internal Ingress within cluster; separate from external ingress |
| Egress NetworkPolicy | Prevents SSRF and exfiltration | Notification endpoints must be allowlisted by IP | Use DNS-based egress policy (Cilium FQDN policies) for dynamic IPs |
| 3-replica HA | Reliability | More resource consumption | Required for production; use anti-affinity to spread across nodes |
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Credential rotation breaks receiver | Notifications fail silently | `alertmanager_notifications_failed_total` rising | Update the Kubernetes Secret, then restart the Alertmanager pods (environment variables are read only at container start) |
| Authentication misconfiguration | Prometheus cannot reach Alertmanager API | `prometheus_notifications_errors_total` rising in Prometheus | Verify the NetworkPolicy admits Prometheus on 9093 and that Prometheus targets the Alertmanager Service directly, not the authenticated ingress |
| Gossip TLS cert expired | HA instances cannot sync; duplicate alerts | Gossip port errors in logs | cert-manager auto-renewal; manual renewal as fallback |
| Config error after hot-reload | Alertmanager keeps running the previous config | `alertmanager_config_last_reload_successful` is 0; config hash doesn’t update; config load error in logs | Validate config with `amtool check-config` before applying |