Prometheus Operator RBAC: Cluster-Wide Secret Access via ServiceMonitor
Problem
The Prometheus Operator is the de facto standard for deploying Prometheus in Kubernetes. It introduces ServiceMonitor, PodMonitor, and PrometheusRule CRDs that make it easy to define scrape targets declaratively. The operator creates and manages Prometheus instances that scrape metrics from pods and services across the cluster.
The default RBAC configuration deployed by the Prometheus Operator Helm chart grants Prometheus a ClusterRole with broad permissions including reading secrets across all namespaces. The rationale is that Prometheus needs to read Secret objects to find credentials for scrape targets that use basic auth or bearer token authentication.
The security problem: in most clusters, the vast majority of Prometheus scrape targets use no authentication at all, or use service mesh mTLS. The cluster-wide secret read permission is granted by default regardless of whether it is needed. A compromised Prometheus instance — via CVE, misconfiguration, or supply chain attack — can read every secret in every namespace.
The ServiceMonitor escalation path. An attacker who can create or modify ServiceMonitor objects can craft a monitor that scrapes an endpoint they control. For example, a monitor that references a Secret for basic auth or bearer token authentication makes Prometheus present that credential to the attacker's endpoint on every scrape. The endpoint can also return a redirect or other response intended to make Prometheus expose credential information through its own metrics or configuration API. More directly, an attacker with write access to Prometheus objects can reconfigure the Prometheus instance to scrape endpoints that return the content of Kubernetes secrets as “metrics.”
The actual blast radius. In a typical cluster, Kubernetes secrets contain: database credentials, API keys, TLS private keys, cloud provider credentials (AWS access keys, GCP service account keys), registry pull secrets, and application configuration secrets. A compromised Prometheus with cluster-wide secret read can enumerate and exfiltrate all of these.
Why this is underappreciated. Prometheus is treated as infrastructure rather than a security boundary. Its service account is not subject to the same review as application service accounts. The cluster-wide permissions are accepted during Helm chart deployment without audit.
Target systems: any Kubernetes cluster using the Prometheus Operator (kube-prometheus-stack, prometheus-operator Helm chart); multi-tenant clusters where different teams own different namespaces; clusters with high-value secrets.
Threat Model
Adversary 1 — Prometheus CVE enables secret exfiltration. A remote code execution CVE in Prometheus (or in a Prometheus exporter that has elevated RBAC) gives an attacker code execution in the Prometheus pod. The pod’s service account has cluster-wide secret read. The attacker calls the Kubernetes API to list and read all secrets across the cluster.
Adversary 2 — Crafted ServiceMonitor scrapes secret-returning endpoint. An attacker who can create ServiceMonitor objects creates one pointing at an endpoint they control. The endpoint returns responses that cause Prometheus to make additional requests (via relabeling or recording rules), eventually exfiltrating secret data through Prometheus’s own legitimate API endpoint.
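A practical first question for this adversary is who can write ServiceMonitor objects at all. The following is a minimal sketch rather than a complete audit: `servicemonitor_write_verbs` is a hypothetical helper name, and the subject passed to it is whichever user or service account you want to test. It relies only on `kubectl auth can-i` with impersonation, so the caller needs permission to impersonate.

```shell
# Hypothetical helper: report which write verbs a subject holds on
# ServiceMonitors in one namespace, using impersonated access checks.
# $1: subject, e.g. system:serviceaccount:team-a:ci-deployer (illustrative name)
# $2: namespace to check
servicemonitor_write_verbs() {
  for verb in create update patch; do
    # "kubectl auth can-i" exits 0 when the action is allowed
    if kubectl auth can-i "$verb" servicemonitors.monitoring.coreos.com \
         --as="$1" -n "$2" >/dev/null 2>&1; then
      echo "$1 can $verb servicemonitors in $2"
    fi
  done
}
```

Calling `servicemonitor_write_verbs system:serviceaccount:team-a:ci-deployer team-a` prints one line per write verb that account holds; no output means no write access was found for that namespace.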
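Adversary 3 — Prometheus configuration API exposes credentials. Prometheus’s /api/v1/status/config endpoint returns the full scrape configuration generated from ServiceMonitors. Fields that Prometheus treats as secrets are rendered as <secret>, but usernames, credential file paths, target addresses, and any tokens embedded in URLs or query parameters are returned in clear text. If this endpoint is accessible (even internally), a lateral-moving attacker can read whatever credential material the configuration exposes and learn where the rest is stored.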
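To gauge this exposure on a running instance, it helps to look at which credential-related keys the served configuration contains. The sketch below assumes the configuration YAML is supplied on stdin; `config_credential_lines` is a hypothetical helper, and its key list is an assumption that is likely incomplete for any given setup.

```shell
# Hypothetical helper: print (with line numbers) the config lines whose keys
# typically carry credentials or point at credential files.
# Reads Prometheus configuration YAML on stdin.
# The key list is an assumption; extend it for your environment.
config_credential_lines() {
  grep -nE '^[[:space:]-]*(username|password|password_file|bearer_token|bearer_token_file|credentials|credentials_file|client_secret|client_secret_file):'
}

# Example usage, assuming a port-forward to Prometheus on localhost:9090
# and jq being installed:
#   curl -s http://localhost:9090/api/v1/status/config \
#     | jq -r '.data.yaml' | config_credential_lines
```

Any line printed here is something a reader of the endpoint can also see, which makes the output a reasonable input for deciding how tightly the endpoint needs to be restricted.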
Configuration / Implementation
Step 1 — Audit the current Prometheus service account permissions
#!/bin/bash
# Audit Prometheus Operator RBAC permissions
# Read the service account and namespace from the first Prometheus object,
# falling back to typical kube-prometheus-stack names when the lookup fails or returns nothing
PROM_SA=$(kubectl get prometheus -A -o jsonpath='{.items[0].spec.serviceAccountName}' 2>/dev/null)
PROM_SA=${PROM_SA:-prometheus-kube-prometheus-prometheus}
PROM_NS=$(kubectl get prometheus -A -o jsonpath='{.items[0].metadata.namespace}' 2>/dev/null)
PROM_NS=${PROM_NS:-monitoring}
echo "=== Prometheus Service Account: $PROM_SA in $PROM_NS ==="
# Show all permissions granted to the Prometheus service account
kubectl auth can-i --list \
  --as="system:serviceaccount:${PROM_NS}:${PROM_SA}" 2>/dev/null | \
  grep -v "^no " | head -40
echo ""
echo "=== Checking for cluster-wide secret access ==="
# can-i exits 0 when the action is allowed and non-zero otherwise
if kubectl auth can-i get secrets \
     --as="system:serviceaccount:${PROM_NS}:${PROM_SA}" \
     --all-namespaces >/dev/null 2>&1; then
  echo "WARNING: Prometheus can read secrets cluster-wide"
else
  echo "OK: No cluster-wide secret access"
fi
echo ""
echo "=== Prometheus ClusterRoles bound to this SA ==="
# Print "<binding name> -> <role name>" for every ClusterRoleBinding that names this SA
kubectl get clusterrolebindings -o json 2>/dev/null | \
  jq -r --arg ns "$PROM_NS" --arg sa "$PROM_SA" '
    .items[] |
    select(.subjects[]? | select(.kind=="ServiceAccount" and .namespace==$ns and .name==$sa)) |
    "\(.metadata.name) -> \(.roleRef.name)"'
Step 2 — Replace the default ClusterRole with scoped permissions
# prometheus-rbac-scoped.yaml
# Replace the default broad ClusterRole with a scoped version
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-scoped
rules:
  # Core metrics collection — required
  - apiGroups: [""]
    resources:
      - nodes
      - nodes/metrics
      - services
      - endpoints
      - pods
    verbs: ["get", "list", "watch"]
  # Node metrics endpoint
  - apiGroups: [""]
    resources: ["nodes/proxy"]
    verbs: ["get"]
  # Non-resource URLs for metrics endpoints
  - nonResourceURLs:
      - "/metrics"
      - "/metrics/cadvisor"
    verbs: ["get"]
  # Ingress metrics (if scraping ingress controllers)
  - apiGroups: ["networking.k8s.io"]
    resources: ["ingresses"]
    verbs: ["get", "list", "watch"]
  # Prometheus Operator CRDs
  # These are normally read by the operator rather than by Prometheus itself;
  # drop this rule if nothing running as this service account relies on it
  - apiGroups: ["monitoring.coreos.com"]
    resources:
      - servicemonitors
      - podmonitors
      - prometheusrules
      - probes
    verbs: ["get", "list", "watch"]
  # IMPORTANT: Do NOT include secrets here unless specifically required
  # If scrape targets need authentication, use namespace-scoped secret access
  # only in the namespaces where auth is needed
---
# For namespaces where scrape targets use authentication:
# Grant secret access only in those specific namespaces
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-secret-reader
  namespace: production # Only in the namespace that needs it
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    # Only allow reading specific secrets needed for scraping
    # Use resourceNames to limit to only the required secrets
    resourceNames:
      - "prometheus-scrape-credentials"
      - "thanos-object-storage"
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus-secret-reader
  namespace: production
subjects:
  # Must match the service account Prometheus actually runs as
  # (prometheus-scoped-sa in the Step 3 Helm values)
  - kind: ServiceAccount
    name: prometheus-scoped-sa
    namespace: monitoring
roleRef:
  kind: Role
  name: prometheus-secret-reader
  apiGroup: rbac.authorization.k8s.io
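The manifests above define the ClusterRole but no ClusterRoleBinding, so something still has to bind prometheus-scoped to the Prometheus service account. One possible way to do that and then confirm the result is sketched below. The function names and the binding name `prometheus-scoped` are assumptions for illustration; both functions take the service account's namespace and name as arguments.

```shell
# Hypothetical helpers; the binding name "prometheus-scoped" is an assumption.
# Arguments for both functions: <sa-namespace> <sa-name>

# Bind the scoped ClusterRole to the Prometheus service account.
bind_scoped_clusterrole() {
  kubectl create clusterrolebinding prometheus-scoped \
    --clusterrole=prometheus-scoped \
    --serviceaccount="$1:$2"
}

# Check that discovery still works and cluster-wide secret reads do not.
verify_scoped_rbac() {
  subject="system:serviceaccount:$1:$2"
  status=0
  # Expected to be allowed: listing pods in every namespace
  if ! kubectl auth can-i list pods --all-namespaces --as="$subject" >/dev/null 2>&1; then
    echo "FAIL: cannot list pods cluster-wide"
    status=1
  fi
  # Expected to be denied: reading secrets in every namespace
  if kubectl auth can-i get secrets --all-namespaces --as="$subject" >/dev/null 2>&1; then
    echo "FAIL: can still read secrets cluster-wide"
    status=1
  fi
  if [ "$status" -eq 0 ]; then
    echo "OK: scoped RBAC behaves as expected"
  fi
  return "$status"
}
```

Running `verify_scoped_rbac monitoring <service-account>` after applying the manifests gives a quick pass/fail signal; a non-zero exit status makes it usable as a gate in a deployment pipeline.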
Step 3 — Apply via Helm values override
# values-prometheus-rbac-hardened.yaml
# Helm values for kube-prometheus-stack with scoped RBAC
prometheus:
  prometheusSpec:
    # Use a custom service account with scoped permissions
    serviceAccountName: prometheus-scoped-sa
    # Restrict which namespaces Prometheus monitors
    # Instead of monitoring all namespaces, limit to specific ones
    serviceMonitorNamespaceSelector:
      matchLabels:
        monitoring: "enabled" # Only namespaces with this label
    podMonitorNamespaceSelector:
      matchLabels:
        monitoring: "enabled"
    ruleNamespaceSelector:
      matchLabels:
        monitoring: "enabled"
    # Restrict ServiceMonitor selection to prevent crafted monitors
    serviceMonitorSelector:
      matchLabels:
        prometheus: "main" # Only ServiceMonitors with this label
    # Cosmetic web setting only: this does not disable /api/v1/status/config
    # or add authentication. Protect the API as described in Step 4.
    web:
      pageTitle: "Prometheus"
    # Label attached to series and alerts that leave this Prometheus
    externalLabels:
      cluster: "production"
  # Create a custom service account
  serviceAccount:
    create: true
    name: prometheus-scoped-sa
    annotations: {}
# Disable default overly-broad ClusterRole creation
# and supply our own scoped version
# The location of this switch varies by chart; in kube-prometheus-stack it is
# global.rbac.create, which also affects the operator's own RBAC, so check
# your chart's values before applying
rbac:
  create: false # Don't create the default broad RBAC
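With these selectors in place, nothing is scraped until namespaces and monitors opt in. A minimal sketch of that opt-in, assuming the label keys and values from the values file above; the function names are hypothetical and the namespace and monitor names are supplied by the caller.

```shell
# Hypothetical helpers that apply the labels the selectors above look for.

# $1: namespace whose monitors and rules Prometheus should honour
opt_in_namespace() {
  kubectl label namespace "$1" monitoring=enabled --overwrite
}

# $1: namespace, $2: ServiceMonitor name
opt_in_servicemonitor() {
  kubectl label servicemonitor "$2" -n "$1" prometheus=main --overwrite
}

# List ServiceMonitors carrying the label in any namespace. A monitor shown
# here is only scraped if its namespace is also labelled monitoring=enabled.
list_selected_servicemonitors() {
  kubectl get servicemonitors -A -l prometheus=main
}
```

Because labelling is now what grants a monitor access to Prometheus, permission to label namespaces and ServiceMonitors deserves the same review as permission to create them.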
Step 4 — Protect the Prometheus config and admin APIs
# Prometheus is deployed with authentication via Grafana OAuth or a reverse proxy
# The /api/v1/status/config endpoint returns scrape credentials — protect it
# Option 1: Deploy a sidecar proxy that requires authentication
# kube-rbac-proxy protects Prometheus behind Kubernetes RBAC auth
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-proxy-config
  namespace: monitoring
data:
  config.yaml: |
    authorization:
      resourceAttributes:
        namespace: monitoring
        apiGroup: ""
        resource: services
        subresource: proxy
        name: prometheus-operated
---
# Option 2: NetworkPolicy restricting Prometheus access to the monitoring namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: prometheus-access-restriction
  namespace: monitoring
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: prometheus
  policyTypes: [Ingress]
  ingress:
    # Allow Grafana to query Prometheus
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: grafana
      ports:
        - port: 9090
          protocol: TCP
    # Allow Alertmanager
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: alertmanager
      ports:
        - port: 9090
    # Allow monitoring namespace pods only
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
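A NetworkPolicy only helps if the cluster's network plugin enforces it, so it is worth probing from a namespace that should be blocked. The sketch below is one way to do that under stated assumptions: the curlimages/curl image is pullable, Prometheus is reachable through the prometheus-operated service named in the proxy config above, and kubectl propagates the probe container's exit status. `probe_prometheus_from` and the pod name `np-probe` are hypothetical.

```shell
# Hypothetical check: probe Prometheus from a throwaway pod in another namespace.
# $1: namespace to probe from (e.g. default)
probe_prometheus_from() {
  # curl exits non-zero if it cannot connect within 5 seconds
  if kubectl run np-probe --rm -i --restart=Never --quiet \
       --image=curlimages/curl -n "$1" --command -- \
       curl -s -m 5 -o /dev/null \
       http://prometheus-operated.monitoring.svc:9090/-/ready; then
    echo "REACHABLE from $1"
  else
    echo "BLOCKED from $1"
  fi
}
```

A BLOCKED result from a namespace outside monitoring is the expected outcome. It can also be caused by an image pull failure or a DNS problem, so treat it as a hint and confirm with the pod's events if in doubt.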
Step 5 — Alert on suspicious Prometheus behaviour
# PrometheusRule for detecting Prometheus RBAC abuse indicators
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: prometheus-security-monitoring
  namespace: monitoring
spec:
  groups:
    - name: prometheus_security
      rules:
        # Alert when Prometheus scrape config references secrets outside expected namespaces
        # NOTE: assumes kube-state-metrics is deployed (kube_secret_info) and that the
        # service-discovery counter below is exposed with role="secret" series.
        # Confirm both on your Prometheus before relying on this alert; if the counter
        # series is absent, the alert fires for every secret outside the excluded namespaces.
        - alert: PrometheusUnexpectedSecretAccess
          expr: |
            count(
              kube_secret_info{namespace!~"monitoring|kube-system"}
              unless on(namespace, secret)
              (
                count by (namespace, secret) (
                  prometheus_sd_kubernetes_cache_events_total{event="add", role="secret"}
                )
              )
            ) > 0
          labels:
            severity: warning
          annotations:
            summary: "Prometheus may be accessing secrets outside monitoring namespace"
        # Alert on unexpected ServiceMonitor creation
        # NOTE: assumes a role="servicemonitor" series exists for this counter;
        # if it does not, this alert never fires
        - alert: UnexpectedServiceMonitorCreated
          expr: |
            increase(prometheus_sd_kubernetes_cache_events_total{
              role="servicemonitor",
              event="add"
            }[5m]) > 3
          labels:
            severity: info
          annotations:
            summary: "Multiple new ServiceMonitors created in 5 minutes — verify intent"
        # Alert on Prometheus config changes
        - alert: PrometheusConfigReloaded
          expr: |
            prometheus_config_last_reload_success_timestamp_seconds >
            (time() - 300)
          labels:
            severity: info
          annotations:
            summary: "Prometheus config reloaded in last 5 minutes — verify expected change"
Expected Behaviour
| Scenario | Default RBAC | Scoped RBAC |
|---|---|---|
| Prometheus pod compromised via CVE | Attacker reads all cluster secrets | Attacker can only read secrets in specifically permitted namespaces |
| Crafted ServiceMonitor created | Prometheus scrapes attacker-controlled endpoint | Namespace selector + label selector limits which monitors are honoured |
| /api/v1/status/config accessed by attacker | Full scrape config with credentials returned | NetworkPolicy restricts access to Grafana and monitoring namespace only |
| Prometheus list-secrets API call | Returns secrets from all namespaces | ClusterRole has no secrets permission; call returns 403 |
| New namespace needs scrape auth | Requires no change (existing broad access) | Requires explicit Role + RoleBinding in that namespace |
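The secret-related rows of this table can be spot-checked namespace by namespace. The following is a sketch under the assumption that the caller may impersonate the Prometheus service account; `namespaces_with_secret_access` is a hypothetical helper that takes the service account's namespace and name.

```shell
# Hypothetical helper: list namespaces where a service account can read secrets.
# Arguments: <sa-namespace> <sa-name>
namespaces_with_secret_access() {
  subject="system:serviceaccount:$1:$2"
  for ns in $(kubectl get namespaces -o jsonpath='{.items[*].metadata.name}'); do
    for verb in get list; do
      # "kubectl auth can-i" exits 0 when the action is allowed
      if kubectl auth can-i "$verb" secrets -n "$ns" --as="$subject" >/dev/null 2>&1; then
        echo "$ns: $verb"
      fi
    done
  done
}
```

With the default RBAC the expectation is a line for every namespace. With the scoped setup, output should be limited to namespaces that have an explicit Role and RoleBinding. A grant restricted with resourceNames may not show up in a check that names no specific secret, so absence of output is not proof of zero access.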
Trade-offs
| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| Removing cluster-wide secret RBAC | Eliminates secret exfiltration blast radius | Prometheus cannot automatically pick up new scrape auth secrets in new namespaces | Add namespace-scoped Roles as new namespaces onboard; document the process |
| Namespace selector restriction | Limits ServiceMonitor scope; reduces attack surface | Prometheus doesn’t discover new namespaces automatically | Add monitoring: "enabled" label to new namespaces as part of onboarding |
| ServiceMonitor label selector | Prevents unauthorised monitors | Teams must label their ServiceMonitors correctly | Add label requirement to namespace onboarding checklist |
| NetworkPolicy on Prometheus | Limits access to Prometheus API | Operations teams lose direct kubectl port-forward convenience | Use Grafana as the primary query interface; restrict direct Prometheus access |
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Scoped RBAC breaks scraping of new namespace | New namespace metrics not appearing in Prometheus | Missing metrics in dashboards; Prometheus target shows 403 | Add namespace-scoped Role + RoleBinding for the new namespace’s scrape credentials |
| ServiceMonitor label not applied | New service not scraped | Missing metrics; Prometheus target list doesn’t show service | Add prometheus: "main" label to ServiceMonitor; verify with kubectl get servicemonitor -l prometheus=main |
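| NetworkPolicy blocks Grafana queries | Grafana dashboards show no data | Grafana reports a connection timeout or “dial tcp: connection refused”, depending on how the CNI enforces the policy | Verify that the Grafana pod labels match the podSelector in the NetworkPolicy's from clause |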
| Alert fires on expected config reload | On-call paged for routine Prometheus restart | Alert fires after operator upgrade | Keep the alert at severity info and route it to a non-paging receiver; silence it during known maintenance windows |
Related Articles
- Prometheus Security Metrics — securing the Prometheus instance itself including TLS and authentication
- Kubernetes RBAC Design Patterns — RBAC design principles applied to monitoring infrastructure
- Kubernetes Secrets Management — protecting secrets from over-privileged service accounts like Prometheus
- Prometheus Cardinality DoS Defence — the denial-of-service class of Prometheus attacks, complementary to the RBAC class
- Kyverno Policy Development — using Kyverno to enforce ServiceMonitor label requirements and RBAC constraints