Synthetic Monitoring as a Security Tool: Blackbox Exporter, Certificate Probes, and Tamper Detection
Problem
Synthetic monitoring is usually framed as an uptime problem: “is the site reachable?” But the Prometheus Blackbox Exporter does much more than ping a URL. Each probe carries security signal that most teams ignore:
- Certificate expiry is a hard cutover from working HTTPS to total failure. The expiry date is visible weeks in advance. Most teams still get surprised.
- TLS downgrade happens when a server that should only accept TLS 1.3 silently accepts TLS 1.0 connections. No alert fires because the connection still succeeds.
- Content tampering — defacement, injected skimming scripts, poisoned API responses — does not affect uptime. Standard uptime checks return HTTP 200 and declare everything healthy.
- DNS hijacking returns a different IP for your domain. Standard HTTP probes connect to whatever address the resolver returns and can still pass. The user is on an attacker-controlled server.
- Security header regression: a deploy removes the Content-Security-Policy header. Metrics confirm the endpoint is up. The XSS protection is gone.
These are all detectable by Blackbox Exporter probes configured with security in mind. They are not detected by probes configured with only uptime in mind.
Target systems: Prometheus 2.45+; Blackbox Exporter 0.24+; Alertmanager 0.26+; Grafana 10+.
Threat Model
- Adversary 1 (certificate expiry via renewal failure): cert-manager fails silently 30 days before expiry. No engineer notices. On day 0 the cert expires and HTTPS fails for all users. Blackbox Exporter would have been firing a warning alert for 28 days.
- Adversary 2 (BGP hijack or DNS compromise redirects traffic): An attacker redirects payments.example.com to a server they control. The server presents a valid Let’s Encrypt certificate for a different domain. Uptime checks that skip certificate verification still pass. A probe checking the expected certificate subject and issuer would have fired immediately.
- Adversary 3 (web skimmer injection): An attacker compromises a CDN configuration and injects a <script src="https://evil.example/skim.js"> tag into every page response. HTTP 200. No change in latency. A probe matching page content against a known-safe pattern fires immediately.
- Adversary 4 (security header stripped by misconfigured reverse proxy): A reverse proxy update removes Strict-Transport-Security and X-Frame-Options. The application is now vulnerable to SSL stripping and clickjacking. No error. No alert without explicit header probing.
- Adversary 5 (TLS downgrade via misconfigured load balancer): A load balancer configuration is updated and accidentally re-enables TLS 1.0. Applications continue to work because modern clients still negotiate the newest shared version. PCI-DSS requires TLS 1.2+. A probe restricted to legacy TLS versions would start succeeding, which is the signal to alert on.
- Access level: Adversaries 1, 3, 4, 5 are reachable via the public internet with no authentication. Adversary 2 requires BGP or DNS control, which is within reach of nation-state and sophisticated criminal actors.
- Objective: Harvest credentials, intercept payments, serve malware, perform SSL-stripping man-in-the-middle.
- Blast radius: Every user of a public-facing endpoint. For payment flows, every transaction in the window between compromise and detection.
Configuration
Step 1: Blackbox Exporter Baseline Setup
Deploy Blackbox Exporter and configure a core set of security-focused modules. The module configuration lives in blackbox.yml, not in prometheus.yml.
# blackbox.yml: core security-focused probe modules.
modules:
  # Standard HTTPS check: verifies TLS, follows redirects, expects 2xx.
  https_2xx:
    prober: http
    timeout: 10s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      valid_status_codes: []  # defaults to 2xx
      method: GET
      follow_redirects: true
      fail_if_ssl: false
      fail_if_not_ssl: true  # MUST be HTTPS: fail if HTTP is served
      tls_config:
        insecure_skip_verify: false  # never skip: it defeats the purpose of TLS checks

  # Strict TLS 1.2+ check: used for PCI-DSS scope endpoints.
  https_tls12_strict:
    prober: http
    timeout: 10s
    http:
      fail_if_not_ssl: true
      tls_config:
        insecure_skip_verify: false
        min_version: TLS12  # probe fails if server does not support TLS 1.2+
        # Note: min_version only limits what the probe itself will negotiate.
        # A server that accepts TLS 1.0 in addition to TLS 1.2+ still passes.
        # To catch that case, run a second module capped at a legacy version
        # (max_version: TLS11, if your exporter version supports it) and
        # alert when that probe SUCCEEDS.

  # TCP with TLS: for non-HTTP services such as SMTP, LDAPS, database ports.
  tcp_tls:
    prober: tcp
    timeout: 10s
    tcp:
      tls: true
      tls_config:
        insecure_skip_verify: false

  # DNS resolution check: asserts a specific IP is returned.
  dns_expected:
    prober: dns
    timeout: 5s
    dns:
      query_name: "payments.example.com"
      query_type: "A"
      valid_rcodes:
        - NOERROR
      validate_answer_rrs:
        fail_if_not_matches_regexp:
          - "payments\\.example\\.com\\.\\s+\\d+\\s+IN\\s+A\\s+203\\.0\\.113\\."
          # Fires if the returned A record is NOT in the 203.0.113.0/24 range.
          # Adjust to match your actual production IP range.
Step 2: TLS Certificate Validity Probing
The probe_ssl_earliest_cert_expiry metric is the most valuable security metric Blackbox Exporter emits. It is a Unix timestamp of the soonest-expiring certificate in the chain returned by the probe. Combined with recording rules and alerts, it provides structured early warning.
# prometheus.yml: scrape config for certificate expiry monitoring.
scrape_configs:
  - job_name: "blackbox_tls"
    metrics_path: /probe
    params:
      module: [https_2xx]
    static_configs:
      - targets:
          - https://www.example.com
          - https://api.example.com
          - https://payments.example.com
          - https://auth.example.com
    relabel_configs:
      # Move the target URL into the 'instance' label and pass it as the probe target.
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115
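The relabeling is easier to reason about once you see the request it produces. The following Python sketch mimics the three relabel steps for a single target. It is an illustration of the resulting URL, not Prometheus code; the function names `probe_url` and `instance_label` are hypothetical, and the plain `http` scheme assumes the exporter is not yet behind TLS.

```python
from urllib.parse import urlencode

# Address the last relabel rule writes into __address__ (taken from the scrape config).
EXPORTER_ADDRESS = "blackbox-exporter:9115"


def probe_url(target: str, module: str = "https_2xx") -> str:
    """Return the URL Prometheus ends up scraping for one static target.

    Mirrors the relabel steps: the original target becomes the `target`
    query parameter and the scrape address is swapped for the exporter.
    """
    query = urlencode({"module": module, "target": target})
    return f"http://{EXPORTER_ADDRESS}/probe?{query}"


def instance_label(target: str) -> str:
    """The `instance` label keeps the original target, not the exporter address."""
    return target
```

Because `instance` carries the probed URL rather than the exporter address, alerts and dashboards can be grouped by the endpoint that is actually at risk.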
The key metric returned per probe:
probe_ssl_earliest_cert_expiry{instance="https://payments.example.com"} 1.7832e+09
probe_tls_version_info{instance="https://payments.example.com", version="TLS 1.3"} 1
probe_http_ssl{instance="https://payments.example.com"} 1
probe_success{instance="https://payments.example.com"} 1
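To make the arithmetic behind the expiry metric concrete, here is a minimal Python sketch that converts the Unix timestamp into days remaining and classifies it against the 28-day and 14-day thresholds used for alerting in this article. The function names are hypothetical and the sketch only mirrors the calculation; Prometheus performs the real evaluation.

```python
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def days_remaining(expiry_ts: float, now: Optional[float] = None) -> float:
    """Days until the soonest-expiring certificate in the chain expires."""
    if now is None:
        now = time.time()
    return (expiry_ts - now) / SECONDS_PER_DAY


def expiry_severity(
    expiry_ts: float,
    now: Optional[float] = None,
    warning_days: float = 28,
    critical_days: float = 14,
) -> str:
    """Classify a certificate as 'critical', 'warning', or 'ok'."""
    days = days_remaining(expiry_ts, now)
    if days < critical_days:
        return "critical"
    if days < warning_days:
        return "warning"
    return "ok"
```

Passing `now` explicitly keeps the function deterministic, which is convenient when checking threshold behaviour in a unit test.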
Step 3: Content Integrity Checking
Use fail_if_body_matches_regexp to detect injected content and fail_if_body_not_matches_regexp to detect content removal. These two directions cover different attack scenarios.
# blackbox.yml: content integrity modules.
modules:
  # Defacement / injection detection: probe fails if these patterns appear in the body.
  https_no_injection:
    prober: http
    timeout: 15s
    http:
      fail_if_not_ssl: true
      tls_config:
        insecure_skip_verify: false
      fail_if_body_matches_regexp:
        # Skimmer script patterns: adjust these to known-bad domains.
        # Blackbox Exporter compiles patterns with Go's RE2 engine, which has no
        # lookahead, so "(?!allowed-host)" style allowlists will not load.
        # List known-bad hosts instead (skimmer.example is a placeholder).
        - "src=[\"']https?://(evil\\.example|skimmer\\.example)/[^\"']*\\.js"
        # Generic eval-based obfuscation common in skimmers.
        - "eval\\(atob\\("
        # Iframe injection from a known-bad host.
        - "<iframe[^>]+src=[\"']https?://(evil\\.example|skimmer\\.example)"

  # Content presence check: probe fails if expected content is MISSING.
  # Use this to detect complete replacement of page content (full defacement).
  https_content_present:
    prober: http
    timeout: 15s
    http:
      fail_if_not_ssl: true
      tls_config:
        insecure_skip_verify: false
      fail_if_body_not_matches_regexp:
        # A string that MUST be in every response, e.g. your canonical brand name.
        - "Example Corp"
        # Or a meta tag, a specific footer string, a copyright notice.
        - "© 2026 Example Corp"
Content integrity probes are most useful against static pages, marketing sites, and login pages — the surfaces attackers prefer for credential harvesting.
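The two directions can be prototyped offline before touching blackbox.yml. The sketch below is a minimal Python approximation of how the two body options combine, based on my reading of their documented behaviour: any match on a deny pattern fails the probe, and any miss on a required pattern fails it. The function name and the sample patterns are hypothetical. Python's `re` is also more permissive than the Go RE2 engine the exporter uses, so keep patterns to the common subset (no lookahead or backreferences).

```python
import re
from typing import Iterable


def body_probe_passes(
    body: str,
    fail_if_matches: Iterable[str] = (),
    fail_if_not_matches: Iterable[str] = (),
) -> bool:
    """Approximate the pass/fail decision of the two body regexp options."""
    # Injection direction: any deny pattern found in the body fails the probe.
    for pattern in fail_if_matches:
        if re.search(pattern, body):
            return False
    # Removal direction: any required pattern missing from the body fails the probe.
    for pattern in fail_if_not_matches:
        if not re.search(pattern, body):
            return False
    return True


# Illustrative patterns only; replace with your own known-bad and must-have strings.
DENY_PATTERNS = [r"eval\(atob\(", r"""src=["']https?://evil\.example/"""]
REQUIRED_PATTERNS = [r"Example Corp"]
```

Running a saved copy of the production page through this check is a cheap way to confirm that the required patterns really appear in every response before the module starts paging anyone.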
Step 4: HTTP Security Header Monitoring
HTTP probers can assert that specific headers are present and contain expected values. This catches security header regressions introduced by deploys or proxy changes.
# blackbox.yml: security header validation module.
modules:
  https_security_headers:
    prober: http
    timeout: 10s
    http:
      fail_if_not_ssl: true
      tls_config:
        insecure_skip_verify: false
      # fail_if_header_not_matches requires Blackbox Exporter 0.23+.
      fail_if_header_not_matches:
        - header: Strict-Transport-Security
          regexp: "max-age=(1576[89][0-9]{3}|157[7-9][0-9]{4}|15[89][0-9]{5}|1[6-9][0-9]{6}|[2-9][0-9]{7}|[1-9][0-9]{8,})"
          # Asserts HSTS max-age >= 6 months (15768000 seconds).
          # Fires if HSTS is absent or max-age is too short.
          # (A looser pattern such as [6-9][0-9]{6} would accept 6000000 s, about 69 days.)
        - header: X-Frame-Options
          regexp: "(?i)(DENY|SAMEORIGIN)"
        - header: X-Content-Type-Options
          regexp: "(?i)nosniff"
      fail_if_header_matches:
        - header: Server
          regexp: "(?i)(Apache/[12]|nginx/[01]\\.|IIS/[0-9])"
          # Fires if the Server header discloses one of these version prefixes.
          # As written this matches every Apache 1.x/2.x, nginx 0.x/1.x and IIS
          # release, so tighten the regex to the versions you consider unacceptably old.
Run this module against your primary web properties on the same scrape interval as your uptime checks — every 60 seconds. A deploy that removes HSTS will appear in your dashboards within a minute.
Note: Content-Security-Policy is too variable to check with a simple regexp. Use a dedicated CSP evaluation tool for deep CSP analysis. The header probe is best for binary presence/absence and simple value assertions.
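Numeric thresholds encoded as regular expressions are easy to get subtly wrong, so it can help to compare a candidate pattern against a plain numeric parse over boundary values before deploying it. The Python sketch below does that for an HSTS max-age floor of 15768000 seconds. The pattern is one possible RE2-compatible encoding, offered as an assumption rather than the canonical one, and all function names are hypothetical.

```python
import re
from typing import Iterable, List, Optional

MIN_MAX_AGE = 15768000  # six months in seconds

# Candidate pattern: intended to accept only max-age values >= 15768000.
# Uses only character classes and alternation, which Python and Go RE2 treat alike.
HSTS_PATTERN = re.compile(
    r"max-age=(1576[89][0-9]{3}|157[7-9][0-9]{4}|15[89][0-9]{5}"
    r"|1[6-9][0-9]{6}|[2-9][0-9]{7}|[1-9][0-9]{8,})"
)


def parsed_max_age(header: str) -> Optional[int]:
    """Extract max-age as an integer, or None if the directive is absent."""
    match = re.search(r"max-age=([0-9]+)", header)
    return int(match.group(1)) if match else None


def regexp_accepts(header: str) -> bool:
    """What the header probe would decide using the candidate pattern."""
    return HSTS_PATTERN.search(header) is not None


def numerically_ok(header: str) -> bool:
    """Ground truth: compare the parsed value against the floor."""
    age = parsed_max_age(header)
    return age is not None and age >= MIN_MAX_AGE


def disagreements(values: Iterable[int]) -> List[int]:
    """Return the max-age values where the regexp and the numeric check differ."""
    return [
        v for v in values
        if regexp_accepts(f"max-age={v}") != numerically_ok(f"max-age={v}")
    ]
```

An empty result from `disagreements` over the boundary values you care about is a reasonable gate before the pattern goes into blackbox.yml.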
Step 5: DNS Hijack Detection
DNS hijacking changes the A or AAAA records for your domains to point to an attacker-controlled server. The attack is especially effective against high-value targets like payment endpoints and login pages. Blackbox Exporter’s DNS prober can validate the returned answers against expected IP ranges.
# blackbox.yml: DNS hijack detection modules.
modules:
  dns_payments_a_record:
    prober: dns
    timeout: 5s
    dns:
      preferred_ip_protocol: "ip4"
      query_name: "payments.example.com"
      query_type: "A"
      valid_rcodes:
        - NOERROR
      validate_answer_rrs:
        fail_if_not_matches_regexp:
          # Expected IP range. Update to match your load balancer IPs or CDN range.
          - "payments\\.example\\.com\\.\\s+\\d+\\s+IN\\s+A\\s+(203\\.0\\.113\\.1|203\\.0\\.113\\.2)$"
        fail_if_matches_regexp:
          # Explicit deny: any RFC1918 / loopback address in the response is suspicious.
          - "payments\\.example\\.com\\.\\s+\\d+\\s+IN\\s+A\\s+(10\\.|172\\.(1[6-9]|2[0-9]|3[01])\\.|192\\.168\\.|127\\.)"
# prometheus.yml: scrape config for DNS hijack probes (append under scrape_configs).
  - job_name: "blackbox_dns_hijack"
    metrics_path: /probe
    params:
      module: [dns_payments_a_record]
    static_configs:
      - targets:
          # Probe from multiple resolvers to detect resolver-specific hijacking.
          - 8.8.8.8         # Google Public DNS
          - 1.1.1.1         # Cloudflare
          - 208.67.222.222  # OpenDNS
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115
Probing multiple resolvers catches hijacks that only affect specific resolver paths — which is the case for BGP prefix hijacks and compromised DNS resolvers at the ISP level.
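The answer validation can be dry-run against sample records before it is trusted in production. The Python sketch below approximates, as I understand them, the semantics of the two validate_answer_rrs options: every answer record must match each expected pattern, and no record may match a deny pattern. The function name is hypothetical, and the record strings assume the tab-separated presentation format (name, TTL, class, type, address) that the patterns above are written against.

```python
import re
from typing import Iterable, Sequence

EXPECTED = [
    r"payments\.example\.com\.\s+\d+\s+IN\s+A\s+(203\.0\.113\.1|203\.0\.113\.2)$",
]
DENIED = [
    r"payments\.example\.com\.\s+\d+\s+IN\s+A\s+"
    r"(10\.|172\.(1[6-9]|2[0-9]|3[01])\.|192\.168\.|127\.)",
]


def answers_valid(
    rrs: Iterable[str],
    fail_if_not_matches: Sequence[str] = EXPECTED,
    fail_if_matches: Sequence[str] = DENIED,
) -> bool:
    """Return True only if every answer record passes both rule sets."""
    records = list(rrs)
    # An empty answer section cannot satisfy a "must match" rule.
    if not records and fail_if_not_matches:
        return False
    for rr in records:
        if any(re.search(p, rr) for p in fail_if_matches):
            return False
        if any(not re.search(p, rr) for p in fail_if_not_matches):
            return False
    return True
```

Note that a single unexpected record is enough to fail the probe, which is the behaviour you want when an attacker adds one rogue address to an otherwise legitimate answer set.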
Step 6: Prometheus Alerting Rules for Certificate Expiry
# prometheus-rules.yml: certificate expiry and probe security alerts.
groups:
  - name: synthetic_security
    interval: 60s
    rules:
      # Certificate expiry: critical page at 14 days, warning at 28 days.
      - alert: CertificateExpiryCritical
        expr: |
          (probe_ssl_earliest_cert_expiry - time()) / 86400 < 14
        for: 5m
        labels:
          severity: critical
          team: platform
        annotations:
          summary: "Certificate expiring in less than 14 days: {{ $labels.instance }}"
          description: |
            The TLS certificate for {{ $labels.instance }} expires in
            {{ $value | printf "%.0f" }} days. Renewal must happen before day 0.
          runbook: "https://runbooks.example.com/cert-expiry"

      - alert: CertificateExpiryWarning
        expr: |
          (probe_ssl_earliest_cert_expiry - time()) / 86400 < 28
        for: 5m
        labels:
          severity: warning
          team: platform
        annotations:
          summary: "Certificate expiring in less than 28 days: {{ $labels.instance }}"
          description: |
            The TLS certificate for {{ $labels.instance }} expires in
            {{ $value | printf "%.0f" }} days.

      # TLS version regression: probe succeeds but the negotiated version is below TLS 1.2.
      # This reflects only the version the probe negotiated. A server that accepts
      # TLS 1.0 alongside newer versions negotiates the newer one and will not trigger it.
      - alert: TLSVersionInsecure
        expr: |
          probe_tls_version_info{version=~"TLS 1\\.0|TLS 1\\.1|SSL.*"} == 1
        for: 5m
        labels:
          severity: critical
          team: security
        annotations:
          summary: "Insecure TLS version in use: {{ $labels.instance }}"
          description: |
            {{ $labels.instance }} accepted a connection using {{ $labels.version }}.
            PCI-DSS requires TLS 1.2+. Disable legacy TLS on the load balancer.

      # Content integrity or header probe failure.
      # Assumes scrape jobs with these names run the Step 3 and Step 4 modules.
      - alert: SecurityProbeFailure
        expr: |
          probe_success{job=~"blackbox_security_headers|blackbox_content_integrity"} == 0
        for: 2m
        labels:
          severity: critical
          team: security
        annotations:
          summary: "Security probe failure: {{ $labels.instance }}"
          description: |
            The security probe for {{ $labels.instance }} is failing.
            This may indicate a missing security header, injected content,
            or a content integrity violation. Investigate immediately.

      # DNS hijack detection.
      - alert: DNSHijackDetected
        expr: |
          probe_success{job="blackbox_dns_hijack"} == 0
        for: 1m
        labels:
          severity: critical
          team: security
        annotations:
          summary: "DNS answer validation failed: possible hijack on {{ $labels.instance }}"
          description: |
            The DNS probe for payments.example.com via resolver {{ $labels.instance }}
            returned an unexpected answer. Verify DNS records immediately.
Step 7: Prometheus Recording Rules for Synthetic SLOs
Synthetic probes provide clean inputs for availability SLOs tracked over burn windows.
# prometheus-rules.yml: synthetic SLO recording rules.
groups:
  - name: synthetic_slo_recording
    interval: 30s
    rules:
      # 5-minute probe success ratio per target.
      - record: job_instance:probe_success:rate5m
        expr: |
          avg_over_time(probe_success[5m])

      # Certificate days remaining, pre-computed for dashboards.
      - record: instance:cert_days_remaining:gauge
        expr: |
          (probe_ssl_earliest_cert_expiry - time()) / 86400

      # 30-day synthetic availability per job (averaged across the job's targets).
      - record: job:probe_success:availability30d
        expr: |
          avg by (job) (avg_over_time(probe_success[30d]))

      # Probe duration p99 over 1 hour: latency SLO signal.
      # probe_duration_seconds is a gauge, not a histogram, so there is no
      # _bucket series; quantile_over_time works on the raw samples instead.
      - record: job_instance:probe_duration_seconds:p99_1h
        expr: |
          quantile_over_time(0.99, probe_duration_seconds[1h])
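For readers new to burn windows, the following Python sketch shows the arithmetic that sits on top of these recordings: availability is the mean of the 0/1 probe_success samples in a window, and burn rate is the observed error ratio divided by the error budget the SLO allows. The function names and the 99.9% target in the usage note are illustrative assumptions, not values from this article.

```python
from typing import Sequence


def availability(samples: Sequence[int]) -> float:
    """Mean of 0/1 probe_success samples, i.e. what avg_over_time computes."""
    if not samples:
        raise ValueError("no samples in window")
    return sum(samples) / len(samples)


def burn_rate(samples: Sequence[int], slo_target: float) -> float:
    """Observed error ratio divided by the error ratio the SLO permits.

    A value of 1.0 means the error budget is being consumed exactly at the
    sustainable pace; higher values exhaust the budget proportionally faster.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    error_budget = 1.0 - slo_target
    return (1.0 - availability(samples)) / error_budget
```

For example, one failed probe out of 100 against a 99.9% target gives a burn rate of about 10, meaning a 30-day budget would last roughly three days at that pace.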
Step 8: Securing Blackbox Exporter Itself
Blackbox Exporter has a significant SSRF vulnerability surface that is frequently overlooked. The /probe endpoint accepts a target query parameter — and by default it probes whatever URL you pass it.
The SSRF risk:
# An attacker who can reach blackbox-exporter:9115 can probe internal targets:
GET /probe?target=http://169.254.169.254/latest/meta-data/&module=http_2xx
GET /probe?target=internal-redis:6379&module=tcp_connect
GET /probe?target=http://kubernetes.default.svc.cluster.local/api/v1/secrets&module=http_2xx
If Blackbox Exporter is reachable from within the cluster and the Kubernetes service account has any permissions, this becomes a credentials-via-probe path.
Mitigation 1: Use a static target list in Prometheus, never accept free-form targets.
The scrape configs in Step 2 and Step 5 use static_configs with relabel_configs to pass the target as a parameter. This is correct. The risk materialises when engineers add a debug endpoint or expose the service externally. Lock this down at the network layer.
# Kubernetes NetworkPolicy: restrict who can reach blackbox-exporter.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: blackbox-exporter-ingress
  namespace: monitoring
spec:
  podSelector:
    matchLabels:
      app: blackbox-exporter
  policyTypes:
    - Ingress
  ingress:
    # Only allow traffic from Prometheus pods in the monitoring namespace.
    # podSelector and namespaceSelector share one list item, so both must match.
    - from:
        - podSelector:
            matchLabels:
              app: prometheus
          namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
      ports:
        - port: 9115
          protocol: TCP
  # No ingress from any other source: no developer tooling, no debugging.
Mitigation 2: Disable the /probe endpoint’s free-form target parameter at the proxy layer.
If you run Blackbox Exporter behind a reverse proxy (NGINX, Envoy, Traefik), add a rule that only allows probe requests with targets from an approved allowlist:
# nginx snippet: only allow the Prometheus server to reach the probe endpoint.
# This restricts callers by source address; it does not inspect the target parameter.
location /probe {
    # Reject requests that do not come from the Prometheus server.
    allow 10.0.1.15;  # Prometheus pod IP (or use a CIDR for the monitoring namespace).
    deny all;
}
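Target allowlisting itself is a small piece of logic that a proxy or admission layer would need to apply to the decoded target parameter. The Python sketch below shows one way to express it, assuming the four hostnames from the Step 2 scrape config are the only approved HTTPS targets. The function name is hypothetical, and the sketch is not a feature of Blackbox Exporter or NGINX.

```python
import ipaddress
from urllib.parse import urlsplit

# Assumed allowlist: the static targets from the TLS scrape job.
ALLOWED_HOSTS = frozenset({
    "www.example.com",
    "api.example.com",
    "payments.example.com",
    "auth.example.com",
})


def is_allowed_target(target: str) -> bool:
    """Accept only https:// URLs whose host is on the allowlist."""
    try:
        parts = urlsplit(target)
    except ValueError:
        return False  # malformed URL, e.g. a broken IPv6 literal
    if parts.scheme != "https":
        return False
    host = parts.hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
        return False  # IP literals (metadata services, loopback) are never approved
    except ValueError:
        pass  # not an IP literal, continue with the hostname check
    return host in ALLOWED_HOSTS
```

Checking the parsed hostname rather than doing a substring match matters: a target such as `https://payments.example.com@evil.example/` contains an approved name but resolves to a different host.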
Mitigation 3: Run Blackbox Exporter without host network access.
# blackbox-exporter Deployment: no privileged access, no host network.
spec:
  template:
    spec:
      hostNetwork: false
      hostPID: false
      hostIPC: false
      securityContext:
        runAsNonRoot: true
        runAsUser: 65534  # nobody
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: blackbox-exporter
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
Mitigation 4: Enable Blackbox Exporter’s built-in web TLS and basic auth.
# web-config.yml for blackbox-exporter: TLS + basic auth.
tls_server_config:
  cert_file: /tls/tls.crt
  key_file: /tls/tls.key
  min_version: TLS12
basic_auth_users:
  # bcrypt hash of the Prometheus scrape password.
  prometheus: $2y$12$abc...
Pass this config with --web.config.file=/etc/blackbox/web-config.yml. Prometheus scrape configs must then set scheme: https and include the matching basic_auth and tls_config stanzas.
Step 9: Real User Monitoring with Grafana Faro
Synthetic probes test from a fixed vantage point. Grafana Faro instruments the browser and reports real user experience — including security-relevant signals that synthetic probes cannot see:
- Mixed content warnings: A page served over HTTPS loading a resource over HTTP. Faro captures these as browser console errors.
- CSP violations: If your CSP is configured in report-only mode, browsers report violations to Faro’s collector.
- Third-party script failures: A skimmer that loads from a CDN not in your script-src will produce a CSP violation if your policy is correct — and Faro will capture it.
// faro-init.js: initialise Grafana Faro in your frontend.
import { initializeFaro, getWebInstrumentations } from '@grafana/faro-web-sdk';

initializeFaro({
  url: 'https://faro-collector.example.com/collect',
  app: {
    name: 'web-frontend',
    version: '1.0.0',
    environment: 'production',
  },
  instrumentations: [
    ...getWebInstrumentations({
      captureConsole: true, // captures mixed content and CSP warnings
      captureConsoleDisabledLevels: [],
    }),
  ],
});
Build an alert rule on Faro’s CSP violation events and route it through Alertmanager to get near-real-time notification when a CSP violation is detected in a production browser session.
Verification
After deploying the configuration:
- Confirm probes are appearing in Prometheus: probe_success{job="blackbox_tls"} should return 1 for each configured target.
- Check certificate expiry metrics: instance:cert_days_remaining:gauge should show values greater than 14 for all targets.
- Test content integrity probes: temporarily add a known-bad string to a test page and confirm probe_success drops to 0 within one scrape interval.
- Validate DNS probes: query probe_success{job="blackbox_dns_hijack"}; all resolvers should return 1.
- Test SSRF mitigation: from a pod other than Prometheus, attempt curl 'http://blackbox-exporter:9115/probe?target=http://kubernetes.default.svc.cluster.local/&module=http_2xx' (quote the URL so the shell does not treat & as a background operator); the NetworkPolicy should drop the connection.
- Review Alertmanager routing: confirm that severity: critical alerts for SecurityProbeFailure and DNSHijackDetected route to your on-call channel, not only to email.
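The first and fourth checks can be scripted once the query results are in hand. The Python sketch below takes an already-decoded JSON response from the Prometheus instant-query API (the `/api/v1/query` vector format) and lists the instances whose probe_success is not 1. Fetching the response is deliberately left out, and the function name is hypothetical.

```python
from typing import Any, Dict, List


def failing_instances(api_response: Dict[str, Any]) -> List[str]:
    """Return instances whose probe_success sample is not 1, sorted by name."""
    if api_response.get("status") != "success":
        raise ValueError("query did not succeed")
    data = api_response["data"]
    if data.get("resultType") != "vector":
        raise ValueError("expected an instant vector result")
    failing = []
    for sample in data["result"]:
        # Each sample value is a [timestamp, "string value"] pair.
        _timestamp, value = sample["value"]
        if float(value) != 1.0:
            failing.append(sample["metric"].get("instance", "<unknown>"))
    return sorted(failing)
```

An empty list for probe_success{job="blackbox_tls"} and probe_success{job="blackbox_dns_hijack"} corresponds to the "should return 1" expectations in the checklist.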
Summary
Blackbox Exporter is already deployed in most Prometheus environments for uptime monitoring. The security uplift requires no new tooling — only additional modules in blackbox.yml, targeted scrape jobs in prometheus.yml, and alert rules that treat probe failures as security events rather than availability events.
The highest-impact additions in order:
- probe_ssl_earliest_cert_expiry alerts at 28 days and 14 days: eliminates the most common cause of self-inflicted HTTPS outages.
- fail_if_body_matches_regexp for known injection patterns: catches web skimmers and defacement within one scrape interval.
- DNS answer validation against expected IP ranges: catches BGP hijacks and DNS compromises before users land on the attacker’s server.
- fail_if_header_not_matches for HSTS and X-Frame-Options: catches security header regressions introduced by deploys.
- NetworkPolicy restricting /probe access to Prometheus only: removes the SSRF vector before someone exploits it.
None of these require application changes. They operate entirely at the probe layer, making them retrofittable onto existing services without coordination with application teams.