DNS Resolver Infrastructure Hardening: Recursive Resolvers, DNSSEC, DoT, and Split-Horizon DNS
The Problem
Your internal recursive resolver is the most trusted machine in your network that your users never think about. Every hostname — internal services, cloud APIs, SaaS endpoints, authentication providers — passes through it. That makes it a high-value target for attackers and a high-leverage control point for defenders.
The threats are real and well-documented. The Kaminsky attack (2008) demonstrated that an unpatched resolver could be cache-poisoned in seconds, redirecting internal users to attacker-controlled IPs for any domain. DNS rebinding attacks abuse the same-origin policy gap between DNS TTLs and browser security boundaries, allowing a malicious web page to pivot into internal network addresses. Attackers who compromise an internal host often leverage the resolver for lateral movement: querying internal service discovery names, enumerating SRV records, or using the resolver as a covert DNS tunnel for exfiltration.
Most organisations run their internal recursive resolver as a default BIND or Unbound install — recursor open to any RFC1918 client, version string exposed, no DNSSEC validation, no encrypted forwarding, split-horizon configured with a comment saying “TODO: review”. This article closes those gaps systematically.
Scope: BIND 9.16+ and Unbound 1.13+ on Linux. CoreDNS for Kubernetes. Specific hardening controls with production-ready configuration snippets.
Internal Recursive Resolver Threats
Before configuration, establish the threat model:
Cache poisoning (Kaminsky attack): An attacker sends a flood of forged DNS responses to the resolver, racing to inject a poisoned record before the legitimate response arrives. The classic mitigation is source port randomisation — the attacker must guess both the 16-bit transaction ID and the source port, making brute-force statistically infeasible. Without it, a resolver can be poisoned in under 10 seconds on a fast network.
DNS rebinding: A malicious domain returns a public IP on the first lookup (passing same-origin checks), then quickly changes its DNS to return an RFC1918 address. The browser caches the original origin but now makes requests to 192.168.1.1, bypassing network firewalls entirely. The resolver is the correct enforcement point: it should refuse to return RFC1918 addresses in responses to public domain queries.
Resolver-based lateral movement: An attacker with a foothold on one internal host can query the internal resolver for hostnames that are only visible internally — service discovery via SRV records, internal hostnames leaked from split-horizon zones, PTR records that reveal naming conventions. A hardened resolver limits what unprivileged clients can enumerate.
DNS tunnelling exfiltration: Tools like dnscat2 and iodine encode data in DNS query and response payloads, using the resolver as a carrier. High query rates to a single external domain, unusually long hostnames, or TXT record abuse are all detectable from query logs.
BIND 9 Hardening
A default BIND 9 installation as a recursive resolver exposes several configuration risks. The following named.conf.options block addresses them:
options {
directory "/var/cache/bind";
// Restrict recursion to internal clients only
// Never allow open recursion — it enables DNS amplification attacks
allow-recursion { 10.0.0.0/8; 172.16.0.0/12; 192.168.0.0/16; 127.0.0.1; ::1; };
allow-query { 10.0.0.0/8; 172.16.0.0/12; 192.168.0.0/16; 127.0.0.1; ::1; };
allow-query-cache { 10.0.0.0/8; 172.16.0.0/12; 192.168.0.0/16; 127.0.0.1; ::1; };
// Only listen on internal interfaces
listen-on { 10.0.1.5; 127.0.0.1; };
listen-on-v6 { ::1; };
// Hide version string — never expose to external queries
version "none";
hostname "none";
server-id "none";
// Response Rate Limiting — prevents DNS amplification abuse
// Applies even to internal resolvers that might be abused by a compromised host
rate-limit {
responses-per-second 15;
window 5;
log-only no;
};
// Return minimal responses — don't include extra section glue for unsolicited names
minimal-responses yes;
// Source port randomisation (should be enabled by default, verify it is)
use-v4-udp-ports { range 1024 65535; };
avoid-v4-udp-ports { };
// DNSSEC validation
dnssec-validation auto;
// Do not allow zone transfers except to explicit secondaries
allow-transfer { none; };
// Limit recursion depth to reduce NXDOMAIN amplification
max-recursion-depth 12;
max-recursion-queries 75;
// Fetch limits — prevent a single resolver from being used as a DDoS reflector
fetch-quota-params 100 0.1 0.3 0.5;
// Disable CHAOS class queries (used for version probing)
// Already handled by version "none" but belt-and-suspenders
deny-answer-addresses { 100.64.0.0/10; };
// Stale-answer-enable as fallback during upstream outages
stale-answer-enable yes;
stale-answer-ttl 30;
stale-cache-enable yes;
stale-answer-client-timeout 2000;
};
Verify the configuration with:
named-checkconf /etc/bind/named.conf
# Check that recursion is not open
dig +short version.bind TXT CHAOS @127.0.0.1
# Should return nothing or "none"
dig @127.0.0.1 example.com A
# Verify external queries resolve correctly for internal clients
Unbound Hardening
Unbound is the preferred resolver for many security-focused deployments. Its configuration is more granular than BIND for recursive resolver hardening:
server:
# Interface binding — restrict to internal interfaces
interface: 10.0.1.5
interface: 127.0.0.1
port: 53
# Access control — deny everything, then allow internal ranges
access-control: 0.0.0.0/0 refuse
access-control: 127.0.0.0/8 allow
access-control: 10.0.0.0/8 allow
access-control: 172.16.0.0/12 allow
access-control: 192.168.0.0/16 allow
access-control: ::1 allow
# DNSSEC validation
auto-trust-anchor-file: "/var/lib/unbound/root.key"
val-clean-additional: yes
val-permissive-mode: no # Hard fail on DNSSEC validation errors
# DNS rebinding prevention — refuse RFC1918 addresses in responses to public names
private-address: 10.0.0.0/8
private-address: 172.16.0.0/12
private-address: 192.168.0.0/16
private-address: 100.64.0.0/10
private-address: 169.254.0.0/16
private-address: fd00::/8
private-address: fe80::/10
# Allow known internal domains to return RFC1918 (split-horizon)
private-domain: "internal.example.com"
private-domain: "corp.example.com"
# QNAME minimisation — reduces query privacy leakage to root/TLD servers
# Only sends the minimum necessary labels upstream
qname-minimisation: yes
qname-minimisation-strict: no # Fall back if upstream breaks; set yes for strict environments
# Harden against cache poisoning and injection
harden-glue: yes # Only accept glue for delegated zones
harden-dnssec-stripped: yes # Reject unsigned responses for DNSSEC-signed zones
harden-below-nxdomain: yes # Do not return data for subdomains of NXDOMAIN
harden-referral-path: yes # Validate the entire referral path
harden-algo-downgrade: yes # Reject algorithm downgrades in DNSSEC
# Cache poisoning mitigations
use-caps-for-id: yes # 0x20 bit randomisation (mixed-case queries)
# Hide version string
hide-version: yes
hide-identity: yes
# Prefetch popular records before TTL expiry — reduces latency
prefetch: yes
prefetch-key: yes
# Aggressive NSEC — use NSEC/NSEC3 records to synthesise NXDOMAIN faster
aggressive-nsec: yes
# Rate limiting
ratelimit: 1000 # queries per second total
ratelimit-for-domain: 100 # per upstream domain
# Logging for security monitoring
use-syslog: yes
log-queries: yes # Enable in production for SIEM ingestion
log-replies: yes
log-tag-queryreply: yes
# dnstap for structured query logging
dnstap-enable: yes
dnstap-socket-path: "/run/unbound/dnstap.sock"
dnstap-send-identity: yes
dnstap-send-version: yes
dnstap-log-resolver-query-messages: yes
dnstap-log-resolver-response-messages: yes
dnstap-log-client-query-messages: yes
dnstap-log-client-response-messages: yes
# Outgoing port randomisation
outgoing-num-tcp: 10
num-queries-per-thread: 1024
jostle-timeout: 200
remote-control:
control-enable: yes
control-interface: 127.0.0.1
control-port: 8953
# Use TLS for remote control
control-use-cert: yes
server-key-file: "/etc/unbound/unbound_server.key"
server-cert-file: "/etc/unbound/unbound_server.pem"
control-key-file: "/etc/unbound/unbound_control.key"
control-cert-file: "/etc/unbound/unbound_control.pem"
Generate control TLS certificates:
unbound-control-setup
# Verify configuration syntax
unbound-checkconf /etc/unbound/unbound.conf
# Check DNSSEC trust anchor is loaded
unbound-control status | grep trust
DNSSEC Validation at the Resolver
DNSSEC validation at the recursive resolver is the definitive defence against cache poisoning. The resolver cryptographically verifies that each response was signed by the authoritative zone owner. A forged response — even one with a correct transaction ID and source port — fails signature verification and is discarded.
Bootstrapping trust anchors in Unbound:
# Fetch and install the root trust anchor
unbound-anchor -a /var/lib/unbound/root.key
chown unbound:unbound /var/lib/unbound/root.key
# Verify DNSSEC validation is working
unbound-host -D -v dnssec-failed.org
# Should return: validation failure (the test domain deliberately fails)
dig +dnssec +multi @127.0.0.1 cloudflare.com A
# Look for 'ad' flag in the header (Authenticated Data)
Handling validation failures: When a zone is incorrectly signed, clients will receive SERVFAIL for all queries to that zone. This is correct behaviour — a SERVFAIL from DNSSEC is not the same as a network failure. Document your escalation procedure: validation failures require the zone owner to correct their signing, not your resolver to disable validation.
BIND 9 DNSSEC validation:
options {
dnssec-validation auto;
# auto loads the built-in root trust anchor from bind.keys
# Use explicit trust anchors for internal zones:
};
# For an internally-signed zone, add the KSK public key:
trusted-keys {
"internal.example.com." 256 3 8 "AwEAAb...base64...";
};
DNS over TLS for Internal Resolvers
Encrypting DNS traffic between clients and the internal resolver prevents passive interception and manipulation of DNS queries on internal network segments — relevant when networks are shared, untrusted endpoints exist, or compliance requires in-transit encryption for all traffic.
Running Unbound with DoT on port 853:
server:
interface: 10.0.1.5@853
tls-service-key: "/etc/unbound/tls/resolver.key"
tls-service-pem: "/etc/unbound/tls/resolver.crt"
# Include the full chain if using an internal CA
# tls-cert-bundle includes trusted CAs for outgoing TLS (DoT to upstream)
# Forward queries upstream via DoT
forward-zone:
name: "."
forward-tls-upstream: yes
forward-addr: 9.9.9.9@853#dns.quad9.net
forward-addr: 149.112.112.112@853#dns.quad9.net
# Pin by SPKI hash if you want to detect upstream cert rotation
Generate an internal CA-signed cert for the resolver:
# Generate key and CSR
openssl genrsa -out /etc/unbound/tls/resolver.key 4096
openssl req -new -key /etc/unbound/tls/resolver.key \
-subj "/CN=dns.internal.example.com" \
-addext "subjectAltName=DNS:dns.internal.example.com,IP:10.0.1.5" \
-out /etc/unbound/tls/resolver.csr
# Sign with internal CA
openssl x509 -req -in /etc/unbound/tls/resolver.csr \
-CA /etc/pki/internal-ca/ca.crt \
-CAkey /etc/pki/internal-ca/ca.key \
-CAcreateserial \
-days 825 \
-extensions v3_req \
-out /etc/unbound/tls/resolver.crt
chmod 640 /etc/unbound/tls/resolver.key
chown root:unbound /etc/unbound/tls/resolver.key
Configure systemd-resolved on clients to use DoT:
# /etc/systemd/resolved.conf
[Resolve]
DNS=10.0.1.5
DNSOverTLS=yes
DNSSEC=yes
Domains=~internal.example.com ~corp.example.com
Distribute the internal CA certificate to clients:
# On each Linux client
cp internal-ca.crt /usr/local/share/ca-certificates/internal-ca.crt
update-ca-certificates
# Verify DoT is working
resolvectl status
# Look for: DNS over TLS setting: yes
# Current DNS over TLS: yes
DNS Rebinding Prevention
DNS rebinding is a class of attack where an attacker-controlled domain first resolves to a public IP (passing browser same-origin checks), then switches its DNS to an internal RFC1918 address. The browser reuses the existing origin and sends requests to internal infrastructure.
The resolver is the correct control point. Unbound’s private-address directive makes the resolver refuse to return RFC1918 addresses from publicly-resolvable names:
server:
# Block all RFC1918 and link-local returns from public name resolution
private-address: 10.0.0.0/8
private-address: 172.16.0.0/12
private-address: 192.168.0.0/16
private-address: 127.0.0.0/8
private-address: 169.254.0.0/16
private-address: ::ffff:0:0/96
private-address: 100.64.0.0/10
private-address: fd00::/8
# Whitelist legitimate internal domains that intentionally return RFC1918
# Without this, your internal zone lookups would also be blocked
private-domain: "internal.example.com"
private-domain: "corp.example.com"
private-domain: "10.in-addr.arpa"
private-domain: "168.192.in-addr.arpa"
Test the protection:
# A public domain that returns RFC1918 should be blocked
# (this simulates a rebinding attempt)
dig @127.0.0.1 rebind.it A
# Should return SERVFAIL, not the RFC1918 address
BIND 9 equivalent using RPZ (Response Policy Zones):
rpz {
zone "rpz.internal" {
type master;
file "rpz.internal.db";
};
};
With zone records denying RFC1918 responses for non-private domains — but Unbound’s built-in private-address is simpler and less error-prone.
Split-Horizon DNS
Split-horizon DNS (also called split-brain DNS) serves different answers to internal versus external clients for the same domain name. Internal clients get the internal IP for app.example.com; external clients get the load balancer IP. This is operationally convenient but introduces security considerations.
BIND 9 views:
# named.conf
acl "internal" {
10.0.0.0/8;
172.16.0.0/12;
192.168.0.0/16;
};
view "internal" {
match-clients { internal; };
recursion yes;
allow-recursion { internal; };
zone "example.com" {
type master;
file "/etc/bind/zones/example.com.internal";
};
# Internal view includes all public zones via recursion
# plus override zones for internal services
};
view "external" {
match-clients { any; };
recursion no; // Never allow recursion on the external-facing view
allow-recursion { none; };
zone "example.com" {
type master;
file "/etc/bind/zones/example.com.external";
};
};
Security considerations for split-horizon:
Information leakage is the primary risk. If the internal zone file for example.com includes hostnames like vpn.example.com, admin.example.com, jira.example.com, and those hostnames are reachable from outside, an attacker who can query the internal resolver (e.g., from a compromised host or via SSRF) learns your full internal service map.
Mitigations:
- Audit internal zone files for hostnames that reveal architecture (e.g.,
k8s-master-01.internal.example.com). - Use separate namespaces:
example.comfor public,corp.example.comfor internal-only services. - Never serve internal zone data on the external-facing authoritative server.
- Use TSIG (Transaction Signature) to authenticate zone transfers between internal primaries and secondaries.
Separate Unbound instances for strict isolation:
For environments where a single BIND views misconfiguration could expose internal data, run two physically separate Unbound instances:
# Internal resolver: 10.0.1.5:53
# Handles recursion + internal zone stubs
# Only reachable from RFC1918 clients
# External resolver (or none at all)
# Handles authoritative responses for public zones only
# Does not recurse
Use stub-zone in the internal Unbound to forward internal domain queries to an internal authoritative:
stub-zone:
name: "corp.example.com"
stub-addr: 10.0.1.10@53 # Internal authoritative nameserver
stub-prime: no
Cache Poisoning Mitigations Beyond DNSSEC
DNSSEC is the definitive fix for cache poisoning, but not all domains are DNSSEC-signed. Layered mitigations:
Source port randomisation: The Kaminsky attack requires guessing both the 16-bit transaction ID and the ephemeral source port. With a 16-bit port space (~64,000 values), the combined entropy makes brute-force impractical. Both BIND and Unbound do this by default — verify it has not been disabled by checking that avoid-v4-udp-ports is not restricting too many ports:
# Verify Unbound is using randomised ports
unbound-control stats | grep num.query.tcp
# Most queries should be UDP; TCP spike can indicate poisoning attempts
ss -u -l -n | grep :53
# Should show multiple source ports in use
0x20 bit randomisation (use-caps-for-id): Unbound randomises the capitalisation of query labels (e.g., eXaMpLe.cOm) and verifies the authoritative server echoes the same capitalisation in its response. A forged response that doesn’t match the original mixed-case query is discarded. This adds approximately 20 bits of additional randomness for a typical domain name.
# Unbound
server:
use-caps-for-id: yes
Note: some authoritative servers do not correctly echo 0x20 labels. Unbound falls back gracefully, but you may see rare resolution failures for broken implementations.
Minimising cache poisoning blast radius: Set aggressive TTL minimums to limit how long a poisoned entry persists, without setting them so low that your resolver hammers upstream on every query:
# Unbound
server:
cache-min-ttl: 60 # Minimum 60 seconds regardless of authoritative TTL
cache-max-ttl: 86400 # Maximum 24 hours
cache-max-negative-ttl: 900
DNS Query Logging for Security
Resolver query logs are a primary signal source for detecting DNS tunnelling, C2 beacon activity, and internal reconnaissance. Unstructured syslog is insufficient at scale; use dnstap for structured binary logging.
Unbound dnstap to file via fstrm_capture:
# Install fstrm tools
apt install fstrm-bin
# Start fstrm capture daemon reading from Unbound's dnstap socket
fstrm_capture -t protobuf:dnstap.Dnstap \
-r /run/unbound/dnstap.sock \
-w /var/log/dns/dnstap.fstrm &
# Convert to JSON for SIEM ingestion
dnstap-ldns -j < /var/log/dns/dnstap.fstrm | jq .
BIND 9 dnstap:
options {
dnstap { client; resolver; auth; forwarder; };
dnstap-output file "/var/log/named/dnstap.bin" versions 10 size 50m;
dnstap-identity "resolver-01";
dnstap-version yes;
};
Detecting DNS tunnelling from query logs: DNS tunnelling tools encode data as subdomains — a typical exfiltration query looks like aGVsbG8gd29ybGQ.exfil.attacker.com. Detection heuristics:
# Find unusually long hostnames — base64/hex encoded data
# Normal queries rarely exceed 40 characters in the queried name
awk -F'"' '{ if (length($2) > 50) print }' /var/log/named/queries.log | \
sort | uniq -c | sort -rn | head -20
# Find high query rates to a single parent domain
# DNS tunnels generate dozens of queries per second
dnstap-ldns -j < /var/log/dns/dnstap.fstrm | \
jq -r '.message.query_name' | \
awk -F. '{print $(NF-1)"."$NF}' | \
sort | uniq -c | sort -rn | head -20
# Find TXT and NULL record queries (often used for data return channel)
grep -E '"type":"TXT"|"type":"NULL"' /var/log/dns/queries.json | \
jq -r '.qname' | sort | uniq -c | sort -rn
Ship dnstap output to your SIEM (Elasticsearch/Splunk) via Filebeat or a custom Logstash pipeline. Index on qname, rcode, qtype, and client_address for effective threat hunting queries.
Kubernetes CoreDNS Security Hardening
CoreDNS is the default DNS resolver for Kubernetes. Default configurations have several security gaps:
Restrict external forwarding to a controlled upstream:
# coredns ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
name: coredns
namespace: kube-system
data:
Corefile: |
.:53 {
errors
health {
lameduck 5s
}
ready
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
fallthrough in-addr.arpa ip6.arpa
ttl 30
}
prometheus :9153
# Forward only to known internal resolvers, not 8.8.8.8
forward . 10.0.1.5 10.0.1.6 {
max_concurrent 1000
expire 10s
health_check 5s
# Use TLS if internal resolvers support DoT
# tls /etc/coredns/tls.crt /etc/coredns/tls.key /etc/coredns/ca.crt
}
cache 30 {
success 9984 30
denial 9984 5
}
loop
reload
loadbalance
# Log queries for security monitoring
log . {
class all
}
}
Block exfiltration via DNS from pods: Use a CoreDNS policy plugin or NetworkPolicy to prevent pods from making arbitrary DNS queries to external resolvers, bypassing the cluster resolver:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: deny-external-dns
namespace: production
spec:
podSelector: {}
policyTypes:
- Egress
egress:
# Allow DNS only to CoreDNS in kube-system
- ports:
- port: 53
protocol: UDP
- port: 53
protocol: TCP
to:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: kube-system
podSelector:
matchLabels:
k8s-app: kube-dns
# Allow all other egress (adjust to least-privilege)
- ports:
- port: 443
protocol: TCP
- port: 80
protocol: TCP
Harden CoreDNS RBAC: The CoreDNS ServiceAccount needs only read access to endpoints, services, and pods — not write access or access to secrets:
# Audit CoreDNS RBAC permissions
kubectl describe clusterrole system:coredns
# Verify no write verbs (create, update, patch, delete) are granted
kubectl get clusterrolebinding system:coredns -o yaml | \
grep -A5 subjects
Enable CoreDNS metrics for anomaly detection:
# CoreDNS exposes Prometheus metrics at :9153/metrics
# Key metrics to alert on:
# coredns_dns_requests_total{rcode="SERVFAIL"} — spike indicates DNSSEC failures or resolver issues
# coredns_forward_requests_total — unexpected external forwarding
# coredns_cache_hits_total vs coredns_cache_misses_total — low hit rate indicates cache bypass attempts
kubectl port-forward -n kube-system svc/kube-dns 9153:9153
curl -s http://localhost:9153/metrics | grep coredns_dns
Operational Checklist
Run this against your resolver deployment:
#!/bin/bash
# DNS Resolver Security Audit
RESOLVER="127.0.0.1"
echo "=== Open Recursion Test ==="
# Should fail (REFUSED or timeout) from any IP not in allow-recursion
dig +time=3 @${RESOLVER} example.com A 2>&1 | grep -E "REFUSED|NOERROR|SERVFAIL"
echo "=== Version String Exposure ==="
dig @${RESOLVER} version.bind TXT CHAOS +short
# Should return empty or "none"
echo "=== DNSSEC Validation ==="
dig @${RESOLVER} dnssec-failed.org A +dnssec 2>&1 | grep -E "SERVFAIL|NXDOMAIN|ad"
# Should return SERVFAIL (validation failure for intentionally-broken zone)
dig @${RESOLVER} cloudflare.com A +dnssec 2>&1 | grep "^;; flags"
# Should include 'ad' flag
echo "=== DNS Rebinding Protection ==="
# Query a domain known to return RFC1918 (test with a lab domain or rebind.it)
dig @${RESOLVER} rebind.it A +short 2>&1
# Should return SERVFAIL, not an RFC1918 address
echo "=== Zone Transfer Restriction ==="
dig @${RESOLVER} example.com AXFR 2>&1 | grep -E "REFUSED|Transfer"
# Should be REFUSED
echo "=== Query Logging ==="
ls -la /var/log/dns/ 2>/dev/null || \
journalctl -u unbound --since "1 minute ago" | grep -c "query"
Summary
DNS resolver infrastructure hardening is a force-multiplier control: a well-hardened resolver blocks rebinding attacks, resists cache poisoning for all clients simultaneously, validates DNSSEC for every query, and produces the query logs needed to detect tunnelling and C2 activity. The configuration changes are low-risk and high-leverage — most can be applied to a running resolver without service interruption by testing with unbound-checkconf or named-checkconf before reload.
Priority order for brownfield deployments:
- Restrict recursion to internal clients only (
allow-recursion,access-control) — this closes the open resolver risk immediately. - Enable DNSSEC validation — the most impactful single change against cache poisoning.
- Enable
private-addressin Unbound for DNS rebinding protection. - Enable
use-caps-for-idfor 0x20 randomisation. - Configure dnstap and ship to SIEM.
- Migrate clients to DoT once the resolver cert is in place.
- Audit split-horizon zone files for information leakage.