kube-bench: CIS Kubernetes Benchmark Automation and Remediation
Problem
Kubernetes ships with many components that are insecure by default: anonymous authentication enabled on the API server, insecure ports open on the scheduler and controller manager, audit logging disabled, kubelet read-only port open. The CIS Kubernetes Benchmark documents the correct configuration for all of these. Most clusters fail dozens of checks simply because nobody ran the benchmark after initial setup.
The problems with manual CIS benchmark audits:
- Point-in-time assessments. A manual audit captures the cluster state at one moment. Node configuration drift, new worker nodes provisioned from an unhardened image, or a cloud provider upgrade that resets kubelet flags create compliance gaps between audits.
- No CI/CD integration. Changes that cause benchmark regressions (updating kube-apiserver flags, modifying kubelet configuration) are merged without a compliance gate. The next audit discovers the regression weeks or months later.
- Partial coverage. Manual audits often focus on control plane components and miss worker nodes. A cluster with a hardened API server and misconfigured kubelets passes a partial audit but remains exploitable.
- No distinction between managed and self-managed controls. On EKS, GKE, or AKS, some controls are the cloud provider’s responsibility and cannot be configured by the operator. Without filtering, the raw benchmark output shows hundreds of failures that cannot be remediated.
- Remediation guidance not tracked. A finding with no owner is never fixed. Without a tracking workflow, benchmark results accumulate without driving improvement.
Target systems: kube-bench 0.8+ (CIS Kubernetes Benchmark 1.8+); self-managed Kubernetes 1.28+; EKS, GKE, AKS (with managed control plane filtering); kubeadm-provisioned clusters; RKE2/k3s.
Threat Model
- Adversary 1 (anonymous API server access): The API server has --anonymous-auth=true and the system:anonymous user has been bound to a permissive ClusterRole (a common misconfiguration). An attacker accesses the API server without credentials and enumerates or modifies cluster state.
- Adversary 2 (kubelet read-only port exploitation): The kubelet’s read-only port (10255) is open and unauthenticated. An attacker on the network queries /pods and /metrics to enumerate all pods, environment variables, and resource usage on the node, all without authentication.
- Adversary 3 (etcd unauthenticated access): etcd client URLs are bound to 0.0.0.0 without peer or client authentication. An attacker with network access to port 2379 reads all cluster secrets in plaintext.
- Adversary 4 (service account token abuse): --service-account-lookup=false means a deleted service account’s token remains valid. An attacker who obtained a token before the service account was deleted continues to use it indefinitely.
- Adversary 5 (audit log blind spot): Audit logging is disabled. An attacker who compromises the cluster leaves no record of their API calls, making forensic investigation impossible.
- Access level: Adversaries 1 and 2 need network access to Kubernetes ports. Adversaries 3 and 4 need network access or a compromised pod. Adversary 5 is a detection gap exploited post-compromise.
- Objective: Enumerate cluster, extract secrets, establish persistence, evade detection.
- Blast radius: etcd access or API server anonymous access gives complete cluster control — equivalent to root on every node.
Configuration
Step 1: Run kube-bench
kube-bench auto-detects the cluster component configuration:
# Run against a kubeadm-provisioned cluster (auto-detects version).
kubectl apply -f https://raw.githubusercontent.com/aquasecurity/kube-bench/main/job.yaml
kubectl logs job/kube-bench
# Or run directly on a control plane node.
kube-bench run --targets master,node,etcd,policies
# Run specific CIS benchmark version.
kube-bench run --benchmark cis-1.8
# Output formats.
kube-bench run --json > kube-bench-results.json
kube-bench run --junit > kube-bench-results.xml # For CI systems.
# For managed clusters (EKS): use EKS-specific targets.
kube-bench run --targets node,policies --benchmark eks-stig-kubernetes-v2r2
# Or use the managed benchmark (skips control plane — AWS manages it).
kube-bench run --benchmark eks-1.4.0
Interpreting results:
[INFO] 1 Master Node Security Configuration
[INFO] 1.2 API Server
[PASS] 1.2.1 Ensure that the --anonymous-auth argument is set to false
[FAIL] 1.2.6 Ensure that the --kubelet-certificate-authority argument is set as appropriate
[WARN] 1.2.11 Ensure that the admission control plugin AlwaysAdmit is not set
[INFO] 1.2.12 Ensure that the admission control plugin AlwaysPullImages is set
== Summary master ==
43 checks PASS
7 checks FAIL
10 checks WARN
0 checks INFO
- PASS: Control is satisfied.
- FAIL: Control is not satisfied; remediation required.
- WARN: Manual verification required (kube-bench cannot automate this check).
- INFO: Informational; review but no action required.
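The plain-text result lines are regular enough to extract with a short script. The following is a minimal sketch, assuming the line format shown in the sample output above; `checks_with_status` is an illustrative helper and not part of kube-bench.

```python
import re

# Matches result lines such as "[FAIL] 1.2.6 Ensure that ...".
# Summary lines ("7 checks FAIL") and separators do not match.
RESULT_LINE = re.compile(r"^\[(PASS|FAIL|WARN|INFO)\]\s+(\d+(?:\.\d+)*)\s+(.*)$")


def checks_with_status(output: str, status: str) -> list[tuple[str, str]]:
    """Return (check_id, description) pairs for lines carrying the given status."""
    found = []
    for line in output.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match and match.group(1) == status:
            found.append((match.group(2), match.group(3)))
    return found


sample = """\
[INFO] 1 Master Node Security Configuration
[INFO] 1.2 API Server
[PASS] 1.2.1 Ensure that the --anonymous-auth argument is set to false
[FAIL] 1.2.6 Ensure that the --kubelet-certificate-authority argument is set as appropriate
[WARN] 1.2.11 Ensure that the admission control plugin AlwaysAdmit is not set
== Summary master ==
7 checks FAIL
"""

for check_id, description in checks_with_status(sample, "FAIL"):
    print(check_id, description)
```

Section headers are also printed with an [INFO] prefix, so filtering on INFO returns headers as well as informational checks; the sketch is most useful for FAIL and WARN.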
Step 2: Prioritise and Remediate FAIL Findings
High-priority FAIL checks and their remediation:
# 1.2.1 — anonymous-auth: disable anonymous API server access.
# /etc/kubernetes/manifests/kube-apiserver.yaml
# Add flag:
# - --anonymous-auth=false
# Note: on EKS/GKE this is managed by the provider.
# 1.2.2 — token-auth-file: ensure static token file is not used.
# Remove --token-auth-file from kube-apiserver flags.
# 1.2.6 — kubelet-certificate-authority: verify kubelet TLS.
# kube-apiserver must verify kubelet certs:
# - --kubelet-certificate-authority=/etc/kubernetes/pki/ca.crt
# 1.2.22 — audit-log-path: enable audit logging.
# - --audit-log-path=/var/log/kubernetes/audit.log
# - --audit-log-maxage=30
# - --audit-log-maxbackup=10
# - --audit-log-maxsize=100
# - --audit-policy-file=/etc/kubernetes/audit-policy.yaml
# 4.2.1 — kubelet: disable anonymous authentication.
# /var/lib/kubelet/config.yaml
# authentication:
#   anonymous:
#     enabled: false  # Was true by default on some distros.
# 4.2.2 — kubelet: require Webhook authorization.
# authorization:
#   mode: Webhook  # Not AlwaysAllow.
# 4.2.6 — protect kernel defaults.
# protectKernelDefaults: true
# 2.1 — etcd: peer and client TLS.
# /etc/kubernetes/manifests/etcd.yaml
# - --cert-file=/etc/kubernetes/pki/etcd/server.crt
# - --key-file=/etc/kubernetes/pki/etcd/server.key
# - --client-cert-auth=true
# - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
# - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
# - --peer-client-cert-auth=true
For kubeadm clusters, patch the configuration in place:
# Edit kube-apiserver manifest (kubeadm clusters).
# The API server restarts automatically when the manifest changes.
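# Keep the backup outside /etc/kubernetes/manifests: the kubelet reads every
# non-hidden file in that directory as a static pod manifest, so a .bak copy
# left there can conflict with the real kube-apiserver pod.
# (/root is an example location; any directory outside manifests/ works.)
sudo cp /etc/kubernetes/manifests/kube-apiserver.yaml /root/kube-apiserver.yaml.bak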
sudo vi /etc/kubernetes/manifests/kube-apiserver.yaml
# Verify the API server came back after edit.
kubectl get nodes
kubectl cluster-info
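After editing the manifest, it can help to check the resulting flag list before waiting for the next benchmark run. The sketch below assumes the container's command list has already been extracted from the manifest (for example with a YAML parser). The rule tables and the `audit_apiserver_command` helper are hypothetical and mirror only the remediation notes above, not the full benchmark.

```python
# Rules drawn from the remediation notes above (illustrative subset only).
REQUIRED_VALUES = {"--anonymous-auth": "false"}
REQUIRED_PRESENT = ["--kubelet-certificate-authority", "--audit-log-path"]
FORBIDDEN = ["--token-auth-file"]


def parse_flags(command: list[str]) -> dict[str, str]:
    """Split ["--name=value", ...] arguments into a {name: value} mapping."""
    flags = {}
    for arg in command:
        if arg.startswith("--"):
            name, _, value = arg.partition("=")
            flags[name] = value
    return flags


def audit_apiserver_command(command: list[str]) -> list[str]:
    """Return human-readable findings for a kube-apiserver command line."""
    flags = parse_flags(command)
    findings = []
    for name, expected in REQUIRED_VALUES.items():
        if flags.get(name) != expected:
            findings.append(f"{name} should be {expected}")
    for name in REQUIRED_PRESENT:
        if name not in flags:
            findings.append(f"{name} is not set")
    for name in FORBIDDEN:
        if name in flags:
            findings.append(f"{name} must be removed")
    return findings


command = [
    "kube-apiserver",
    "--anonymous-auth=true",
    "--token-auth-file=/etc/kubernetes/tokens.csv",
    "--audit-log-path=/var/log/kubernetes/audit.log",
]
for finding in audit_apiserver_command(command):
    print(finding)
```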
Step 3: Configuration File for Managed Clusters
Create a kube-bench configuration that skips controls managed by the cloud provider:
# kube-bench-config.yaml — for EKS clusters.
# Skip control plane checks (AWS manages the control plane).
skip:
  - "1"  # Master node checks.
  - "2"  # etcd checks.
  - "3"  # Control plane configuration.
# Run only node and policy checks.
targets:
  - node
  - policies
kube-bench run --config kube-bench-config.yaml --benchmark eks-1.4.0
For GKE:
# GKE auto-configures most CIS controls. Run only node-level checks.
kube-bench run --targets node --benchmark gke-1.4.0
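The same filtering can be applied after the fact to results that were collected without a skip list. This is a minimal sketch, assuming each result is a dict with an "id" key and that the top-level section numbers match the skip list in the configuration above; `operator_owned` is a hypothetical helper name.

```python
def in_skipped_section(check_id: str, skipped_sections: set[str]) -> bool:
    """True when the check's top-level section number is in the skip set."""
    top_level = check_id.split(".", 1)[0]
    return top_level in skipped_sections


def operator_owned(results: list[dict], skipped_sections: set[str]) -> list[dict]:
    """Keep only the checks the cluster operator can actually remediate."""
    return [r for r in results if not in_skipped_section(r["id"], skipped_sections)]


results = [
    {"id": "1.2.1", "status": "FAIL"},
    {"id": "2.1", "status": "FAIL"},
    {"id": "4.2.1", "status": "FAIL"},
    {"id": "5.1.1", "status": "WARN"},
]
for result in operator_owned(results, {"1", "2", "3"}):
    print(result["id"], result["status"])
```

Comparing the whole top-level section number, rather than a string prefix, keeps a hypothetical section "10" from being dropped when section "1" is skipped.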
Step 4: CI/CD Integration
Run kube-bench automatically on cluster configuration changes:
# .github/workflows/cis-benchmark.yml
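name: CIS Kubernetes Benchmark
on:
  schedule:
    - cron: "0 6 * * *"  # Daily at 06:00 UTC.
  push:
    paths:
      - "kubernetes/**"  # Also on k8s config changes.
jobs:
  kube-bench:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Assumes kubectl access to the cluster is already configured on the runner.
      - name: Deploy kube-bench job
        run: |
          # Jobs are immutable once created; remove the previous run first.
          kubectl -n kube-system delete job kube-bench --ignore-not-found
          kubectl apply -f kubernetes/kube-bench-job.yaml
          kubectl -n kube-system wait --for=condition=complete job/kube-bench --timeout=300s
      - name: Collect results
        run: |
          # The job runs kube-bench with --json, so count FAIL results from the JSON
          # (Controls -> tests -> results, the same fields the reporting script reads).
          kubectl -n kube-system logs job/kube-bench --all-containers > kube-bench-results.json
          FAIL_COUNT=$(jq '[.Controls[].tests[].results[] | select(.status == "FAIL")] | length' kube-bench-results.json)
          echo "FAIL checks: $FAIL_COUNT"
          echo "fail_count=$FAIL_COUNT" >> "$GITHUB_ENV"
      - name: Fail CI on new failures
        run: |
          # Compare against baseline.
          BASELINE=$(cat .kube-bench-baseline)
          if [ "$fail_count" -gt "$BASELINE" ]; then
            echo "REGRESSION: $fail_count failures (baseline: $BASELINE)"
            jq -r '.Controls[].tests[].results[] | select(.status == "FAIL") | "[FAIL] \(.test_number) \(.test_desc)"' kube-bench-results.json
            exit 1
          fi
          echo "No new failures."
      - name: Upload results
        if: always()  # Keep the results even when the gate above fails.
        uses: actions/upload-artifact@v4
        with:
          name: kube-bench-results
          path: kube-bench-results.json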
# kubernetes/kube-bench-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: kube-bench
  namespace: kube-system
spec:
  template:
    spec:
      hostPID: true
      nodeSelector:
        node-role.kubernetes.io/control-plane: ""
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          effect: NoSchedule
      containers:
        - name: kube-bench
          image: aquasec/kube-bench:v0.8.0
          command: ["kube-bench", "run", "--targets", "master,node,etcd,policies", "--json"]
          volumeMounts:
            - name: var-lib-etcd
              mountPath: /var/lib/etcd
              readOnly: true
            - name: etc-kubernetes
              mountPath: /etc/kubernetes
              readOnly: true
            - name: etc-systemd
              mountPath: /etc/systemd
              readOnly: true
            - name: var-lib-kubelet
              mountPath: /var/lib/kubelet
              readOnly: true
            - name: usr-local-mount-1
              mountPath: /usr/local/mount-from-host/bin
              readOnly: true
      restartPolicy: Never
      volumes:
        - name: var-lib-etcd
          hostPath:
            path: /var/lib/etcd
        - name: etc-kubernetes
          hostPath:
            path: /etc/kubernetes
        - name: etc-systemd
          hostPath:
            path: /etc/systemd
        - name: var-lib-kubelet
          hostPath:
            path: /var/lib/kubelet
        - name: usr-local-mount-1
          hostPath:
            path: /usr/local/bin
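A count-based baseline can hide a regression when one check is fixed in the same change that breaks another, because the total stays the same. A possible refinement, sketched here under the assumption that the baseline is stored as a set of failing check IDs rather than a single number, is to compare the sets directly. `compare_to_baseline` is a hypothetical helper.

```python
def compare_to_baseline(
    current_failures: set[str], baseline_failures: set[str]
) -> dict[str, list[str]]:
    """Report checks that newly fail and checks that no longer fail."""
    return {
        "regressions": sorted(current_failures - baseline_failures),
        "resolved": sorted(baseline_failures - current_failures),
    }


baseline = {"1.2.6", "1.2.22"}
current = {"1.2.6", "4.2.1"}  # Same count as the baseline, different checks.

report = compare_to_baseline(current, baseline)
print("regressions:", report["regressions"])
print("resolved:", report["resolved"])
```

A gate built this way would fail whenever the regressions list is non-empty, and the resolved list indicates when the stored baseline can be tightened.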
Step 5: Tracking Remediation with Exceptions
Not all FAIL checks can be immediately remediated. Track exceptions explicitly:
# kube-bench-exceptions.yaml — documented accepted risks.
exceptions:
  - check_id: "1.2.11"
    check_description: "AlwaysPullImages admission plugin"
    reason: "Multi-tenant clusters require this; enabling causes issues with air-gapped nodes"
    owner: "platform-team"
    review_date: "2026-11-01"
    risk_accepted_by: "security-eng"
  - check_id: "4.2.12"
    check_description: "Ensure that the RotateKubeletServerCertificate argument is set to true"
    reason: "Not yet supported by our managed node group upgrade process; tracked in INFRA-4821"
    owner: "infrastructure-team"
    review_date: "2026-08-01"
    risk_accepted_by: "ciso"
# kube_bench/filter_results.py
def filter_accepted_exceptions(results: list, exceptions: list) -> list:
    """Remove accepted exceptions from FAIL list before CI gate evaluation.

    Each result needs "id" and "status" keys; each exception needs "check_id".
    """
    exception_ids = {e["check_id"] for e in exceptions}
    return [r for r in results if r["id"] not in exception_ids or r["status"] != "FAIL"]
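Exceptions are only useful if their review dates are enforced. The following is a minimal sketch of such a check, assuming the exceptions file has been loaded into a list of dicts with ISO-formatted review_date values as in the example above; `expired_exceptions` is a hypothetical helper name.

```python
from datetime import date


def expired_exceptions(exceptions: list[dict], today: date) -> list[dict]:
    """Return exceptions whose review_date is earlier than today."""
    return [
        e for e in exceptions
        if date.fromisoformat(e["review_date"]) < today
    ]


exceptions = [
    {"check_id": "1.2.11", "owner": "platform-team", "review_date": "2026-11-01"},
    {"check_id": "4.2.12", "owner": "infrastructure-team", "review_date": "2026-08-01"},
]

for overdue in expired_exceptions(exceptions, today=date(2026, 9, 1)):
    print(f'{overdue["check_id"]} overdue for review (owner: {overdue["owner"]})')
```

Passing `today` as a parameter instead of calling `date.today()` inside the function keeps the check deterministic in tests.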
Step 6: Node-Level Hardening from Benchmark Findings
The most commonly failed node checks and their remediation:
# /var/lib/kubelet/config.yaml — kubelet hardening from CIS.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# 4.2.1: Disable anonymous authentication.
authentication:
  anonymous:
    enabled: false
  webhook:
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
# 4.2.2: Webhook authorization.
authorization:
  mode: Webhook
# 4.2.6: Protect kernel defaults.
protectKernelDefaults: true
# 4.2.7: Restrict permissions on the kubelet config file.
# (Set file permissions on /var/lib/kubelet/config.yaml to 600 or more restrictive; 644 satisfied only older benchmark versions.)
# 4.2.10: TLS cipher restriction.
tlsCipherSuites:
- TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
- TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
- TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
- TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
- TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
tlsMinVersion: VersionTLS12
# 4.2.12: Rotate kubelet server certificates.
serverTLSBootstrap: true
rotateCertificates: true
# 4.2.4: Disable read-only port.
readOnlyPort: 0
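The settings above can be spot-checked on a parsed KubeletConfiguration before a node joins the cluster. This sketch assumes the YAML has already been loaded into a dict. The dotted-path convention, the `EXPECTED` table, and `kubelet_findings` are illustrative and cover only a subset of the settings listed above.

```python
from typing import Any

# Expected values taken from the hardening settings above (subset).
EXPECTED = {
    "authentication.anonymous.enabled": False,
    "authorization.mode": "Webhook",
    "protectKernelDefaults": True,
    "readOnlyPort": 0,
}

_MISSING = object()


def lookup(config: dict, dotted_path: str) -> Any:
    """Walk a nested dict along "a.b.c"; return _MISSING if any key is absent."""
    node: Any = config
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def kubelet_findings(config: dict) -> list[str]:
    """List settings that are absent or differ from the expected value."""
    findings = []
    for path, expected in EXPECTED.items():
        actual = lookup(config, path)
        if actual is _MISSING:
            findings.append(f"{path} is not set (expected {expected!r})")
        elif type(actual) is not type(expected) or actual != expected:
            # The type check stops False from being accepted where 0 is expected.
            findings.append(f"{path} is {actual!r} (expected {expected!r})")
    return findings


config = {
    "authentication": {"anonymous": {"enabled": True}},
    "authorization": {"mode": "Webhook"},
    "readOnlyPort": 0,
}
for finding in kubelet_findings(config):
    print(finding)
```

The sketch treats an unset field as a finding rather than assuming the kubelet's built-in default, which is the more conservative reading.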
Step 7: Scheduled Reporting
Generate trend reports to show compliance improvement over time:
#!/usr/bin/env python3
# kube_bench/report.py
import json
from datetime import datetime, timezone
from pathlib import Path


def parse_kube_bench_json(results_file: str) -> dict:
    """Tally results by status and collect FAIL details from kube-bench --json output."""
    with open(results_file) as f:
        data = json.load(f)
    totals = {"pass": 0, "fail": 0, "warn": 0, "info": 0}
    failures = []
    for section in data.get("Controls", []):
        for test in section.get("tests", []):
            for result in test.get("results", []):
                status = result["status"].lower()
                totals[status] = totals.get(status, 0) + 1
                if status == "fail":
                    failures.append({
                        "id": result["test_number"],
                        "description": result["test_desc"],
                        "remediation": result.get("remediation", ""),
                    })
    # datetime.utcnow() is deprecated since Python 3.12; use a timezone-aware timestamp.
    return {"totals": totals, "failures": failures, "date": datetime.now(timezone.utc).isoformat()}


def write_trend(history_file: str, run_result: dict) -> None:
    """Append one run to the JSON history file, creating the file if it is missing."""
    path = Path(history_file)
    history = json.loads(path.read_text()) if path.exists() else []
    history.append(run_result)
    path.write_text(json.dumps(history, indent=2))
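Once a history file exists, the trend itself is a simple pass over consecutive runs. The sketch below assumes the history is a list of run dicts with "date" and "totals" keys, as written by write_trend above; `summarize_trend` is a hypothetical helper.

```python
def summarize_trend(history: list[dict]) -> list[str]:
    """Produce one line per run with the FAIL count and the change since the prior run."""
    lines = []
    previous_fail = None
    for run in history:
        fail = run["totals"]["fail"]
        if previous_fail is None:
            change = "first run"
        else:
            change = f"{fail - previous_fail:+d} vs previous run"
        lines.append(f'{run["date"]}: {fail} FAIL ({change})')
        previous_fail = fail
    return lines


history = [
    {"date": "2026-01-01T06:00:00+00:00", "totals": {"fail": 7}},
    {"date": "2026-01-02T06:00:00+00:00", "totals": {"fail": 5}},
    {"date": "2026-01-03T06:00:00+00:00", "totals": {"fail": 6}},
]
for line in summarize_trend(history):
    print(line)
```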
Step 8: Telemetry
kube_bench_fail_total{cluster, section} gauge
kube_bench_pass_total{cluster, section} gauge
kube_bench_warn_total{cluster} gauge
kube_bench_check_status{cluster, check_id} gauge (1=pass, 0=fail)
kube_bench_run_timestamp{cluster} gauge
Alert on:
- kube_bench_fail_total increases from previous run: a benchmark regression was introduced; review recent configuration changes.
- kube_bench_check_status{check_id="1.2.1"} == 0: anonymous auth is enabled on the API server; critical finding.
- kube_bench_check_status{check_id="2.1"} == 0: etcd is unauthenticated; critical finding.
- No kube_bench_run_timestamp update in 25 hours: scheduled benchmark job failed to run.
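kube-bench does not export these metrics itself, so something has to translate results into them. One option, sketched here as an assumption rather than a prescribed pipeline, is to render the per-section totals in the Prometheus text exposition format and hand the output to a textfile-style collector. `render_metrics` is a hypothetical helper, and it assumes cluster and section names contain no quotes or backslashes.

```python
def render_metrics(
    cluster: str, section_totals: dict[str, dict[str, int]], run_timestamp: int
) -> str:
    """Render per-section totals as Prometheus text-format gauge samples."""
    lines = []
    for status in ("fail", "pass"):
        metric = f"kube_bench_{status}_total"
        lines.append(f"# TYPE {metric} gauge")
        for section, totals in sorted(section_totals.items()):
            lines.append(
                f'{metric}{{cluster="{cluster}",section="{section}"}} {totals[status]}'
            )
    # The warn metric carries only the cluster label, so sum across sections.
    warn = sum(totals["warn"] for totals in section_totals.values())
    lines.append("# TYPE kube_bench_warn_total gauge")
    lines.append(f'kube_bench_warn_total{{cluster="{cluster}"}} {warn}')
    lines.append("# TYPE kube_bench_run_timestamp gauge")
    lines.append(f'kube_bench_run_timestamp{{cluster="{cluster}"}} {run_timestamp}')
    return "\n".join(lines) + "\n"


section_totals = {
    "master": {"pass": 43, "fail": 7, "warn": 10},
    "node": {"pass": 18, "fail": 2, "warn": 3},
}
print(render_metrics("prod", section_totals, run_timestamp=1767247200), end="")
```

The per-check kube_bench_check_status gauge is left out to keep the sketch short; it would follow the same pattern with a check_id label.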
Expected Behaviour
| Signal | Unchecked cluster | kube-bench enforced |
|---|---|---|
| Anonymous API server access | Possible if unconfigured | Detected at next scan; CI gate fails on introduction |
| Kubelet read-only port open | Common default | Detected; readOnlyPort: 0 pushed via kubelet config |
| Audit logging disabled | Default on kubeadm | Detected; audit policy applied |
| Benchmark regressions from config changes | Silent until next audit | CI gate fails; PR blocked |
| Node-to-node benchmark drift | Undetected | Per-node job run detects node-specific failures |
Trade-offs
| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| hostPID: true for kube-bench job | Required to inspect process args | Privileged pod; only run in kube-system | Restrict job to control plane node; use RBAC to prevent non-admin scheduling |
| Daily scheduled scan | Catches drift | Another cron job to maintain | Use existing monitoring infrastructure; alert on job failure |
| Strict CI gate (fail on any new FAIL) | Prevents regression | May block legitimate changes | Exception mechanism with documented owner and review date |
| CIS benchmark is not threat-specific | Comprehensive baseline | Some checks are low-risk in your environment | Use exceptions for low-risk checks; focus remediation on critical findings |
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| kube-bench version mismatch | Checks differ from expected CIS version | Version logged in job output | Pin --benchmark flag to specific version; update explicitly |
| Job fails to start on managed cluster | Kubelet read-only volume mounts fail | Job pod pending/crashlooping | Use cloud-provider specific benchmark (EKS/GKE flags); skip host mount checks |
| False positive after distro update | PASS check becomes FAIL after OS package update changes defaults | Benchmark regression alert | Investigate actual config; update baseline if the new state is correct |
| Exception list not updated at review date | Accepted risk still present past agreed remediation date | Automated review date check in CI | Alert when exception review_date < today; require renewal or remediation |