Kubernetes Dynamic Resource Allocation (DRA) Security Hardening
Problem
Dynamic Resource Allocation (DRA) graduated to GA in Kubernetes 1.32 and is now the recommended mechanism for scheduling specialised hardware: GPUs, TPUs, FPGAs, NICs with SR-IOV, and similar. It replaces the older device-plugin API for advanced workloads with a structured-parameters model where workloads request hardware via ResourceClaim objects that the scheduler matches against ResourceSlice-published inventory exposed by per-node DRA drivers.
The security story is materially different from device plugins. Device plugins were opaque kubelet sidecars with a narrow gRPC contract; DRA drivers are full Kubernetes citizens that:
- Run a controller component with cluster-wide read/write on
ResourceClaim,ResourceSlice, andDeviceClassobjects. - Run a kubelet plugin component with host access (typically
hostPathto/var/lib/kubelet/plugins_registry/, oftenprivileged: true, almost alwayshostPID). - Mediate access to hardware that, in the GPU/TPU case, can read VRAM left over from a prior tenant unless the driver is careful.
Three structural risks follow. First, RBAC for DRA objects is new — most platform teams have not yet authored Roles for resourceclaims or resourceclaimtemplates, so cluster-admin-bound service accounts are the default. Second, DRA drivers from third-party vendors (NVIDIA, Intel, AMD, plus an emerging set of cloud-specific ones) ship as Helm charts with clusterAdmin-level defaults; few teams audit these at install time. Third, the parameters field on a ResourceClaim is driver-defined opaque JSON, opening a parser-attack surface that does not exist in the simpler device-plugin model.
Workloads have been observed exploiting privileged DRA drivers to escape pods, read GPU memory from co-tenant inference jobs, and abuse claim-template controllers to mint resources that bypass ResourceQuota. The DRA API surface also changes the Pod-spec admission story: a Pod referencing a ResourceClaim carries an indirect attack vector that ValidatingAdmissionPolicies written before 1.32 do not inspect.
Target systems: Kubernetes 1.32+ (DRA GA), 1.33 (DRA AdminAccess and prioritised allocation), with NVIDIA GPU Operator ≥ 24.9, Intel Device Plugins ≥ 0.31, or equivalent third-party DRA drivers.
Threat Model
- Co-tenant pod attempting GPU memory disclosure. Goal: read VRAM left by the prior tenant’s inference run. Surface: DRA driver’s reset/zeroing logic; misconfigured
DeviceClass.config. - Tenant exhausting cluster GPUs via forged ResourceClaims. Goal: deny service to other tenants. Surface: missing
ResourceQuotarules oncount/resourceclaims.resource.k8s.ioand weak admission policy. - Compromised DRA driver controller. Goal: read all
ResourceClaimobjects (containing tenant identifiers, model paths, sometimes secrets) and pivot to other namespaces. Surface: cluster-wide RBAC granted at install. - Pod escape via privileged kubelet-plugin sidecar. Goal: hostPath-mount a UNIX socket, talk to the driver, request privileged operations. Surface: containers that share
/var/lib/kubelet/plugins/<driver>/with the kubelet plugin.
Blast radius without hardening: a single compromised tenant pod can exfiltrate GPU memory across the fleet. With hardening (driver scoping, claim-template admission, mandatory zeroing) the same compromise is contained to the tenant’s own claims, with audit evidence.
Configuration / Implementation
Step 1 — Enable DRA-aware admission
# apiserver-config.yaml fragment
apiServer:
featureGates:
DynamicResourceAllocation: true
DRAResourceClaimDeviceStatus: true
DRAAdminAccess: true # 1.33+; gates the high-privilege `adminAccess` field
admissionControl:
- ValidatingAdmissionPolicy
- ResourceQuota
Confirm:
kubectl api-resources --api-group=resource.k8s.io
# resourceclaims resource.k8s.io/v1 true ResourceClaim
# resourceclaimtemplates resource.k8s.io/v1 true ResourceClaimTemplate
# resourceslices resource.k8s.io/v1 false ResourceSlice
# deviceclasses resource.k8s.io/v1 false DeviceClass
Step 2 — Author RBAC for DRA
Default cluster roles do not grant tenants permission to create ResourceClaims. Define a tenant-scoped role:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: dra-tenant
namespace: tenant-a
rules:
- apiGroups: ["resource.k8s.io"]
resources: ["resourceclaims", "resourceclaimtemplates"]
verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: ["resource.k8s.io"]
resources: ["deviceclasses"]
verbs: ["get", "list"] # read-only
Cluster-scoped DeviceClass and ResourceSlice must be read-only for tenants — these describe hardware inventory and a write would let a tenant fake capabilities.
Step 3 — Lock down the DRA driver install
DRA drivers ship as Helm charts that frequently include ClusterRoleBinding to cluster-admin. Replace with a least-privilege role:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: nvidia-dra-driver-controller
rules:
- apiGroups: ["resource.k8s.io"]
resources: ["resourceclaims", "resourceslices"]
verbs: ["get", "list", "watch", "update", "patch"]
- apiGroups: ["resource.k8s.io"]
resources: ["resourceclaims/status", "resourceslices/status"]
verbs: ["update", "patch"]
- apiGroups: [""]
resources: ["nodes"]
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources: ["events"]
verbs: ["create", "patch"]
# Explicitly NOT: secrets, configmaps cluster-wide, pods/exec, namespaces.
The kubelet plugin component must not run as privileged: true. Use an explicit capability set:
securityContext:
privileged: false
capabilities:
drop: ["ALL"]
add: ["SYS_ADMIN"] # only if the driver does mount(2); justify
readOnlyRootFilesystem: true
seccompProfile:
type: RuntimeDefault
Step 4 — Force device zeroing in DeviceClass
For GPU/TPU classes, the DeviceClass.spec.config controls reset behaviour. Make zeroing mandatory:
apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
name: nvidia-gpu-shared
spec:
selectors:
- cel:
expression: "device.driver == 'gpu.nvidia.com'"
config:
- opaque:
driver: gpu.nvidia.com
parameters:
resetPolicy: "ZeroVRAMOnRelease"
sharingStrategy: "TimeSlicing"
sharingTimeSliceMs: 100
Validate at admission time that no namespace can create a ResourceClaim referencing a class without zeroing — the driver-specific field name varies, so pin via VAP rather than relying on driver defaults.
Step 5 — ValidatingAdmissionPolicy for ResourceClaim
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
name: dra-claim-policy
spec:
failurePolicy: Fail
matchConstraints:
resourceRules:
- apiGroups: ["resource.k8s.io"]
apiVersions: ["v1"]
operations: ["CREATE", "UPDATE"]
resources: ["resourceclaims", "resourceclaimtemplates"]
validations:
- expression: |
object.spec.devices.requests.all(r,
r.deviceClassName in ['nvidia-gpu-shared', 'nvidia-gpu-exclusive', 'tpu-v5e']
)
message: "ResourceClaim must reference an approved DeviceClass."
- expression: |
!has(object.spec.devices.requests[0].adminAccess) ||
object.spec.devices.requests[0].adminAccess == false
message: "adminAccess is reserved for cluster admins; use a ResourceClaimTemplate from the platform team."
- expression: "size(object.spec.devices.requests) <= 8"
message: "ResourceClaim cannot request more than 8 devices; use multiple claims for larger jobs."
Step 6 — Quota the new resources
DRA introduces quotable counts:
apiVersion: v1
kind: ResourceQuota
metadata:
name: dra-quota
namespace: tenant-a
spec:
hard:
count/resourceclaims.resource.k8s.io: "20"
count/resourceclaimtemplates.resource.k8s.io: "5"
Without these, a malicious or buggy operator can mint thousands of claims and pin scheduler memory.
Step 7 — Audit policy
# audit-policy.yaml
- level: Metadata
resources:
- group: "resource.k8s.io"
resources: ["resourceclaims", "resourceslices", "deviceclasses"]
- level: RequestResponse
resources:
- group: "resource.k8s.io"
resources: ["resourceclaims"]
verbs: ["create", "delete", "patch"]
namespaces: ["tenant-*"]
Stream to your SIEM; alert on any adminAccess: true create, any ResourceClaim whose parameters blob exceeds 16KB (likely a parser-attack probe), and any DeviceClass mutation.
Expected Behaviour
| Signal | Before hardening | After hardening |
|---|---|---|
Tenant creates adminAccess: true claim |
Allowed; admin-mode access to device | VAP rejects with explanatory message |
| GPU memory remnant after pod release | VRAM may persist | Zeroed by driver before next allocation |
| Cluster-admin scope of DRA driver | Read everything | Limited to resource.k8s.io group |
| Audit trail of claim mutations | Mixed in with general API logs | Separate stream with RequestResponse body |
ResourceQuota on claims |
Not enforced | 20-claim limit per tenant |
# Verify zeroing is in DeviceClass.
kubectl get deviceclass nvidia-gpu-shared -o jsonpath='{.spec.config[0].opaque.parameters.resetPolicy}'
# ZeroVRAMOnRelease
# Verify VAP is binding.
kubectl get validatingadmissionpolicybinding | grep dra
Trade-offs
| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| Zeroing on release | Closes cross-tenant memory leak | 1–4s per device release; hurts rapid-cycle batch jobs | Use exclusive (non-shared) claims for trusted single-tenant pipelines |
| Restricted driver RBAC | Smaller blast radius | Vendor charts may break minor upgrades | Pin chart versions; track upstream RBAC diffs in CI |
| VAP enforcement | Catches misconfigured workloads at submit time | CEL expressions add submit-time latency | Cache VAP compilation (default in 1.32+); keep expressions <50 ops |
| Quota on claims | Prevents flood DoS | Legitimate large-batch jobs need exception | Per-tenant override namespace; review quarterly |
Banning privileged: true for kubelet plugin |
Removes host-takeover path | Some drivers (e.g., older NVIDIA builds) refuse to start | Require vendor SBOM + capability justification before approval |
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Driver controller crashloops post-RBAC trim | Pods stuck Pending with WaitingForResourceAllocation |
kubectl describe pod + driver logs forbidden: cannot list secrets |
Re-add specific verb identified in error; never restore wildcard |
| VAP regex over-matches and blocks ops claims | Platform jobs cannot submit DRA claims | VAP audit log shows reject of platform namespace | Add namespaceSelector to exclude kube-system and platform-* |
| Zeroing parameter ignored by driver | Inter-tenant memory disclosure still possible | Periodic VRAM-canary test pod | File issue with vendor; in interim, require exclusive claims for sensitive workloads |
| ResourceQuota race with templates | Templated workload failures during burst | Events: exceeded quota: dra-quota |
Increase template-derived claim count carefully; consider HPA back-off |
| Audit log volume spikes | Backpressure on audit pipeline | Webhook receiver latency >5s | Drop Metadata level on resourceslices (high-frequency); keep RequestResponse on claims |
When to Consider a Managed Alternative
- GKE Autopilot, EKS Auto Mode, and AKS managed GPU pools centrally apply DRA hardening and zeroing defaults; sensible if you do not have a platform team to author and maintain VAP+RBAC.
- For sovereign / regulated workloads with strict tenant isolation, confidential GPU offerings (NVIDIA H100 in CC-mode on Azure, AWS) eliminate cross-tenant VRAM leakage at the hardware level.