Kubernetes In-Place Pod Resize Security: Admission Policy and Resource-Cap Enforcement on 1.33+

Problem

Before in-place resize, changing a Pod’s CPU or memory meant deleting and recreating the Pod. The Vertical Pod Autoscaler (VPA) papered over this with an Auto mode that evicted and rescheduled, which was disruptive enough that most production clusters ran VPA in Off or Initial mode and accepted that misprovisioned workloads stayed misprovisioned until the next deploy. The InPlacePodVerticalScaling feature gate, alpha since 1.27, graduated to beta in Kubernetes 1.33 and is enabled by default from that release. The resources.requests and resources.limits of a running container can now be mutated, and the kubelet reconciles cgroup values without restarting the container.

This is a substantial improvement for resource utilisation. It is also a non-trivial change to the Kubernetes security model that most platform teams have not yet absorbed. Three concrete problems:

First, the path that mutates resources is the new pods/resize subresource, not the standard pods update path. Admission controls that match only pods (Kyverno or Gatekeeper webhooks, Mutating/Validating Admission Policies) see the original Pod spec at create time but never see the resize call. A policy that says “no container may request more than 4 CPU” enforced at create time is not enforced at resize time unless the policy explicitly registers for pods/resize. Several teams have already discovered this the slow way after a resize bumped a tenant from 1 vCPU to 16.

Second, ResourceQuotas behave differently for resize. The quota controller does observe resize events and rejects resizes that would push a namespace over quota, but it does so asynchronously — a resize that the quota controller has not yet processed can be applied to the kubelet, briefly putting the namespace over quota until the controller catches up. For pay-per-resource environments and noisy-neighbour scenarios this matters.

Third, the resize policy field on each container determines whether a CPU or memory change requires a container restart. Workloads that are sensitive to JVM heap re-sizing or to MADV_DONTDUMP mappings being torn down need to opt in to RestartContainer for memory, but the default of NotRequired is what most charts ship with. A live memory-limit reduction below the working set can OOM-kill workloads at a time the operator did not initiate.

Target systems: Kubernetes 1.33 or later (InPlacePodVerticalScaling beta, enabled by default); container runtimes containerd 2.0+ or CRI-O 1.33+; kernel cgroup v2 (mandatory for memory hot-resize).

Threat Model

1. Tenant in a multi-tenant cluster with update on pods (legitimate, for label/annotation changes) and update on pods/resize (often granted accidentally through wildcard rules such as pods/* or *). Goal: bump a Pod past a per-tenant CPU/memory cap to extract more compute than billed for, or to satisfy a noisy workload at the expense of neighbours.
2. Compromised CI service account that previously had patch pods for kubectl rollouts. If its role was written with a wildcard (pods/* or *), it now also has resize, which means the same compromise that previously let the attacker change image tags can now reshape resource requests across the namespace.
3. VPA recommender misconfiguration where a tenant-controlled metric drives recommended size. Goal: poison the metric so VPA resizes the workload to a value that consumes most of the node, evicting neighbours via scheduler pressure.
4. Insider operator using resize to mask cryptomining: bump CPU during off-hours, return it before the morning report. The Pod object’s metadata looks unchanged across the day; only the status.containerStatuses[].resources field and the resize event log show what happened.

Without resize-aware policy, all four scenarios bypass controls that operators believe are in place. With the configuration in this article, adversary 1 is constrained by a CEL ValidatingAdmissionPolicy on pods/resize, 2 is blocked by explicit subresource RBAC and audited, 3 is bounded by a LimitRange and the admission policy, neither of which the VPA can exceed, and 4 leaves an explicit audit trail.

Configuration / Implementation

Step 1 — Confirm the feature is on and the API is what you think

kubectl version
# Server Version: v1.33.x or later

# Feature gate state as reported by the API server (1 = enabled):
kubectl get --raw /metrics 2>/dev/null \
  | grep kubernetes_feature_enabled \
  | grep InPlacePodVerticalScaling

# Confirm the subresource is served by the core API group:
kubectl get --raw /api/v1 | jq -r '.resources[].name' | grep -x 'pods/resize' || true
kubectl explain pod.spec.containers.resizePolicy 2>&1 | head -20

Step 2 — Set explicit `resizePolicy` on every workload

The resizePolicy field is per-container and per-resource. Always set it; relying on the default is the most common failure mode.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
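spec:
  selector:
    matchLabels: { app: api }   # required by apps/v1; label value is illustrative
  template:
    metadata:
      labels: { app: api }
    spec:
      containers: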
        - name: api
          image: registry.example.com/api:1.42.0
          resources:
            requests: { cpu: "500m", memory: "512Mi" }
            limits:   { cpu: "2",    memory: "2Gi"  }
          resizePolicy:
            - resourceName: cpu
              restartPolicy: NotRequired
            - resourceName: memory
              restartPolicy: RestartContainer

Memory should generally be RestartContainer for stateful or JVM workloads (the runtime cannot reliably shrink heap in place) and NotRequired only for stateless services with confirmed elastic memory behaviour.
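For a Deployment that already exists, the same policy can be added with a strategic merge patch rather than a chart change. The helper below is a minimal sketch: `resize_policy_patch` is a hypothetical name, and the defaults simply mirror the example above. Changing the Pod template this way triggers a normal rollout.

```shell
#!/usr/bin/env bash
# Print a strategic-merge patch that sets resizePolicy on one container
# of a Deployment's Pod template.
# Arguments: container name, CPU restart policy, memory restart policy.
# Defaults (assumed, mirroring the example): cpu=NotRequired, memory=RestartContainer.
resize_policy_patch() {
  local container="$1" cpu="${2:-NotRequired}" mem="${3:-RestartContainer}"
  printf '{"spec":{"template":{"spec":{"containers":[{"name":"%s","resizePolicy":[' "$container"
  printf '{"resourceName":"cpu","restartPolicy":"%s"},' "$cpu"
  printf '{"resourceName":"memory","restartPolicy":"%s"}]}]}}}}\n' "$mem"
}
```

Usage, assuming a Deployment named api in tenant-a: `kubectl patch deployment api -n tenant-a --type=strategic -p "$(resize_policy_patch api)"`.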

Step 3 — A CEL ValidatingAdmissionPolicy on `pods/resize`

This is the single most important control. The policy hooks the resize subresource directly and bounds what resize values are allowed regardless of who requests them.

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: pod-resize-bounds
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups:   [""]
        apiVersions: ["v1"]
        operations:  ["UPDATE"]
        resources:   ["pods/resize"]
  validations:
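    # Rule 1: absolute per-container ceiling. compareTo(...) <= 0 allows exactly
    # 8 CPU / 16Gi, matching the message; containers without limits pass.
    - expression: |
        object.spec.containers.all(c,
          !has(c.resources) || !has(c.resources.limits) ||
          ((!has(c.resources.limits.cpu) ||
            quantity(c.resources.limits.cpu).compareTo(quantity('8')) <= 0) &&
           (!has(c.resources.limits.memory) ||
            quantity(c.resources.limits.memory).compareTo(quantity('16Gi')) <= 0)))
      message: "Pod resize would exceed per-container ceiling (8 CPU / 16Gi)."
    # Rule 2: CPU limit at most 4x the request. asApproximateFloat() is used
    # because asInteger() fails on fractional quantities such as "500m".
    - expression: |
        object.spec.containers.all(c,
          !has(c.resources) ||
          !has(c.resources.limits) || !has(c.resources.limits.cpu) ||
          !has(c.resources.requests) || !has(c.resources.requests.cpu) ||
          quantity(c.resources.limits.cpu).asApproximateFloat() <=
            quantity(c.resources.requests.cpu).asApproximateFloat() * 4.0)
      message: "Resize would create CPU limit/request ratio > 4×."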
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: pod-resize-bounds-binding
spec:
  policyName: pod-resize-bounds
  validationActions: [Deny, Audit]
  matchResources:
    namespaceSelector:
      matchExpressions:
        - { key: tier, operator: In, values: [tenant, dev] }

Two things to notice. (a) resources: ["pods/resize"] is not the same string as pods: a policy that lists only pods hooks creation but not resize. (b) validationActions: [Deny, Audit] rejects a failing request and also records the failure as an audit annotation on its audit event, which is useful for retroactive review.
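The ratio rule is easier to reason about with the arithmetic spelled out. The following is a small shell sketch of the same check, useful for sanity-testing values before sending a resize; it is not part of the policy. `cpu_to_millis` and `ratio_ok` are hypothetical helper names, and only integer, decimal and `m`-suffixed CPU quantities are handled.

```shell
#!/usr/bin/env bash
# Convert a CPU quantity ("500m", "2", "1.5") to integer millicores.
# Assumption: no other quantity spellings (e.g. exponents) are supported.
cpu_to_millis() {
  local q="$1" whole frac
  case "$q" in
    *m)
      echo "${q%m}"
      ;;
    *.*)
      whole="${q%%.*}"
      frac="${q#*.}000"        # pad so there are at least three decimals
      frac="${frac:0:3}"       # keep millicore precision only
      echo $(( 10#${whole:-0} * 1000 + 10#$frac ))
      ;;
    *)
      echo $(( 10#$q * 1000 ))
      ;;
  esac
}

# Succeed when limit <= request * max_ratio (max_ratio defaults to 4).
ratio_ok() {
  local req lim
  req=$(cpu_to_millis "$1")
  lim=$(cpu_to_millis "$2")
  (( lim <= req * ${3:-4} ))
}
```

With a 500m request, `ratio_ok 500m 2` succeeds because 2000m is exactly four times 500m, while `ratio_ok 500m 2500m` fails.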

Step 4 — Tighten the equivalent Kyverno or Gatekeeper policy

If you run Kyverno, the matching ClusterPolicy shape is:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: bound-pod-resize
spec:
  validationFailureAction: Enforce
  rules:
    - name: cap-cpu-on-resize
      match:
        any:
          - resources:
              kinds: ["Pod/resize"]
      validate:
        message: "Resize exceeds 8 CPU ceiling"
        pattern:
          spec:
            containers:
              - resources:
                  limits:
                    cpu: "<=8"

The kinds: ["Pod/resize"] form is what targets the subresource. A policy targeting kinds: ["Pod"] alone does not gate resize. Audit your existing Kyverno bundle for any policy that should also apply to resize and add a parallel Pod/resize rule.

Step 5 — RBAC: separate `pods/resize` from `pods`

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: tenant-a
  name: pod-resize-operator
rules:
  - apiGroups: [""]
    resources: ["pods/resize"]
    verbs:     ["update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pod-edit-no-resize
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get","list","watch","update","patch"]
  # Notice: no `pods/resize` here.

Many cluster admins assume pods/resize is implicitly bundled into verbs: [update] on pods. It is not: subresources require explicit grants. This is good news because it means upgrading to 1.33 does not retroactively grant resize to existing edit roles. Audit your roles to confirm none of them grant pods/* or a bare * wildcard, either of which does include resize.

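# List ClusterRoles with a core-group rule that covers pods/resize,
# either explicitly or through a wildcard.
kubectl get clusterroles -o json \
  | jq -r '.items[]
           | select(any(.rules[]?;
               ((.apiGroups // []) | any(. == "" or . == "*")) and
               ((.resources // []) | any(. == "*" or . == "pods/*" or . == "pods/resize"))))
           | .metadata.name'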
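Reading role definitions shows what is granted on paper; asking the authorizer shows what is effective once bindings and aggregation are applied. The sketch below loops over the ServiceAccounts in one namespace and reports those that can update pods/resize. `resize_grants` is a hypothetical helper name, and the caller is assumed to have impersonation rights, which `--as` requires.

```shell
#!/usr/bin/env bash
# Print "<namespace>/<serviceaccount>" for every ServiceAccount in the
# namespace that is allowed to update the pods/resize subresource.
resize_grants() {
  local ns="$1" sa
  kubectl get serviceaccounts -n "$ns" -o name | while read -r sa; do
    sa="${sa#serviceaccount/}"   # strip the "serviceaccount/" prefix from -o name
    # --quiet suppresses output; the exit code is 0 only when access is allowed.
    if kubectl auth can-i update pods --subresource=resize --quiet \
         -n "$ns" --as="system:serviceaccount:${ns}:${sa}" </dev/null >/dev/null 2>&1; then
      echo "${ns}/${sa}"
    fi
  done
}
```

Running `resize_grants tenant-a` before and after tightening roles gives a quick diff of who lost or kept the permission.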

Step 6 — Constrain the VPA’s resize recommendations

If you run the VPA, switch to the InPlaceOrRecreate update mode (introduced as an alpha feature in VPA 1.4, alongside Kubernetes 1.33) and bound recommendations with a LimitRange that the VPA must respect:

apiVersion: v1
kind: LimitRange
metadata:
  name: vpa-bounds
  namespace: tenant-a
spec:
  limits:
    - type: Container
      max:     { cpu: "4",    memory: "8Gi" }
      min:     { cpu: "100m", memory: "128Mi" }
      maxLimitRequestRatio: { cpu: "4", memory: "2" }

The VPA caps its recommendations to fit LimitRange bounds; the admission policy refuses to apply anything outside its own bounds even if a non-VPA actor tries. Together the two give belt and braces.
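A VPA object can also carry its own bounds, so the cap does not depend on the LimitRange alone. The helper below is a sketch that prints such a manifest; `render_vpa` is a hypothetical name, the target is assumed to be a Deployment with the same name as the VPA, and the minAllowed/maxAllowed values are assumptions copied from the LimitRange above. Depending on the VPA release, InPlaceOrRecreate may need a feature gate enabled on the VPA components.

```shell
#!/usr/bin/env bash
# Print a VerticalPodAutoscaler manifest for a Deployment.
# Arguments: Deployment name, namespace.
render_vpa() {
  local name="$1" ns="$2"
  cat <<EOF
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: ${name}
  namespace: ${ns}
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ${name}
  updatePolicy:
    updateMode: InPlaceOrRecreate
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        controlledResources: ["cpu", "memory"]
        minAllowed: { cpu: "100m", memory: "128Mi" }
        maxAllowed: { cpu: "4", memory: "8Gi" }
EOF
}
```

Usage: `render_vpa api tenant-a | kubectl apply -f -`. Keeping maxAllowed at or below the LimitRange max means the recommender never produces a target that has to be capped later.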

Step 7 — Audit-log every resize

Add to the audit policy:

- level: RequestResponse
  resources:
    - group: ""
      resources: ["pods/resize"]
  verbs: ["update","patch"]
  omitStages: ["RequestReceived"]

RequestResponse (not Metadata) ensures the before/after resource values are captured, which is the only way to reconstruct who changed what to what after the fact.
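Once the rule is active, a first-pass review can be as simple as counting resize events per user. The sketch below assumes the log backend writes one compact JSON event per line (no spaces after colons) and that `user.username` is the first key of the `user` object; `resize_events_by_user` is a hypothetical helper name. For anything beyond a quick count, a JSON-aware tool is the safer choice.

```shell
#!/usr/bin/env bash
# Count audit events that touched the resize subresource, grouped by user.
# Argument: path to a JSON-lines audit log file.
resize_events_by_user() {
  grep -F '"subresource":"resize"' "$1" \
    | sed -n 's/.*"user":{"username":"\([^"]*\)".*/\1/p' \
    | sort | uniq -c | sort -rn
}
```

The output is one line per user, most active first, which makes an off-hours burst from a single account easy to spot.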

Step 8 — Detect deferred resizes

When a resize cannot be honoured immediately (e.g., the node has no headroom), the kubelet sets a PodResizePending condition on the Pod with reason Deferred or Infeasible. From 1.33 this condition replaces the deprecated status.resize field. Workloads stuck in Deferred are a common symptom of a resize attempt that should not have been allowed in the first place.

kubectl get pods -A -o json \
  | jq -r '.items[] | . as $p | .status.conditions[]?
           | select(.type == "PodResizePending" and (.reason == "Deferred" or .reason == "Infeasible"))
           | "\($p.metadata.namespace)/\($p.metadata.name)\t\(.reason)"'

Wire this into Prometheus via a kube-state-metrics 2.14+ recording rule:

sum by (namespace,pod) (kube_pod_status_resize{condition!="Proposed"})

Expected Behaviour

Signal	Before this hardening	After
Resize via `pods` update	Rejected by API validation; the supported `pods/resize` path bypasses policies that hook only `pods`	Still rejected (`422`); `pods/resize` is gated by policy
Resize beyond ceiling	Accepted, kubelet reconciles	Rejected by ValidatingAdmissionPolicy
Tenant resize attempt without subresource RBAC	Allowed if a wildcard rule (`pods/*` or `*`) covers it	`403 Forbidden`
Audit log of resize events	None or minimal	Full `RequestResponse` capture
ResourceQuota-overshoot window	Indeterminate	Bounded by quota controller + admission policy
VPA recommendation > LimitRange max	Applied	Capped at LimitRange max
Pod stuck `Deferred`/`Infeasible`	Not surfaced	Alert via kube-state-metrics

Verification snippet:

# Try to resize past the ceiling — should fail.
kubectl patch pod api-7b9f --subresource=resize \
  --patch '{"spec":{"containers":[{"name":"api","resources":{"limits":{"cpu":"16"}}}]}}'
# Expected: the request is rejected with a message like:
#   ValidatingAdmissionPolicy 'pod-resize-bounds' with binding 'pod-resize-bounds-binding' denied request: Pod resize would exceed ...
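To probe the ceiling without touching a running workload, the same patch can be sent as a server-side dry run, which passes through admission but is not persisted. The helper below is a sketch: `resize_is_denied` is a hypothetical name, and any failure (including a missing Pod or an RBAC denial) is treated as "denied", so check the error text when the result is surprising.

```shell
#!/usr/bin/env bash
# Return 0 if the API server rejects the proposed CPU limit, 1 if it accepts it.
# Arguments: namespace, pod, container, CPU limit.
resize_is_denied() {
  local ns="$1" pod="$2" container="$3" cpu="$4" patch
  patch=$(printf '{"spec":{"containers":[{"name":"%s","resources":{"limits":{"cpu":"%s"}}}]}}' \
            "$container" "$cpu")
  # --dry-run=server runs admission control but does not store the change.
  if kubectl patch pod "$pod" -n "$ns" --subresource=resize \
       --dry-run=server --patch "$patch" >/dev/null 2>&1; then
    return 1
  fi
  return 0
}
```

For example, `resize_is_denied tenant-a api-7b9f api 16 && echo "ceiling enforced"` fits in a post-deploy smoke test.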

Trade-offs

Aspect	Benefit	Cost	Mitigation
Subresource-aware policies	Closes the resize bypass	Every existing policy must be reviewed for matching `pods/resize` rule	Build a CI lint on policy bundles that flags `pods` rules without `pods/resize` companions
`RestartContainer` memory policy	Avoids OOM on shrink	Resize causes a brief restart	Use only for memory; use `NotRequired` for CPU
`InPlaceOrRecreate` VPA mode	Less disruption than `Auto`	New code path — fewer war stories	Roll out per namespace; keep `Initial` mode as fallback
Audit-log RequestResponse	Reconstructable history	Audit log volume increases	Filter to resize subresource only; ship to cold storage
Hard ResourceQuota	Prevents tenant escape	Workloads with bursty needs are blocked	Pair with priority classes and a small reserve quota
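The CI lint mentioned in the first row can start out very small. The sketch below is a deliberately crude text match rather than a YAML parser: it flags any file that mentions the `pods` resource but never mentions `pods/resize`. `lint_resize_coverage` is a hypothetical name, and the match only understands lowercase resource lists as used by admission policies and webhook configurations, not Kyverno `kinds`.

```shell
#!/usr/bin/env bash
# Warn about policy files that match "pods" but have no "pods/resize" rule.
# Arguments: one or more policy file paths. Returns 1 if any file is flagged.
lint_resize_coverage() {
  local rc=0 f
  for f in "$@"; do
    # First grep: the bare word "pods" (not "pods/..." and not part of a longer word).
    if grep -Eq '(^|[^a-z/])pods([^a-z/]|$)' "$f" && ! grep -Fq 'pods/resize' "$f"; then
      echo "WARN: ${f} matches pods but has no pods/resize rule"
      rc=1
    fi
  done
  return "$rc"
}
```

A non-zero exit makes it usable as a pipeline gate, for example `lint_resize_coverage policies/*.yaml`, where the directory name is only an example.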

Failure Modes

Failure	Symptom	Detection	Recovery
Policy hooks `pods`, not `pods/resize`	Resize bypasses validation	Audit log shows resize past stated ceiling	Add matching policy rule for `pods/resize`; replay audit
`resizePolicy` defaults to `NotRequired` for memory on a JVM workload	Live memory-limit shrink OOM-kills container	OOMKill events with no deploy correlation	Set `RestartContainer` for memory on stateful workloads
RBAC grants `pods/*` to tenant	Tenant can resize	`kubectl auth can-i update pods --subresource=resize -n tenant-a --as=...` returns yes	Replace with `pods` + explicit subresource grants
VPA recommends past LimitRange	LimitRange holds; recommendation is capped	`uncappedTarget` exceeds `target` in the VPA status	Set `maxAllowed` in the VPA `resourcePolicy`; raise LimitRange if intentional
Resize stuck `Infeasible`	Pod runs at old size; user thinks they got more	`PodResizePending` condition; alert on metric	Either descheduler reschedules or user reverts
Quota controller lag	Brief over-quota	quota-overshoot Prometheus rule	Tighten admission policy below quota; treat quota as a backstop
Resize during rolling upgrade of kubelet	Kubelet ignores until new version up	Resize events queue	Drain node before upgrade; rerun resize after

When to Consider a Managed Alternative

GKE Autopilot abstracts node-level resize entirely; you specify Pod requests and Google manages the scaling. If your workload pattern fits Autopilot’s restrictions, it sidesteps most of this article.
AWS Karpenter can be paired with VPA + in-place resize but currently does not hook the resize subresource for its consolidation logic; rolling-restart-style consolidation is more predictable than mixing.
Azure AKS Vertical Pod Autoscaling (managed) preview in 1.33 clusters bundles many of these guardrails out of the box.