Kubernetes Backup Security with Velero: Encryption, RBAC, and Immutable Storage
Problem
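Velero backs up Kubernetes resources and persistent volume data to object storage. A complete backup contains all Kubernetes Secrets (database passwords, API keys, TLS private keys), all workload configurations, and all PersistentVolumeClaims together with their data (via volume snapshots or file-system backup). Because Velero reads resources through the Kubernetes API server, Secrets are stored in the backup as plain base64-encoded JSON even when etcd encryption at rest is enabled.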
This makes Velero backups extraordinarily valuable to attackers and a critical ransomware target:
- Backup theft = full cluster credential harvest: An attacker who reads a Velero backup extracts all Secrets, including the TLS keys, database passwords, and OAuth tokens that live in Kubernetes Secrets.
- Backup deletion = recovery impossible: A ransomware operator who deletes backups prevents cluster recovery after a destructive attack.
- Backup overwrite = poisoned restore: An attacker who can write to the backup bucket replaces legitimate backups with malicious ones; a restore operation deploys the attacker’s workloads.
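This minimal sketch shows why a stolen archive amounts to a credential harvest. Velero stores each Secret as the JSON object the API server returns, and the values in its data field are only base64-encoded. The Secret below is made up. In practice the JSON would come from a file inside an extracted backup tarball; the archive's internal layout is deliberately not relied on here.

```python
import base64
import json

# Hypothetical Secret, shaped like the JSON object Velero writes into a backup archive.
SECRET_JSON = """
{
  "apiVersion": "v1",
  "kind": "Secret",
  "metadata": {"name": "db-credentials", "namespace": "payments"},
  "type": "Opaque",
  "data": {"username": "YXBw", "password": "aHVudGVyMg=="}
}
"""


def decode_secret(raw: str) -> dict:
    """Return a Secret's data values as plain strings.

    base64 is an encoding, not encryption: no key or password is needed.
    """
    obj = json.loads(raw)
    return {key: base64.b64decode(value).decode("utf-8")
            for key, value in obj.get("data", {}).items()}


if __name__ == "__main__":
    print(decode_secret(SECRET_JSON))
```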
Specific gaps in default Velero deployments:
- Backup bucket has no object lock; backups can be deleted or overwritten.
- Backup data is not encrypted by Velero. S3 server-side encryption (SSE-S3) is managed by AWS, not by Velero, and decrypts transparently for any caller, so anyone with s3:GetObject on the bucket can read the backup content.
- Velero’s service account or IAM role has s3:DeleteObject permission; ransomware targeting the cluster can use Velero to delete its own backups.
- No alerting on backup failure or on unexpected backup access.
- Restores are not tested; a backup that cannot be restored is not a backup.
Target systems: Velero 1.13+; AWS S3 with Object Lock; GCS with bucket lock; Azure Blob with immutability policies; velero-plugin-for-aws 1.9+; Kopia for backup encryption (Velero’s built-in backup repository).
Threat Model
- Adversary 1 — Backup data theft: An attacker who compromises a developer’s AWS credentials (or a Kubernetes service account with S3 access) reads backup files and extracts all Kubernetes Secrets from the serialised backup archive.
- Adversary 2 — Ransomware via backup deletion: A ransomware operator compromises the cluster, destroys workloads and PVs, then deletes Velero backups. Recovery becomes impossible.
- Adversary 3 — Backup overwrite for persistence: An attacker replaces Velero backup files with malicious versions containing backdoored deployments. On the next restore, the malicious workloads are deployed.
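- Adversary 4 — Backup exfiltration via Velero API: The Velero server holds credentials that can read every backup in the bucket, and any Kubernetes user allowed to create DownloadRequest objects can obtain pre-signed URLs for backup archives. An attacker who compromises the Velero pod, or a user holding that RBAC permission, downloads all backup archives.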
- Adversary 5 — Cross-cluster replay attack: A backup from one cluster is restored to a different cluster without sanitisation. The restored Secrets contain credentials for the original cluster’s external services, which the new cluster’s workloads then use.
- Access level: Adversary 1 has S3/GCS read credentials. Adversary 2 has cluster-admin access or S3 write/delete access. Adversary 3 has S3 write access. Adversary 4 has Velero pod exec or k8s API access. Adversary 5 is an operator making a restore mistake.
- Objective: Extract credentials, prevent recovery, establish persistent access via restore.
- Blast radius: An unencrypted, unprotected backup is equivalent in impact to a full cluster compromise. Deleted backups leave a cluster unrecoverable after a destructive attack.
Configuration
Step 1: S3 Bucket with Object Lock (Immutable Backups)
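Create the backup bucket with Object Lock enabled before Velero is installed. S3 now allows enabling Object Lock on an existing versioned bucket, but the default retention then covers only objects written afterwards. Enabling it at creation time guarantees that every backup is locked: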
# Create S3 bucket with Object Lock enabled.
aws s3api create-bucket \
--bucket velero-backups-prod \
--region us-east-1 \
--object-lock-enabled-for-bucket
# Set a default retention policy (COMPLIANCE mode: cannot be overridden even by root).
aws s3api put-object-lock-configuration \
--bucket velero-backups-prod \
--object-lock-configuration '{
"ObjectLockEnabled": "Enabled",
"Rule": {
"DefaultRetention": {
"Mode": "COMPLIANCE",
"Days": 30
}
}
}'
# Enable versioning (required for Object Lock; creating the bucket with Object Lock already enables it, so this call is harmless).
aws s3api put-bucket-versioning \
--bucket velero-backups-prod \
--versioning-configuration Status=Enabled
# Block all public access.
aws s3api put-public-access-block \
--bucket velero-backups-prod \
--public-access-block-configuration \
'BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true'
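With COMPLIANCE mode Object Lock, no user in the AWS account, including the root user, can permanently delete or overwrite a locked object version, or shorten its retention, until the retention period ends. A plain DELETE request without a version ID still succeeds, but it only adds a delete marker; the locked version stays in the bucket and can be recovered. This is the strongest protection S3 offers against ransomware backup deletion.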
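After Step 1, verify the bucket's lock configuration rather than assuming it. The sketch below assumes the JSON printed by aws s3api get-object-lock-configuration --bucket velero-backups-prod is passed in as a string. The 30-day floor mirrors the retention set above; it is a policy choice, not an AWS requirement. Retention expressed in Years is approximated as 365 days per year.

```python
import json

MIN_RETENTION_DAYS = 30  # mirrors the retention configured in Step 1 (a policy choice)


def check_object_lock(cli_output: str, min_days: int = MIN_RETENTION_DAYS) -> list:
    """Return problems found in an Object Lock configuration (empty list = OK)."""
    cfg = json.loads(cli_output).get("ObjectLockConfiguration", {})
    problems = []
    if cfg.get("ObjectLockEnabled") != "Enabled":
        problems.append("Object Lock is not enabled")
    retention = cfg.get("Rule", {}).get("DefaultRetention", {})
    mode = retention.get("Mode")
    if mode != "COMPLIANCE":
        problems.append(f"default retention mode is {mode!r}, expected 'COMPLIANCE'")
    # Default retention is expressed either in Days or in Years (approximated as 365 days).
    days = retention.get("Days", retention.get("Years", 0) * 365)
    if days < min_days:
        problems.append(f"default retention is {days} days, expected at least {min_days}")
    return problems


if __name__ == "__main__":
    sample = ('{"ObjectLockConfiguration": {"ObjectLockEnabled": "Enabled", '
              '"Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Days": 30}}}}')
    print(check_object_lock(sample) or "Object Lock configuration OK")
```

A GOVERNANCE-mode or shorter default would be reported here. That matters because GOVERNANCE retention can be bypassed by any principal holding s3:BypassGovernanceRetention.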
Step 2: Velero IAM Policy — Minimum Permissions, No Delete
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:AbortMultipartUpload",
"s3:ListMultipartUploadParts",
"s3:GetObjectVersion"
],
"Resource": "arn:aws:s3:::velero-backups-prod/*"
},
{
"Effect": "Allow",
"Action": [
"s3:ListBucket",
"s3:GetBucketLocation",
"s3:ListBucketMultipartUploads",
"s3:GetBucketVersioning"
],
"Resource": "arn:aws:s3:::velero-backups-prod"
},
{
"Effect": "Allow",
"Action": [
"ec2:DescribeVolumes",
"ec2:DescribeSnapshots",
"ec2:CreateTags",
"ec2:CreateSnapshot",
"ec2:DeleteSnapshot",
"ec2:DescribeTags"
],
"Resource": "*"
}
]
}
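Notably absent: s3:DeleteObject and s3:DeleteObjectVersion. Velero does not need to delete objects when S3 lifecycle rules handle expiry after the Object Lock period. Without these permissions, neither Velero nor an attacker using the Velero IAM role can remove backup objects from S3. Two gaps remain. First, the role still holds ec2:DeleteSnapshot, so EBS snapshots of persistent volumes stay deletable and need their own protection (for example Recycle Bin retention rules or EBS snapshot lock). Second, Kopia repository maintenance deletes unreferenced blobs, so confirm that it behaves acceptably under this policy before relying on it.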
# Create the IAM policy and role.
aws iam create-policy \
--policy-name velero-backup-policy \
--policy-document file://velero-iam-policy.json
# Create a service account with IRSA (IAM Roles for Service Accounts).
eksctl create iamserviceaccount \
--name velero \
--namespace velero \
--cluster prod-cluster \
--attach-policy-arn arn:aws:iam::<account>:policy/velero-backup-policy \
--approve
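The missing delete permissions are easy to lose in a later policy edit, so a mechanical check (for example in CI) is cheap insurance. The sketch below lints a policy document loaded as a dict, for example with json.load on velero-iam-policy.json. The forbidden-action list is illustrative rather than exhaustive. Only the Action element is checked; NotAction, conditions and other attached policies are ignored.

```python
import fnmatch

# Illustrative, not exhaustive: S3 actions that would let the Velero role destroy
# backups or weaken their protection.
FORBIDDEN_S3 = [
    "s3:DeleteObject",
    "s3:DeleteObjectVersion",
    "s3:DeleteBucket",
    "s3:PutLifecycleConfiguration",
    "s3:PutBucketObjectLockConfiguration",
    "s3:BypassGovernanceRetention",
]
# Bucket-level actions only match the bucket ARN, never "arn:aws:s3:::bucket/*".
BUCKET_LEVEL = {"s3:listbucket", "s3:getbucketlocation",
                "s3:listbucketmultipartuploads", "s3:getbucketversioning"}


def _as_list(value):
    return value if isinstance(value, list) else [value]


def lint_velero_policy(policy: dict) -> list:
    """Return findings for an IAM policy document (empty list = nothing found)."""
    findings = []
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        resources = _as_list(stmt.get("Resource", []))
        for action in _as_list(stmt.get("Action", [])):
            pattern = action.lower()  # IAM action names are case-insensitive
            # Wildcard grants such as "s3:*" or "s3:Delete*" count as well.
            for forbidden in FORBIDDEN_S3:
                if fnmatch.fnmatchcase(forbidden.lower(), pattern):
                    findings.append(f"grants {forbidden} via {action!r}")
            if fnmatch.fnmatchcase("ec2:deletesnapshot", pattern):
                findings.append(f"grants ec2:DeleteSnapshot via {action!r}; "
                                "EBS snapshots are outside S3 Object Lock")
            if pattern in BUCKET_LEVEL and all(r.endswith("/*") for r in resources):
                findings.append(f"{action} is scoped only to object ARNs and never applies")
    return findings
```

Running it against a policy that still lists ec2:DeleteSnapshot reports that action, which makes the snapshot gap visible rather than implicit.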
Step 3: Backup Encryption with Velero + Kopia
Velero 1.10 added Kopia as an uploader for its backup repository, and recent releases use it by default. Kopia encrypts the repository, so volume data that Velero copies into the bucket is encrypted before it leaves the cluster:
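# Install Velero with the AWS plugin and the Kopia uploader, reusing the IRSA service account eksctl created in Step 2 (pin the plugin tag to your Velero release; v1.9.x pairs with Velero 1.13).
helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts
helm install velero vmware-tanzu/velero \
--namespace velero --create-namespace \
--set initContainers[0].name=velero-plugin-for-aws \
--set initContainers[0].image=velero/velero-plugin-for-aws:v1.9.0 \
--set initContainers[0].volumeMounts[0].mountPath=/target \
--set initContainers[0].volumeMounts[0].name=plugins \
--set configuration.backupStorageLocation[0].name=aws \
--set configuration.backupStorageLocation[0].provider=aws \
--set configuration.backupStorageLocation[0].default=true \
--set configuration.backupStorageLocation[0].bucket=velero-backups-prod \
--set configuration.backupStorageLocation[0].config.region=us-east-1 \
--set configuration.backupStorageLocation[0].config.s3ForcePathStyle=false \
--set configuration.volumeSnapshotLocation[0].name=aws \
--set configuration.volumeSnapshotLocation[0].provider=aws \
--set configuration.volumeSnapshotLocation[0].config.region=us-east-1 \
--set credentials.useSecret=false \
--set serviceAccount.server.create=false \
--set serviceAccount.server.name=velero \
--set deployNodeAgent=true \
--set configuration.features=EnableCSI \
--set configuration.defaultRepoMaintainFrequency=168h \
--set configuration.uploaderType=kopia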
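The Kopia repository is encrypted with a key derived from a repository password stored in the velero-repo-credentials Secret. If you do not supply one, Velero creates that Secret with a static, publicly known default password. Set your own value before the first file-system or data-mover backup: existing repositories stay encrypted under the password they were created with, so changing the Secret later breaks access to them.
# Create (or replace) the repository password secret before the first backup.
# Keep a copy in your secrets manager: without the password, Kopia data cannot be decrypted.
REPO_PASSWORD="$(openssl rand -base64 32)"
kubectl create secret generic velero-repo-credentials \
--namespace velero \
--from-literal=repository-password="$REPO_PASSWORD" \
--dry-run=client -o yaml | kubectl apply -f -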
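With Kopia encryption:
- Volume data that Velero copies into the bucket (file-system backups, or CSI snapshot data movement with --snapshot-move-data) is encrypted client-side before upload to S3.
- S3 SSE provides a second layer (managed by AWS).
- An attacker who accesses S3 directly gets only ciphertext for that volume data without the repository password.
- Kopia does not encrypt the tarball of Kubernetes resources, which contains the Secrets. It also does not encrypt native EBS snapshots, which rely on EBS encryption. To protect the tarball, use SSE-KMS with a customer-managed key: set the AWS plugin's kmsKeyId on the backup storage location and grant the Velero role kms:GenerateDataKey and kms:Decrypt on that key. Reading the tarball then also requires access to the key.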
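A key derived from a repository password makes the password both the protection and the single point of failure. This sketch shows password-based key derivation with Python's hashlib.scrypt. Kopia's actual algorithm, cost parameters and key hierarchy are not reproduced here, and the salt merely stands in for a per-repository identifier.

```python
import hashlib
import secrets

# Assumed scrypt cost parameters (about 16 MiB of memory); Kopia's own settings differ.
SCRYPT_N, SCRYPT_R, SCRYPT_P = 2**14, 8, 1


def derive_repo_key(password: str, repo_salt: bytes) -> bytes:
    """Derive a 32-byte encryption key from a repository password."""
    return hashlib.scrypt(password.encode("utf-8"), salt=repo_salt,
                          n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P, dklen=32)


if __name__ == "__main__":
    repo_salt = secrets.token_bytes(16)  # stands in for a per-repository unique ID
    key = derive_repo_key("correct horse battery staple", repo_salt)
    # The same password and salt always yield the same key; any other password does not.
    print(key == derive_repo_key("correct horse battery staple", repo_salt))
    print(key == derive_repo_key("wrong password", repo_salt))
```

The key can only be recomputed from the password. That is why a random 32-byte password resists guessing, and also why a lost password cannot be recovered from anything stored in the bucket.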
Step 4: Kubernetes RBAC for Velero
Restrict who can create, read, and restore backups:
# ClusterRole for backup operators (create backups, view status).
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: velero-backup-operator
rules:
- apiGroups: [velero.io]
resources: [backups, schedules]
verbs: [get, list, create, watch]
# No rule for restores: RBAC grants nothing that is not listed, so this role cannot create restores.
---
# ClusterRole for restore operators (restricted to SRE on-call).
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: velero-restore-operator
rules:
- apiGroups: [velero.io]
resources: [backups]
verbs: [get, list]
- apiGroups: [velero.io]
resources: [restores]
verbs: [get, list, create, watch]
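# No rule for downloadrequests: this role cannot fetch backup archives through the Velero API.
# velero restore logs also creates DownloadRequests, and RBAC cannot restrict which target
# a request names, so granting it for logs would also allow archive downloads.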
---
# ClusterRole for backup deletion (should have no members in production).
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: velero-backup-delete
rules:
- apiGroups: [velero.io]
resources: [deletebackuprequests]
verbs: [create]
---
# Bind restore role to SRE team (OIDC group).
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: velero-restore-sre
subjects:
- kind: Group
name: "oidc:sre-on-call"
apiGroup: rbac.authorization.k8s.io
roleRef:
kind: ClusterRole
name: velero-restore-operator
apiGroup: rbac.authorization.k8s.io
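RBAC only adds permissions, so the separation of duties above can be checked mechanically. This is a toy evaluator that assumes the rules are copied by hand from the ClusterRoles. Rules that list no verbs grant nothing and are left out. Wildcards, resourceNames and other bindings are not modelled.

```python
# Rules copied from the ClusterRoles above; rules without verbs grant nothing and are omitted.
ROLES = {
    "velero-backup-operator": [
        {"apiGroups": ["velero.io"], "resources": ["backups", "schedules"],
         "verbs": ["get", "list", "create", "watch"]},
    ],
    "velero-restore-operator": [
        {"apiGroups": ["velero.io"], "resources": ["backups"], "verbs": ["get", "list"]},
        {"apiGroups": ["velero.io"], "resources": ["restores"],
         "verbs": ["get", "list", "create", "watch"]},
    ],
    "velero-backup-delete": [
        {"apiGroups": ["velero.io"], "resources": ["deletebackuprequests"], "verbs": ["create"]},
    ],
}


def is_allowed(bound_roles, api_group, resource, verb):
    """RBAC is deny-by-default: allowed only if some bound rule lists all three."""
    return any(
        api_group in rule["apiGroups"] and resource in rule["resources"] and verb in rule["verbs"]
        for role in bound_roles
        for rule in ROLES.get(role, [])
    )


if __name__ == "__main__":
    sre = ["velero-restore-operator"]  # what the oidc:sre-on-call binding grants
    for resource, verb in [("restores", "create"), ("deletebackuprequests", "create"),
                           ("downloadrequests", "create")]:
        print(resource, verb, is_allowed(sre, "velero.io", resource, verb))
```

On a live cluster, kubectl auth can-i create restores.velero.io --as=someone --as-group=oidc:sre-on-call gives the authoritative answer, including any other bindings the group holds.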
Step 5: Backup Schedule and Retention
# Create a scheduled backup covering all namespaces.
velero schedule create prod-daily \
--schedule="0 2 * * *" \
--ttl 720h \
--include-namespaces '*' \
--exclude-namespaces kube-system,velero \
--snapshot-volumes=true \
--volume-snapshot-locations aws \
--labels environment=production
# Create a weekly full backup with longer retention.
velero schedule create prod-weekly \
--schedule="0 0 * * 0" \
--ttl 2160h \
--include-namespaces '*' \
--snapshot-volumes=true
# Check schedule status.
velero schedule get
velero backup get --selector "velero.io/schedule-name=prod-daily" | head -5
The --ttl here sets when Velero's garbage collector requests deletion of a backup through a DeleteBackupRequest. The Velero role has no s3:DeleteObject permission, and COMPLIANCE-mode Object Lock blocks permanent deletion of locked versions in any case. That request therefore cannot remove the objects from S3; expect Velero to report it as failed. S3 lifecycle rules remove the data once the lock period ends. This is intentional: the Object Lock retention period provides a hard floor that no one can shorten.
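The interaction between --ttl and the lock period is easy to misjudge when there is more than one schedule. This date-arithmetic sketch assumes the 30-day default retention from Step 1 and the two schedules above. The daily backups are locked for their whole 720h TTL. The weekly backups live for 90 days but are locked only for the first 30.

```python
from datetime import datetime, timedelta, timezone


def retention_timeline(created, ttl_hours, lock_days):
    """Compare Velero's TTL expiry with the end of the S3 Object Lock retention."""
    ttl_expiry = created + timedelta(hours=ttl_hours)
    lock_expiry = created + timedelta(days=lock_days)
    return {
        "ttl_expiry": ttl_expiry,
        "lock_expiry": lock_expiry,
        # Days of the backup's life during which Object Lock no longer protects it.
        "unprotected_days": max(0, (ttl_expiry - lock_expiry).days),
    }


if __name__ == "__main__":
    created = datetime(2026, 4, 28, 2, 0, tzinfo=timezone.utc)
    for name, ttl_hours in [("prod-daily", 720), ("prod-weekly", 2160)]:
        t = retention_timeline(created, ttl_hours, lock_days=30)
        print(name, t["ttl_expiry"].date(), t["lock_expiry"].date(), t["unprotected_days"])
```

Raising the default retention to 90 days would cover the weekly schedule. It would also lock every daily backup for 90 days, which is the cost side of the Object Lock trade-off.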
Step 6: Test Restores Regularly
A backup that has never been tested is not a backup.
# Restore a specific namespace to a test cluster (never restore to production without testing).
# Run against the test cluster's Velero, with its backup storage location for this bucket set to accessMode: ReadOnly.
RESTORE="test-restore-$(date +%Y%m%d)"
velero restore create "$RESTORE" \
--from-backup prod-daily-20260428020000 \
--include-namespaces payments \
--namespace-mappings payments:payments-restore-test \
--restore-volumes=true \
--wait
# Check restore status.
velero restore describe "$RESTORE"
velero restore logs "$RESTORE"
# Verify key resources were restored.
kubectl get pods,secrets,pvc -n payments-restore-test
# Validate a critical secret is present (not empty).
kubectl get secret db-credentials -n payments-restore-test -o jsonpath='{.data.password}' | base64 -d | wc -c
# Should return non-zero.
Schedule quarterly restore tests with a documented runbook. Store the test results with the backup metadata.
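The pass/fail decision for a restore test can be automated from the Restore object itself. This sketch assumes the JSON comes from kubectl get restore <name> -n velero -o json. Only the Completed phase with zero errors counts as a pass. Warnings are reported but tolerated, because restores of objects that already exist commonly produce them.

```python
import json


def evaluate_restore(restore_json: str) -> tuple:
    """Return (passed, notes) for a Velero Restore object."""
    status = json.loads(restore_json).get("status", {})
    notes = []
    phase = status.get("phase")
    if phase != "Completed":
        notes.append(f"phase is {phase!r}, expected 'Completed'")
    errors = status.get("errors", 0)
    if errors:
        notes.append(f"{errors} error(s); see velero restore logs")
    passed = not notes  # decided before warnings are added
    warnings = status.get("warnings", 0)
    if warnings:
        notes.append(f"{warnings} warning(s); review before signing off")
    return passed, notes


if __name__ == "__main__":
    print(evaluate_restore('{"status": {"phase": "PartiallyFailed", "errors": 3}}'))
```

The returned notes are a natural thing to store alongside the test record for each quarterly run.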
Step 7: Exclude Sensitive Resources Selectively
Some resources, and sometimes whole namespaces, should not appear in backups due to sensitivity or restore complexity:
# Backup with exclusions for secrets that should be re-created, not restored.
# Exclude all Secrets; re-create them from a secrets manager on restore.
velero backup create manual-backup \
--include-namespaces production \
--exclude-resources secrets \
--snapshot-volumes=true
# Or: use label selectors to exclude specific secrets.
# Label secrets that should not be backed up (e.g., bootstrap secrets rotated on restore).
kubectl label secret bootstrap-token -n kube-system velero.io/exclude-from-backup=true
Note the trade-off: excluding Secrets from backups means they must be re-created from an external secrets manager on restore. This is often safer than storing all Secrets in backup archives, especially for Secrets that are short-lived or bootstrapped from Vault.
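A toy model shows how the two exclusion mechanisms combine. It covers only the filters used above: namespace inclusion, --exclude-resources, and the velero.io/exclude-from-backup=true label. Velero's real item collection also handles wildcards, cluster-scoped resources and label selectors, which are not modelled. The items are made up.

```python
EXCLUDE_LABEL = "velero.io/exclude-from-backup"


def items_in_backup(items, include_namespaces, exclude_resources):
    """Return 'resource/namespace/name' for each namespaced item the filters keep."""
    kept = []
    for item in items:
        if item["namespace"] not in include_namespaces:
            continue
        if item["resource"] in exclude_resources:
            continue
        if item.get("labels", {}).get(EXCLUDE_LABEL) == "true":
            continue
        kept.append(f'{item["resource"]}/{item["namespace"]}/{item["name"]}')
    return kept


ITEMS = [
    {"resource": "deployments", "namespace": "production", "name": "api"},
    {"resource": "secrets", "namespace": "production", "name": "db-credentials"},
    {"resource": "configmaps", "namespace": "production", "name": "feature-flags",
     "labels": {EXCLUDE_LABEL: "true"}},
    {"resource": "deployments", "namespace": "staging", "name": "api"},
]

if __name__ == "__main__":
    print(items_in_backup(ITEMS, {"production"}, {"secrets"}))
```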
Step 8: Telemetry
velero_backup_success_total{schedule} counter
velero_backup_failure_total{schedule} counter
velero_backup_last_successful_timestamp{schedule} gauge
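velero_backup_tarball_size_bytes{schedule} gauge
velero_restore_success_total{schedule} counter
velero_restore_failed_total{schedule} counter
s3_backup_object_count{bucket} gauge
s3_unexpected_delete_attempt_total{bucket} counter
The s3_* series are not emitted by Velero; they are assumed to come from a custom exporter (for example one fed by CloudTrail S3 data events).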
Alert on:
- velero_backup_failure_total increasing: backups are failing; data loss risk accumulates.
- velero_backup_last_successful_timestamp more than 26h ago: the daily backup was missed; investigate.
- s3_unexpected_delete_attempt_total non-zero: someone or something attempted to delete backup objects; this is always anomalous because the IAM policy excludes s3:DeleteObject.
- Unexpected IAM role assumption against the Velero role: CloudTrail alert on off-cluster usage (an out-of-hours rule alone would fire on the 02:00 schedule).
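Evaluated in code, the alert conditions above look like the sketch below. It assumes one schedule's values from two consecutive scrapes are available as dicts. The s3_unexpected_delete_attempt_total input is the hypothetical custom metric, not a Velero one, and counter resets after a pod restart are ignored.

```python
import time
from typing import Optional

STALE_AFTER_SECONDS = 26 * 3600  # one daily run plus a 2-hour grace period


def backup_alerts(current: dict, previous: dict, now: Optional[float] = None) -> list:
    """Evaluate the alert conditions for one schedule from two metric scrapes."""
    now = time.time() if now is None else now
    alerts = []
    # Counters only grow (ignoring resets), so alert on an increase, not an absolute value.
    if current.get("velero_backup_failure_total", 0) > previous.get("velero_backup_failure_total", 0):
        alerts.append("backup failures increased")
    last_ok = current.get("velero_backup_last_successful_timestamp", 0)
    if now - last_ok > STALE_AFTER_SECONDS:
        alerts.append("no successful backup in the last 26h")
    if current.get("s3_unexpected_delete_attempt_total", 0) > 0:
        alerts.append("delete attempted against the backup bucket")
    return alerts


if __name__ == "__main__":
    now = time.time()
    sample = {"velero_backup_failure_total": 1,
              "velero_backup_last_successful_timestamp": now - 27 * 3600}
    print(backup_alerts(sample, {"velero_backup_failure_total": 1}, now))
```

In Prometheus the first two conditions are usually written as increase(velero_backup_failure_total[1h]) > 0 and time() - velero_backup_last_successful_timestamp > 26 * 3600.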
Expected Behaviour
| Signal | Default Velero | Hardened Velero |
|---|---|---|
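| Backup data accessed by attacker with S3 creds | Plaintext k8s resources and Secrets | Kopia-managed volume data is ciphertext without the repository password; resource manifests and Secrets stay readable unless SSE-KMS with a restricted key is used |
| Attacker deletes backup objects | Succeeds (no protection) | Permanent deletion of locked versions is denied by COMPLIANCE Object Lock; a plain DELETE only adds a delete marker and the data stays recoverable |
| Attacker overwrites backup with malicious content | Succeeds | The upload becomes a new object version; the original locked version is preserved and can be restored |
| Velero pod compromised; backup download | Archives downloadable via Velero API | Standard roles have no downloadrequests permission; the compromised pod can still read the bucket with its own IAM role |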
| Restore to wrong cluster | Not prevented | Documented procedure requires namespace mapping and secret re-creation |
Trade-offs
| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| Object Lock COMPLIANCE mode | Truly immutable; cannot be overridden | Backups cannot be deleted even if you want to | Choose the retention period deliberately; lifecycle rules expire objects after the lock period. |
| Kopia encryption | Client-side encryption of volume data; S3 access yields only ciphertext for it | Repository password is a new secret to manage | Store in HSM or Vault; rotate with a defined procedure. |
| No s3:DeleteObject for Velero | Prevents Velero from being used to delete its own backups | Velero cannot expire old backups via the S3 API, and Kopia repository maintenance cannot remove unreferenced blobs | S3 lifecycle rules handle expiry after the lock period; acceptable trade-off. |
| Excluding Secrets from backups | Reduces sensitivity of backup archives | Restore requires re-creating Secrets from external source | Only viable with a mature secrets manager (Vault, AWS Secrets Manager). |
| Separate restore-test cluster | Safe testing without touching production | Operational overhead of maintaining a test cluster | Use a low-cost k3s or kind cluster to validate restored manifests (restoring EBS snapshots needs a test cluster on AWS in the same region); automate quarterly runs. |
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Kopia repository password lost | Cannot decrypt Kopia-managed volume backups | Restores of file-system or data-mover volume backups fail because the repository cannot be opened | Recover the password from Vault; without it, those backups are irrecoverable, so store the password securely before the first backup. |
| Object Lock TTL shorter than incident discovery time | Backup expired before ransomware discovered | Recovery window missed | Set Object Lock retention >= 30 days; review TTL against incident mean-time-to-discovery. |
| Backup schedule silently skipped | Hours of data lost before discovery | velero_backup_last_successful_timestamp alert | Check Velero pod logs; Velero schedule controller restarts on pod restart. |
| PVC snapshot quota exceeded | Snapshot creation fails; backup incomplete | Backup status shows PartiallyFailed | Increase EBS/GCS snapshot quota or clean up old snapshots. |
| Restore fails due to missing CRDs | Resources cannot be created on target cluster | Restore logs show no kind Foo | Install required CRDs before restoring; order-dependent restores need manual intervention. |
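| Backup size grows unbounded | Storage costs; Object Lock prevents deletion | S3 cost alert; bucket size metric | Adjust TTL; use lifecycle rules to transition old backups to cheaper storage classes before expiry. Objects in Glacier Flexible Retrieval or Deep Archive must be restored before Velero can read them, and transitioning Kopia repository blobs to those classes makes the repository unreadable until they are restored. |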