Azure Workload Identity for AKS: Federated Credential Access to Azure Resources

Problem

Pods running on AKS need access to Azure resources: Key Vault secrets, Storage accounts, Service Bus, SQL databases. The historical approaches all have significant security problems.

Storing Azure credentials as Kubernetes Secrets means those credentials exist at rest in etcd, can be exfiltrated via misconfigured RBAC, and require rotation procedures that teams consistently skip. Connection strings in environment variables show up in kubectl describe pod output, log aggregation pipelines, and crash dumps.

The evolution of AKS identity options has been rocky:

Service principal credentials (2018–2020): Operators manually created an Azure AD service principal, stored the client secret as a Kubernetes Secret, mounted it into pods, and rotated it every 90 days. Each rotation required a coordinated rollout across all pods consuming that secret. When credentials leaked, the blast radius was every workload sharing the secret.

AAD Pod Identity (2019–2023): Pod Identity introduced a cluster-level MIC (Managed Identity Controller) deployment and a per-node NMI (Node Managed Identity) DaemonSet that intercepted IMDS requests from pods and returned tokens for a specific managed identity. It worked, but introduced significant complexity: the NMI DaemonSet ran privileged with hostNetwork: true, the binding between pods and managed identities used custom resources that were easy to misconfigure, and pod label injection errors led to pods silently falling back to the node’s managed identity. Microsoft deprecated Pod Identity in 2022.

Node-level Managed Identity (still common): Assigning a user-assigned managed identity to the node pool gives all pods on that node access to whatever the managed identity can reach. This violates least privilege: every pod, from a public-facing web server to an internal batch job, inherits the same Azure permissions. A container escape from any pod on the node can access Key Vault secrets scoped to the node identity.

Azure Workload Identity (current): Uses OIDC federation between the AKS OIDC issuer and Azure AD (now Entra ID). The kubelet projects a short-lived, audience-bound service account token into the pod. The application exchanges this token at the Azure AD token endpoint for a scoped Azure access token. No stored credentials, no long-lived secrets, no shared identities across workloads.

This is the same pattern used by AWS IRSA and GCP Workload Identity Federation, applied to the Azure/AKS stack.

How the Token Exchange Works

The trust chain has three components:

  1. AKS OIDC issuer: The cluster exposes an OIDC discovery endpoint at https://<region>.oic.prod-aks.azure.com/<tenantId>/<clusterUUID>/. The discovery document advertises a JWKS endpoint containing the public keys needed to verify the cluster's projected service account tokens.

  2. Azure AD federated credential: An app registration in Entra ID has a federated credential that specifies the issuer URL, subject (the Kubernetes service account identity string), and audience. Azure AD uses the OIDC discovery endpoint to fetch the JWKS and verify incoming tokens.

  3. Workload Identity webhook: A mutating admission webhook (installed automatically by --enable-workload-identity on AKS, or via Helm on self-managed clusters) intercepts pod creation, detects the azure.workload.identity/use: "true" pod label, and injects a projected token volume and four environment variables (AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_FEDERATED_TOKEN_FILE, AZURE_AUTHORITY_HOST).

When the application calls the Azure SDK, DefaultAzureCredential checks for these environment variables, reads the projected token from the file path in AZURE_FEDERATED_TOKEN_FILE, and POSTs it to the Azure AD token endpoint as a client assertion. Azure AD verifies the JWT signature using the issuer's JWKS, checks that the subject claim matches a registered federated credential, and returns a scoped access token. The projected token is valid for one hour by default and is rotated automatically by the kubelet.
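
Under the hood this is a single HTTPS POST using the client credentials grant with the projected token as the client assertion. A minimal sketch of what the credential class does, runnable from inside a pod the webhook has already mutated (the Key Vault scope is illustrative):

# All four variables below are injected by the webhook.
ASSERTION=$(cat "$AZURE_FEDERATED_TOKEN_FILE")

curl -s -X POST "${AZURE_AUTHORITY_HOST}${AZURE_TENANT_ID}/oauth2/v2.0/token" \
  -d "grant_type=client_credentials" \
  -d "client_id=${AZURE_CLIENT_ID}" \
  -d "client_assertion_type=urn:ietf:params:oauth:client-assertion-type:jwt-bearer" \
  -d "client_assertion=${ASSERTION}" \
  -d "scope=https://vault.azure.net/.default" | jq .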

Enabling the OIDC Issuer on AKS

The OIDC issuer must be explicitly enabled on new or existing clusters:

az aks update \
  --resource-group rg-production \
  --name aks-production \
  --enable-oidc-issuer \
  --enable-workload-identity

For new clusters:

az aks create \
  --resource-group rg-production \
  --name aks-production \
  --enable-oidc-issuer \
  --enable-workload-identity \
  --node-count 3 \
  --node-vm-size Standard_D4s_v3

--enable-workload-identity installs the mutating admission webhook. Without it, the OIDC issuer is available but pods are not automatically configured to use it.
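
You can confirm both features took effect before proceeding; a quick sanity check (query paths as reflected in current az CLI output):

az aks show \
  --resource-group rg-production \
  --name aks-production \
  --query "{oidcIssuer: oidcIssuerProfile.enabled, workloadIdentity: securityProfile.workloadIdentity.enabled}" \
  --output json

Both values should be true.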

Retrieve the issuer URL — you need this exact value when creating the federated credential:

AKS_OIDC_ISSUER=$(az aks show \
  --resource-group rg-production \
  --name aks-production \
  --query "oidcIssuerProfile.issuerUrl" \
  --output tsv)

echo "$AKS_OIDC_ISSUER"

The URL takes the form https://eastus.oic.prod-aks.azure.com/<tenantId>/<clusterUUID>/. The trailing slash is significant — omitting it will cause federated credential validation to fail.

Verify the OIDC discovery endpoint is reachable and returns a valid JWKS:

curl -s "${AKS_OIDC_ISSUER}.well-known/openid-configuration" | jq .
curl -s "${AKS_OIDC_ISSUER}openid/v1/jwks" | jq .keys[0].kid

Creating the Entra ID App Registration and Federated Credential

Create an app registration (this is the Azure AD identity your workload will assume):

APP_NAME="aks-keyvault-reader"

APP_ID=$(az ad app create \
  --display-name "$APP_NAME" \
  --query appId \
  --output tsv)

az ad sp create --id "$APP_ID"

echo "Application (client) ID: $APP_ID"

For most AKS workloads, though, a user-assigned managed identity is simpler than an app registration and avoids managing client secrets for other auth flows. The rest of this walkthrough uses the managed identity:

IDENTITY_NAME="aks-keyvault-reader"
RESOURCE_GROUP="rg-production"
LOCATION="eastus"

az identity create \
  --name "$IDENTITY_NAME" \
  --resource-group "$RESOURCE_GROUP" \
  --location "$LOCATION"

USER_ASSIGNED_CLIENT_ID=$(az identity show \
  --name "$IDENTITY_NAME" \
  --resource-group "$RESOURCE_GROUP" \
  --query clientId \
  --output tsv)

echo "Client ID: $USER_ASSIGNED_CLIENT_ID"

Add the federated credential. The subject string must exactly match the Kubernetes service account identity in the form system:serviceaccount:<namespace>:<serviceaccount-name>:

az identity federated-credential create \
  --name "aks-production-keyvault-reader" \
  --identity-name "$IDENTITY_NAME" \
  --resource-group "$RESOURCE_GROUP" \
  --issuer "$AKS_OIDC_ISSUER" \
  --subject "system:serviceaccount:production:keyvault-reader" \
  --audiences "api://AzureADTokenExchange"

The audience api://AzureADTokenExchange is the fixed value Azure AD expects for workload identity federation. The webhook injects this as the projected token’s audience.
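
To confirm the credential registered as intended, list it back and check the issuer and subject character for character:

az identity federated-credential list \
  --identity-name "$IDENTITY_NAME" \
  --resource-group "$RESOURCE_GROUP" \
  --query "[].{name:name, issuer:issuer, subject:subject}" \
  --output table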

Grant the managed identity access to the target resource. For a Key Vault still on the legacy access policy model:

KEY_VAULT_NAME="kv-production-secrets"

az keyvault set-policy \
  --name "$KEY_VAULT_NAME" \
  --object-id "$(az identity show --name $IDENTITY_NAME --resource-group $RESOURCE_GROUP --query principalId --output tsv)" \
  --secret-permissions get list

For a vault using RBAC authorization, assign the built-in role instead:

az role assignment create \
  --role "Key Vault Secrets User" \
  --assignee "$USER_ASSIGNED_CLIENT_ID" \
  --scope "$(az keyvault show --name $KEY_VAULT_NAME --query id --output tsv)"

Prefer RBAC authorization on Key Vault (az keyvault update --name "$KEY_VAULT_NAME" --enable-rbac-authorization true) over access policies when possible: RBAC centralizes access management, records assignment changes in the activity log, and supports deny assignments.

Configuring the Kubernetes Side

Create the service account with the client ID annotation:

kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: keyvault-reader
  namespace: production
  annotations:
    azure.workload.identity/client-id: "${USER_ASSIGNED_CLIENT_ID}"
EOF

The webhook uses this annotation to populate AZURE_CLIENT_ID in the pod. If you need to target a specific tenant (cross-tenant federation), add azure.workload.identity/tenant-id as well.
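
A quick check that the annotation landed and carries the client ID (not the object or principal ID):

kubectl get serviceaccount keyvault-reader -n production \
  -o jsonpath='{.metadata.annotations.azure\.workload\.identity/client-id}'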

Deploy the workload with the label that triggers webhook injection:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: secret-consumer
  namespace: production
spec:
  replicas: 2
  selector:
    matchLabels:
      app: secret-consumer
  template:
    metadata:
      labels:
        app: secret-consumer
        azure.workload.identity/use: "true"
    spec:
      serviceAccountName: keyvault-reader
      containers:
        - name: app
          image: myregistry.azurecr.io/secret-consumer:1.4.0
          env:
            - name: AZURE_KEYVAULT_URL
              value: "https://kv-production-secrets.vault.azure.net/"
          securityContext:
            runAsNonRoot: true
            runAsUser: 1000
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL

After the webhook runs, kubectl describe pod on a running pod will show the injected environment variables and the projected volume:

Environment:
  AZURE_CLIENT_ID:             <client-id>
  AZURE_TENANT_ID:             <tenant-id>
  AZURE_FEDERATED_TOKEN_FILE:  /var/run/secrets/azure/tokens/azure-identity-token
  AZURE_AUTHORITY_HOST:        https://login.microsoftonline.com/
Volumes:
  azure-identity-token:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3600

SDK Support: DefaultAzureCredential

The Azure SDK DefaultAzureCredential authentication chain checks for workload identity before falling through to other credential types. In Python:

from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

credential = DefaultAzureCredential()
client = SecretClient(
    vault_url="https://kv-production-secrets.vault.azure.net/",
    credential=credential
)

secret = client.get_secret("database-connection-string")
print(secret.value)

In Go:

import (
    "context"
    "log"

    "github.com/Azure/azure-sdk-for-go/sdk/azidentity"
    "github.com/Azure/azure-sdk-for-go/sdk/security/keyvault/azsecrets"
)

cred, err := azidentity.NewDefaultAzureCredential(nil)
if err != nil {
    log.Fatalf("failed to create credential: %v", err)
}

client, err := azsecrets.NewClient(
    "https://kv-production-secrets.vault.azure.net/",
    cred,
    nil,
)
if err != nil {
    log.Fatalf("failed to create client: %v", err)
}

// An empty version string fetches the latest version of the secret.
ctx := context.Background()
secret, err := client.GetSecret(ctx, "database-connection-string", "", nil)

DefaultAzureCredential detects the AZURE_FEDERATED_TOKEN_FILE environment variable, reads the projected token from the path, and uses WorkloadIdentityCredential internally. The token file is re-read whenever a new Azure AD token must be acquired, so kubelet rotation is transparent to the application.

For production use, prefer WorkloadIdentityCredential explicitly rather than DefaultAzureCredential to avoid accidental fallthrough to other credential sources (managed identity, Azure CLI, environment variable secrets):

from azure.identity import WorkloadIdentityCredential

credential = WorkloadIdentityCredential()

Terraform Automation

Managing federated credentials manually does not scale. Use the azurerm Terraform provider:

data "azurerm_kubernetes_cluster" "production" {
  name                = "aks-production"
  resource_group_name = "rg-production"
}

resource "azurerm_user_assigned_identity" "keyvault_reader" {
  name                = "aks-keyvault-reader"
  resource_group_name = azurerm_resource_group.production.name
  location            = azurerm_resource_group.production.location
}

resource "azurerm_federated_identity_credential" "keyvault_reader" {
  name                = "aks-production-keyvault-reader"
  resource_group_name = azurerm_resource_group.production.name
  parent_id           = azurerm_user_assigned_identity.keyvault_reader.id

  issuer   = data.azurerm_kubernetes_cluster.production.oidc_issuer_url
  subject  = "system:serviceaccount:production:keyvault-reader"
  audience = ["api://AzureADTokenExchange"]
}

resource "azurerm_role_assignment" "keyvault_reader_secrets" {
  scope                = azurerm_key_vault.production.id
  role_definition_name = "Key Vault Secrets User"
  principal_id         = azurerm_user_assigned_identity.keyvault_reader.principal_id
}

The oidc_issuer_url attribute on the azurerm_kubernetes_cluster data source returns the issuer URL with the trailing slash. Pass it directly to azurerm_federated_identity_credential.issuer — do not modify it.

For the Kubernetes side, use the Kubernetes Terraform provider alongside:

resource "kubernetes_service_account_v1" "keyvault_reader" {
  metadata {
    name      = "keyvault-reader"
    namespace = "production"
    annotations = {
      "azure.workload.identity/client-id" = azurerm_user_assigned_identity.keyvault_reader.client_id
    }
  }
}

This keeps the client ID reference out of static YAML and ensures the service account annotation is always in sync with the managed identity.

Debugging

Inspect the projected token: The token at /var/run/secrets/azure/tokens/azure-identity-token is a standard JWT. Decode it to verify the claims:

kubectl exec -n production deploy/secret-consumer -- \
  cat /var/run/secrets/azure/tokens/azure-identity-token | \
  cut -d. -f2 | base64 -d 2>/dev/null | jq .

Check the critical claims:

  • iss: Must match the AKS OIDC issuer URL exactly, including the trailing slash (see the comparison sketch after this list)
  • sub: Must match system:serviceaccount:<namespace>:<serviceaccount> exactly as registered in the federated credential
  • aud: Must contain api://AzureADTokenExchange
  • exp: The Unix timestamp when the token expires; the kubelet rotates before this
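
A direct comparison of the token's iss claim against the issuer you registered catches the most common mismatch. A sketch, assuming AKS_OIDC_ISSUER is still set from earlier:

ISS=$(kubectl exec -n production deploy/secret-consumer -- \
  cat /var/run/secrets/azure/tokens/azure-identity-token | \
  cut -d. -f2 | base64 -d 2>/dev/null | jq -r .iss)

[ "$ISS" = "$AKS_OIDC_ISSUER" ] && echo "issuer matches" || echo "MISMATCH: $ISS"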

Verify webhook injection: If the pod is missing the environment variables, the webhook did not run:

kubectl get pod -n production -l app=secret-consumer -o jsonpath='{.items[0].spec.containers[0].env}' | jq .

Check that the azure.workload.identity/use: "true" label is on the pod template (not just the deployment metadata), and that the webhook is running:

kubectl get pods -n azure-workload-identity-system
kubectl get mutatingwebhookconfigurations azure-wi-webhook-mutating-webhook-configuration

Check the webhook namespace selector: The mutating webhook configuration carries a namespaceSelector, which typically excludes system namespaces. Inspect it and confirm your target namespace is not excluded:

kubectl get mutatingwebhookconfigurations azure-wi-webhook-mutating-webhook-configuration \
  -o jsonpath='{.webhooks[0].namespaceSelector}' | jq .

kubectl get namespace production --show-labels

If the selector requires a label the namespace lacks, add it; if an exclusion rule matches the namespace, adjust the namespace labels accordingly.

Test the token exchange manually:

TOKEN=$(kubectl exec -n production deploy/secret-consumer -- \
  cat /var/run/secrets/azure/tokens/azure-identity-token)

TENANT_ID=$(az account show --query tenantId --output tsv)
CLIENT_ID="$USER_ASSIGNED_CLIENT_ID"

curl -X POST "https://login.microsoftonline.com/${TENANT_ID}/oauth2/v2.0/token" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "grant_type=client_credentials" \
  -d "client_id=${CLIENT_ID}" \
  -d "client_assertion_type=urn:ietf:params:oauth:client-assertion-type:jwt-bearer" \
  -d "client_assertion=${TOKEN}" \
  -d "scope=https://vault.azure.net/.default"

A successful response returns an access token. An error response from Azure AD will include an error code and description indicating the failure reason.
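
To isolate authentication from SDK and network issues, you can take the returned token and call the Key Vault REST API directly (the secret name and api-version are illustrative):

# Paste the access_token value from the response above.
ACCESS_TOKEN="<access_token from the exchange>"

curl -s -H "Authorization: Bearer ${ACCESS_TOKEN}" \
  "https://kv-production-secrets.vault.azure.net/secrets/database-connection-string?api-version=7.4" | jq .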

Azure AD sign-in logs: In the Azure Portal under Entra ID > Monitoring > Sign-in logs, filter by the application name. Failed token exchanges appear here with detailed error codes. Common error codes:

  • AADSTS70021: No matching federated identity record — the subject or issuer in the token does not match any registered federated credential
  • AADSTS70022: The token's nbf (not before) claim is still in the future, meaning clock skew between the cluster and Azure AD exceeds tolerance
  • AADSTS700016: Application not found — the client ID is wrong or belongs to a different tenant

Common Pitfalls

OIDC issuer URL mismatch: The most common failure. The issuer URL registered in the federated credential must be character-for-character identical to the iss claim in the projected token. Trailing slashes matter. Copy the issuer URL using az aks show --query oidcIssuerProfile.issuerUrl rather than constructing it manually.

Wrong audience claim: The projected token must have api://AzureADTokenExchange as its audience. The webhook sets this automatically, but if you create a projected token manually or use a different audience for other purposes, the exchange will fail with AADSTS70021. Do not reuse the same projected token for both the Kubernetes API server (https://kubernetes.default.svc) and Azure AD federation.

Webhook not installed or not running: --enable-workload-identity in the az aks command installs the webhook. If you only enabled the OIDC issuer without enabling workload identity, the webhook does not exist. Check with kubectl get mutatingwebhookconfigurations and reinstall via az aks update --enable-workload-identity.

Service account annotation missing or wrong: If azure.workload.identity/client-id is missing from the service account, the webhook does not know which managed identity to use and will not inject AZURE_CLIENT_ID. The token is still projected, but the SDK cannot determine which identity to request a token for.

Subject mismatch from namespace or service account name change: If you rename the Kubernetes service account or move workloads to a different namespace, the subject string changes. The existing federated credential no longer matches. Create a new federated credential with the updated subject before renaming.
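
Because a managed identity can hold multiple federated credentials, you can register the new subject alongside the old one, move the workloads, then delete the stale entry. A sketch with an illustrative new namespace:

az identity federated-credential create \
  --name "aks-production-keyvault-reader-v2" \
  --identity-name "$IDENTITY_NAME" \
  --resource-group "$RESOURCE_GROUP" \
  --issuer "$AKS_OIDC_ISSUER" \
  --subject "system:serviceaccount:production-v2:keyvault-reader" \
  --audiences "api://AzureADTokenExchange"

# After the workloads have moved:
az identity federated-credential delete \
  --name "aks-production-keyvault-reader" \
  --identity-name "$IDENTITY_NAME" \
  --resource-group "$RESOURCE_GROUP"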

Node-level managed identity interfering: If the AKS node pool has a managed identity with broad permissions and webhook injection is missing or misconfigured, DefaultAzureCredential silently falls through to ManagedIdentityCredential via IMDS and acquires the node identity's token. The workload appears to work, but with the wrong (often broader) permissions. Use WorkloadIdentityCredential directly to avoid the ambiguity.

Key Vault firewall blocking the AKS egress IP: If Key Vault has network restrictions, ensure the AKS node subnet is in the Key Vault firewall allowlist, or use a private endpoint. Workload identity resolves the authentication problem but not network connectivity.
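
If you manage the vault's firewall, allowing the node subnet looks like this (the VNet and subnet names are placeholders; the subnet needs the Microsoft.KeyVault service endpoint enabled):

az keyvault network-rule add \
  --name "$KEY_VAULT_NAME" \
  --resource-group "$RESOURCE_GROUP" \
  --vnet-name "vnet-production" \
  --subnet "snet-aks-nodes"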

Auditing with Azure Monitor

Enable diagnostic logging on Key Vault to track every secret access:

az monitor diagnostic-settings create \
  --name "keyvault-audit" \
  --resource "$(az keyvault show --name kv-production-secrets --query id --output tsv)" \
  --logs '[{"category":"AuditEvent","enabled":true}]' \
  --workspace "$(az monitor log-analytics workspace show \
      --resource-group rg-production \
      --workspace-name law-production \
      --query id --output tsv)"

Query secret access by service account identity in Log Analytics:

AzureDiagnostics
| where ResourceType == "VAULTS"
| where OperationName == "SecretGet"
| where ResultType == "Success"
| extend CallerObjectId = tostring(identity_claim_oid_g)
| extend CallerSubject = tostring(identity_claim_sub_s)
| summarize count() by CallerObjectId, CallerSubject, bin(TimeGenerated, 1h)
| order by TimeGenerated desc

The identity_claim_sub_s field will contain the system:serviceaccount:namespace:name string, making it straightforward to correlate Azure resource access back to specific Kubernetes workloads.

For AKS control plane audit logs, enable the kube-audit and kube-audit-admin log categories via a diagnostic setting on the cluster resource (the monitoring add-on alone does not capture audit logs):

az monitor diagnostic-settings create \
  --name "aks-audit" \
  --resource "$(az aks show --resource-group rg-production --name aks-production --query id --output tsv)" \
  --logs '[{"category":"kube-audit-admin","enabled":true}]' \
  --workspace "$(az monitor log-analytics workspace show \
      --resource-group rg-production \
      --workspace-name law-production \
      --query id --output tsv)"

With the logs in Log Analytics, query for service account token requests and webhook mutations alongside the Key Vault audit events.