Loki Security Hardening: Authentication, Tenant Isolation, and Log Tampering Prevention
Problem
Loki is a horizontally scalable log aggregation system. Its push-based ingestion, label-indexed storage model, and integration with Grafana make it a common log backend for Kubernetes-native stacks. The same properties that make it operationally convenient create security risks:
- No built-in authentication. Loki’s default configuration accepts all requests without authentication. Any process that can reach the Loki HTTP port (3100) can push logs under any tenant ID and query any tenant’s logs. This is by design for single-tenant deployments but is actively dangerous when Loki is exposed within a multi-team cluster.
- Tenant ID spoofing. Loki’s multi-tenancy model relies on the X-Scope-OrgID HTTP header to identify the tenant. Without authentication and authorisation, any Promtail agent or application can set this header to any value, including another tenant’s ID, and read or write to that tenant’s log stream.
- Log injection via unvalidated labels. Loki stores logs indexed by labels. Labels come from Promtail’s configuration, which sources them from Kubernetes pod annotations, node labels, and log stream metadata. An attacker who can set a pod annotation can inject arbitrary labels into the log stream, potentially poisoning label-based alert queries.
- No log integrity protection. Loki is an append-only system by design, but its storage backend (S3, GCS, filesystem) may allow deletion or modification if the storage access controls are not configured correctly. A compromised Loki instance or storage credential can retroactively delete log evidence.
- Ruler (alert) rule injection. Loki’s ruler component evaluates LogQL alert rules. A user with access to create ruler rules can query any tenant’s logs (if multi-tenancy is bypassed) or execute expensive queries that exhaust Loki’s query frontend resources.
Target systems: Loki 3.x (microservices and single-binary modes); Promtail 3.x; Grafana Alloy as Loki client; S3/GCS object storage backends; Kubernetes namespace-based multi-tenancy.
Threat Model
- Adversary 1 (unauthenticated log read): An attacker who reaches the Loki HTTP endpoint queries all logs using {job=~".+"}. Without authentication, they read all log streams, including logs containing credentials, PII, and security events.
- Adversary 2 (cross-tenant log access via tenant ID spoofing): A developer’s application sets X-Scope-OrgID: payments in its Loki push requests, accessing the payments team’s log namespace. Without server-side tenant authorisation, this succeeds.
- Adversary 3 (log poisoning via label injection): An attacker who can annotate a pod sets loki.example.com/stream to a value that matches a critical alert rule’s label selector, injecting synthetic log events that trigger false security alerts, or suppress legitimate ones by flooding the alert query.
- Adversary 4 (log deletion from storage): An attacker who obtains S3 credentials for the Loki storage bucket deletes log chunks, destroying forensic evidence of their previous activity.
- Adversary 5 (resource exhaustion via expensive queries): A user with query access submits an unbounded LogQL query ({job=~".+"} |= "" over a 30-day range). The query exhausts Loki’s querier memory, causing OOM and service disruption for all tenants.
- Access level: Adversaries 1 and 5 need network access to Loki. Adversary 2 needs the ability to push logs. Adversary 3 needs pod annotation access. Adversary 4 needs storage credentials.
- Objective: Extract sensitive log data, inject false log evidence, destroy audit trail, deny log service.
- Blast radius: Full log read access exposes all application logs across all teams — a significant data breach if logs contain PII, credentials, or security event data.
Configuration
Step 1: Authentication via Reverse Proxy
Loki does not implement authentication natively. Enforce it at the gateway layer:
# nginx reverse proxy in front of Loki, with OAuth2 Proxy for SSO.
# docker-compose or Kubernetes deployment.
# nginx.conf for Loki gateway.
upstream loki {
    server loki:3100;
}
server {
    listen 443 ssl;
    server_name loki.internal.example.com;
    ssl_certificate /etc/ssl/loki.crt;
    ssl_certificate_key /etc/ssl/loki.key;
    # Require authentication via OAuth2 Proxy sidecar.
    location / {
        auth_request /oauth2/auth;
        error_page 401 = /oauth2/sign_in;
        auth_request_set $user $upstream_http_x_auth_request_user;
        auth_request_set $groups $upstream_http_x_auth_request_groups;
        proxy_set_header X-Auth-Request-User $user;
        proxy_set_header X-Auth-Request-Groups $groups;
        # Set the tenant ID from the authenticated identity ($user here; map
        # $groups to a tenant instead if tenants correspond to teams).
        # This overwrites any client-supplied X-Scope-OrgID, which prevents
        # spoofing: the tenant is set by auth, not by the client.
        proxy_set_header X-Scope-OrgID $user;
        proxy_pass http://loki;
    }
    location /oauth2/ {
        proxy_pass http://oauth2-proxy:4180;
    }
}
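The rule the gateway enforces above is that the tenant comes from the authenticated identity and never from the client alone. The following is a minimal Python sketch of that decision, not part of Loki, nginx, or OAuth2 Proxy. The group names, the `GROUP_TO_TENANT` mapping, and the `resolve_tenant` function are hypothetical and exist only for illustration.

```python
from typing import Mapping, Optional, Sequence

# Hypothetical mapping from identity-provider groups to Loki tenant IDs.
GROUP_TO_TENANT: Mapping[str, str] = {
    "payments-devs": "payments",
    "secops": "security-team",
}


def resolve_tenant(
    auth_groups: Sequence[str],
    client_headers: Mapping[str, str],
    mapping: Mapping[str, str] = GROUP_TO_TENANT,
) -> Optional[str]:
    """Return the tenant to forward as X-Scope-OrgID, or None to reject.

    A client-supplied X-Scope-OrgID is honoured only when it names a tenant
    the authenticated groups are entitled to; it never grants access alone.
    """
    allowed = sorted({mapping[g] for g in auth_groups if g in mapping})
    if not allowed:
        return None  # Authenticated, but not entitled to any tenant.
    requested = next(
        (v for k, v in client_headers.items() if k.lower() == "x-scope-orgid"),
        None,
    )
    if requested is None:
        # No explicit request: fall back only when the choice is unambiguous.
        return allowed[0] if len(allowed) == 1 else None
    return requested if requested in allowed else None
```

With this shape, a request from a `payments-devs` member that carries `X-Scope-OrgID: security-team` is rejected rather than silently rewritten, which makes spoofing attempts visible in gateway logs.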
For Kubernetes, use a dedicated auth gateway:
# Grafana Enterprise or Grafana Cloud provides built-in auth for Loki.
# For self-hosted, use the Loki gateway with HTTP basic auth or mTLS.
apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-gateway-config
  namespace: monitoring
data:
  nginx.conf: |
    events {}  # nginx refuses to start without an events block.
    http {
      upstream loki {
        server loki-read:3100;
      }
      upstream loki-write {
        server loki-write:3100;
      }
      server {
        listen 80;
        location = /loki/api/v1/push {
          # Write endpoint: authenticated Promtail agents only.
          # mTLS is not enforced by this block as written: terminate TLS
          # here (listen 443 ssl; ssl_verify_client on;) or at a proxy in front.
          proxy_pass http://loki-write;
          # This forwards the client-supplied tenant header unchanged. Derive
          # the tenant from the verified client identity to prevent spoofing.
          proxy_set_header X-Scope-OrgID $http_x_scope_orgid;
        }
        location /loki/api/ {
          # Read endpoint: authenticated users only.
          auth_basic "Loki";
          auth_basic_user_file /etc/nginx/htpasswd;
          proxy_pass http://loki;
        }
      }
    }
Step 2: Multi-Tenant Configuration
# loki-config.yaml: multi-tenancy configuration.
auth_enabled: true  # CRITICAL: enables X-Scope-OrgID header enforcement.
                    # Without this, all logs are stored under tenant "fake".
server:
  http_listen_port: 3100
  grpc_listen_port: 9095
  # Bind to localhost only; the gateway handles external access.
  # This assumes the gateway shares Loki's pod or host. In microservices
  # mode, components must reach each other over the network, so restrict
  # access with a NetworkPolicy instead.
  http_listen_address: 127.0.0.1
  grpc_listen_address: 127.0.0.1
# Tenant-level limits: prevent one tenant from consuming all resources.
limits_config:
  # Per-tenant ingestion limits.
  ingestion_rate_mb: 10               # 10 MB/s per tenant.
  ingestion_burst_size_mb: 20         # Burst up to 20 MB.
  max_streams_per_user: 10000         # Max active streams per tenant.
  max_line_size: 256kb                # Max log line size.
  # Per-tenant query limits.
  max_query_length: 12h               # Max time range for a query.
  max_query_parallelism: 8            # Max parallel sub-queries per tenant.
  max_entries_limit_per_query: 50000  # Max log entries returned per query.
  query_timeout: 300s                 # Kill queries running over 5 minutes.
  max_cache_freshness_per_query: 10m
  # Retention per tenant (configurable per tenant via per-tenant overrides).
  # Only applied when the compactor runs with retention_enabled: true.
  retention_period: 720h              # 30 days default.
# Per-tenant overrides: security team gets longer retention.
# overrides.yaml (separate file, reloaded without restart).
overrides:
  security-team:
    retention_period: 8760h  # 1 year for security logs.
    max_query_length: 720h   # Allow longer historical queries.
  payments:
    ingestion_rate_mb: 50    # Higher limit for high-volume service.
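To make the layering of per-tenant overrides on top of the defaults concrete, here is a small Python sketch. It is a simplified model for reasoning about the configuration, not Loki's actual enforcement code. The values are copied from the example above, and the function names `effective_limits` and `query_range_allowed` are hypothetical.

```python
from datetime import timedelta
from typing import Dict

# Defaults taken from the limits_config example (only two limits modelled).
DEFAULT_LIMITS: Dict[str, object] = {
    "max_query_length": timedelta(hours=12),
    "ingestion_rate_mb": 10,
}

# Per-tenant overrides taken from the overrides example.
OVERRIDES: Dict[str, Dict[str, object]] = {
    "security-team": {"max_query_length": timedelta(hours=720)},
    "payments": {"ingestion_rate_mb": 50},
}


def effective_limits(tenant: str) -> Dict[str, object]:
    """Defaults with the tenant's overrides (if any) applied on top."""
    merged = dict(DEFAULT_LIMITS)
    merged.update(OVERRIDES.get(tenant, {}))
    return merged


def query_range_allowed(tenant: str, query_range: timedelta) -> bool:
    """True if the requested time range fits the tenant's max_query_length."""
    return query_range <= effective_limits(tenant)["max_query_length"]
```

Under these values, a 30-day query is accepted for `security-team` but rejected for `payments`, which inherits the 12-hour default because its override only changes the ingestion rate.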
Step 3: Promtail Security
# promtail-config.yaml: harden log collection agent.
server:
  http_listen_port: 9080
  # Bind to localhost; don't expose Promtail HTTP on host interface.
  http_listen_address: 127.0.0.1
clients:
  - url: https://loki-gateway.monitoring:443/loki/api/v1/push
    # Set from environment (run Promtail with -config.expand-env=true);
    # not from log metadata.
    tenant_id: "${TENANT_ID}"
    # mTLS: authenticate Promtail to the Loki gateway.
    tls_config:
      cert_file: /etc/promtail/client.crt
      key_file: /etc/promtail/client.key
      ca_file: /etc/promtail/ca.crt
    # Backoff and retry.
    backoff_config:
      max_retries: 10
      min_period: 500ms
      max_period: 5m
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    pipeline_stages:
      # Scrub PII before shipping.
      - replace:
          expression: '(\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b)'
          replace: '[CARD-REDACTED]'
      - replace:
          expression: '(\b\d{3}-\d{2}-\d{4}\b)'
          replace: '[SSN-REDACTED]'
      # Drop logs from known-noisy, low-value sources.
      - drop:
          source: "job"
          value: "kube-system/coredns"
          drop_counter_reason: "coredns-filtered"
    # Label allowlist: only allow specific pod labels as Loki labels.
    # Prevents label injection via arbitrary pod annotations.
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        target_label: app
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_container_name]
        target_label: container
      # DO NOT include: __meta_kubernetes_pod_annotation_* without explicit allowlist.
      # Pod annotations are attacker-controlled if they can annotate pods.
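Redaction expressions and label allowlists are easy to get subtly wrong, so it can help to exercise them offline before deploying. The sketch below reuses the two expressions and the three label mappings from the Promtail configuration above. Python's `re` module stands in for the Go RE2 engine Promtail uses, which should behave the same for these particular patterns; the helper names `redact` and `allowlisted_labels` are hypothetical.

```python
import re
from typing import Dict

# Same expressions as the Promtail replace stages.
CARD_RE = re.compile(r"\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Discovery labels that the relabel_configs map to Loki labels.
ALLOWED_LABELS: Dict[str, str] = {
    "__meta_kubernetes_pod_label_app": "app",
    "__meta_kubernetes_namespace": "namespace",
    "__meta_kubernetes_pod_container_name": "container",
}


def redact(line: str) -> str:
    """Apply the card and SSN replacements in the same order as the pipeline."""
    line = CARD_RE.sub("[CARD-REDACTED]", line)
    return SSN_RE.sub("[SSN-REDACTED]", line)


def allowlisted_labels(discovered: Dict[str, str]) -> Dict[str, str]:
    """Keep only allowlisted discovery labels, renamed to their Loki names."""
    return {
        target: discovered[source]
        for source, target in ALLOWED_LABELS.items()
        if source in discovered
    }
```

An annotation-derived label such as `__meta_kubernetes_pod_annotation_loki_example_com_stream` is simply absent from the output, because nothing outside the allowlist is ever copied.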
Step 4: Storage Backend Security
# S3 backend: prevent log deletion.
# S3 Object Lock: COMPLIANCE mode prevents deletion even by root.
aws s3api create-bucket \
  --bucket loki-logs-prod \
  --region us-east-1 \
  --object-lock-enabled-for-bucket
aws s3api put-object-lock-configuration \
  --bucket loki-logs-prod \
  --object-lock-configuration '{
    "ObjectLockEnabled": "Enabled",
    "Rule": {
      "DefaultRetention": {
        "Mode": "COMPLIANCE",
        "Days": 30
      }
    }
  }'
# IAM policy for Loki: write and read, but no delete.
aws iam put-role-policy \
  --role-name loki-service-role \
  --policy-name loki-s3-access \
  --policy-document '{
    "Statement": [{
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::loki-logs-prod",
        "arn:aws:s3:::loki-logs-prod/*"
      ]
    }]
  }'
# Absent: s3:DeleteObject, so Loki cannot delete its own logs.
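A policy like the one above is only protective while nobody later widens it with a wildcard. The following Python sketch checks a policy document for Allow statements that would match destructive S3 actions. It is deliberately simplified: it ignores Deny statements, NotAction, conditions, and resource scoping, and the action list and function name are assumptions made for illustration.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, List

# Actions treated as destructive for a log bucket (illustrative, not exhaustive).
DESTRUCTIVE_ACTIONS = (
    "s3:DeleteObject",
    "s3:DeleteObjectVersion",
    "s3:DeleteBucket",
)


def _as_list(value: Any) -> List[Any]:
    """IAM allows a single string or object where a list is expected."""
    return [value] if isinstance(value, (str, dict)) else list(value)


def destructive_actions_allowed(policy: Dict[str, Any]) -> List[str]:
    """Return the destructive actions matched by any Allow statement."""
    found = set()
    for statement in _as_list(policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        for pattern in _as_list(statement.get("Action", [])):
            for action in DESTRUCTIVE_ACTIONS:
                # IAM action names are case-insensitive and support * and ?.
                if fnmatchcase(action.lower(), pattern.lower()):
                    found.add(action)
    return sorted(found)
```

Run against the policy shown above, the function returns an empty list; a policy granting `s3:*` or `s3:Delete*` returns all three actions, which makes the check suitable as a CI gate on infrastructure changes.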
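# loki-config.yaml: S3 storage configuration.
storage_config:
  aws:
    s3: "s3://loki-logs-prod"
    region: us-east-1
    # Deprecated flag; it enables SSE-S3, not SSE-KMS. For SSE-KMS, use the
    # sse block (type: SSE-KMS plus kms_key_id) instead.
    sse_encryption: true
    s3forcepathstyle: true
    http_config:
      idle_conn_timeout: 90s
      response_header_timeout: 0
  # Index shipper (not the chunk store). Loki 3.x removed the shared_store
  # option; the object store is taken from the object_store field of the
  # matching schema_config period.
  boltdb_shipper:
    active_index_directory: /var/loki/index
    cache_location: /var/loki/cache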
Step 5: Ruler Security (Alert Rules)
# Restrict who can create ruler rules.
# Ruler rules can query all logs in the tenant; treat as privileged access.
# loki-config.yaml
ruler:
  storage:
    type: s3
    s3:
      bucketnames: loki-ruler-prod
  rule_path: /tmp/loki/rules
  enable_api: true
  enable_alertmanager_v2: true
# Ruler API access: only allow platform team to create/modify rules.
# Enforce via the reverse proxy:
# location /loki/api/v1/rules {
#     # Only allow platform team (from auth header group).
#     # NOTE: $http_x_auth_request_groups reads the header as sent in the
#     # request. This check is only safe if a trusted auth layer in front of
#     # this location strips and re-sets X-Auth-Request-Groups; otherwise a
#     # client can forge it.
#     if ($http_x_auth_request_groups !~ "platform-team") {
#         return 403;
#     }
#     proxy_pass http://loki;
# }
Step 6: Log Integrity Verification
#!/bin/bash
# verify-log-integrity.sh: detect gaps or deletions in log streams.
set -uo pipefail
TENANT="security-team"
END=$(date -u +%s%N)  # Nanosecond Unix epoch; %N requires GNU date.
# Query Loki for the total log count in the past 24 hours.
# sum() collapses the per-stream results into one sample; without it,
# result[0] would be the count for a single stream only.
# Add the gateway's credentials (basic auth or client certificate) to curl.
LOG_COUNT=$(curl -sf \
  -H "X-Scope-OrgID: $TENANT" \
  -G "https://loki.internal/loki/api/v1/query" \
  --data-urlencode "query=sum(count_over_time({job=\"security-events\"}[24h]))" \
  --data-urlencode "time=$END" | \
  jq -r '.data.result[0].value[1] // "0"')
# Treat a failed or empty response as zero so the alert below still fires.
LOG_COUNT=${LOG_COUNT:-0}
echo "Security event log count (24h): $LOG_COUNT"
# Compare with expected baseline.
EXPECTED_MINIMUM=1000
if [ "$LOG_COUNT" -lt "$EXPECTED_MINIMUM" ]; then
  logger -p security.alert -t loki-integrity \
    "ALERT: Security log count $LOG_COUNT below minimum $EXPECTED_MINIMUM"
fi
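An instant query over `count_over_time` returns a vector that can hold one sample per matching stream unless the query aggregates it, so a robust check adds up every sample before comparing with the baseline. The sketch below does this on an already-parsed response body in the Prometheus-style format Loki returns for instant queries. The function names and the sample data in the usage note are hypothetical.

```python
from typing import Any, Dict


def total_count(response: Dict[str, Any]) -> int:
    """Sum the values of all samples in an instant-query vector response."""
    if response.get("status") != "success":
        raise ValueError("Loki query did not succeed")
    data = response.get("data", {})
    if data.get("resultType") != "vector":
        raise ValueError("expected a vector result from an instant query")
    # Each sample is {"metric": {...}, "value": [timestamp, "<number>"]}.
    return sum(int(float(sample["value"][1])) for sample in data.get("result", []))


def below_baseline(response: Dict[str, Any], expected_minimum: int) -> bool:
    """True when the total is under the expected minimum (an alert condition)."""
    return total_count(response) < expected_minimum
```

For example, a response with two streams reporting "600" and "500" totals 1100 and passes a baseline of 1000, whereas reading only the first sample would have raised a false alert. An empty result totals zero and is treated as below baseline.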
Step 7: Kubernetes Deployment Security
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: loki
  namespace: monitoring
spec:
  template:
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 10001
        fsGroup: 10001
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: loki
          image: grafana/loki:3.0.0@sha256:abc123
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
          resources:
            limits:
              cpu: "4"
              memory: "8Gi"
            requests:
              cpu: "500m"
              memory: "1Gi"
          volumeMounts:
            - name: config
              mountPath: /etc/loki
              readOnly: true
            - name: data
              mountPath: /var/loki
Step 8: Telemetry
loki_ingester_streams_created_total{tenant} counter
loki_request_duration_seconds{route, tenant, status_code} histogram
loki_ingestion_rate_bytes{tenant} gauge
loki_query_length_hours{tenant} histogram
loki_tenant_storage_bytes{tenant} gauge
loki_ruler_rules_total{tenant} gauge
loki_rejected_samples_total{tenant, reason} counter
Alert on:
- loki_rejected_samples_total{reason="rate_limit_exceeded"}: a tenant is hitting ingestion rate limits; may indicate a logging misconfiguration or a denial-of-service attempt.
- Unexpected loki_tenant_storage_bytes decrease: log chunks may have been deleted; investigate S3 access logs.
- loki_query_length_hours P99 > query_timeout: queries are timing out; either increase the timeout or investigate expensive query patterns.
- Security log count drops below baseline (via integrity check script): logs may have been deleted or the shipping pipeline is broken.
- Any DELETE API calls to the S3 Loki bucket (via CloudTrail): should never happen given the IAM policy.
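The storage-decrease condition can be expressed as a comparison between consecutive gauge samples. The Python sketch below flags any drop larger than a tolerance fraction. The sample format, the default 1% tolerance, and the function name are assumptions made for illustration; a real deployment would more likely express this as an alerting rule in the monitoring system.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (unix timestamp, storage bytes)


def find_storage_drops(
    samples: Sequence[Sample], tolerance: float = 0.01
) -> List[Tuple[float, float, float]]:
    """Return (timestamp, previous_bytes, current_bytes) for each drop.

    A drop is reported when the gauge falls by more than `tolerance`
    (as a fraction of the previous value) between consecutive samples.
    """
    drops = []
    for (_, previous), (timestamp, current) in zip(samples, samples[1:]):
        if previous > 0 and (previous - current) / previous > tolerance:
            drops.append((timestamp, previous, current))
    return drops
```

The tolerance exists because small fluctuations can be benign; where retention legitimately removes data, expected decreases would need to be excluded before alerting.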
Expected Behaviour
| Signal | Default Loki (no auth) | Hardened Loki |
|---|---|---|
| Anonymous log query | All logs readable | 401; authentication required |
| Tenant ID spoofing | Any client sets any tenant | Gateway enforces tenant from authenticated identity |
| Log deletion from S3 | Possible with default IAM | Object Lock prevents deletion; IAM blocks DeleteObject |
| Expensive query exhausts querier | No per-tenant limits | max_query_length and query_timeout kill runaway queries |
| Pod annotation label injection | Arbitrary labels added to log stream | Label allowlist in Promtail relabel config |
Trade-offs
| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| auth_enabled = true | Tenant isolation enforced server-side | All clients must send X-Scope-OrgID | Set consistently in Promtail config; enforce in gateway |
| S3 Object Lock (COMPLIANCE) | Logs cannot be deleted | Cannot delete even erroneous data until retention expires | Test retention policy; use shorter window if needed |
| Per-tenant query limits | Prevents resource exhaustion | Legitimate historical queries may time out | Increase limit for specific tenants via overrides |
| mTLS for Promtail-to-Loki | Authenticates log shipper | Certificate management for each Promtail instance | cert-manager automates per-node certificates |
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| auth_enabled misconfiguration | All logs stored under “fake” tenant | Unexpected {orgID="fake"} label in queries | Enable auth_enabled; all historical data under “fake” is not migrated |
| Rate limit too low | Legitimate high-volume service loses logs | loki_rejected_samples_total alert | Increase rate limit for specific tenant via overrides.yaml |
| Promtail certificate expiry | Logs stop shipping; cert error | loki_ingestion_rate_bytes drops to 0 for tenant | cert-manager auto-renewal; alert at 14 days remaining |
| Query frontend OOM from expensive query | Query service crashes; all queries fail | Query latency spike then 503 | Reduce max_query_parallelism; add memory limit to query frontend |
| S3 backend outage | Log ingestion buffer fills; eventual write failure | Loki error logs; ingestion rate alert | Loki buffers in WAL; writes complete once S3 recovers |