
Copy Fail — 732 Bytes to Root on Every Linux Distribution

A deterministic, race-free 4-byte write into the kernel page cache — exploitable by any unprivileged local user — provided a reliable root shell on Ubuntu, Amazon Linux, RHEL, and SUSE. The same 732-byte Python script worked unmodified on every tested distribution. No kernel-specific offsets. No races. No crashes. The corrupted page was never marked dirty, so on-disk integrity tools were blind to the modification. The primitive also crosses container boundaries, constituting a Kubernetes node escape vector.


Root Cause

Three independently reasonable changes collided after a decade of dormancy. In 2011, the authencesn AEAD wrapper was added for IPsec ESN support; it used the caller's destination scatterlist as scratch space during decryption, writing 4 bytes beyond the expected output boundary — harmless at the time. In 2015, AF_ALG gained AEAD support with a splice() path that could route page-cache pages directly into the crypto subsystem's input scatterlist. In 2017, an optimization made AEAD operations in-place by chaining those splice() page-cache pages into the writable destination scatterlist and setting req->src = req->dst. Nobody connected the 2017 in-place optimization to authencesn's out-of-bounds scratch write or to the splice() path's use of live page-cache pages. The vulnerability lived silently at the intersection of all three for nearly nine years.

Aftermath

Patch committed to mainline Linux on 2026-04-01 (commit a664bf3d603d), reverting the 2017 algif_aead in-place optimization entirely. CVE-2026-31431 assigned on 2026-04-22. Immediate mitigation: blacklist the algif_aead kernel module. Part 2 of the Xint disclosure (pending) documents the Kubernetes container-escape path. The finding was AI-assisted, surfaced by Xint Code from an operator prompt describing the splice()/page-cache attack surface — demonstrating that AI-assisted vulnerability research can now scale human intuition across entire kernel subsystems.
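The module-blacklist mitigation only helps where algif_aead is built as a loadable module, so a first step is checking whether it is currently loaded. Below is a minimal sketch. It assumes a Linux host exposing /proc/modules, where the first whitespace-separated field of each line is a module name. The helper names are illustrative and not part of any existing tool.

Two caveats apply. A driver built into the kernel never appears in /proc/modules and cannot be blacklisted. Also, a plain `blacklist algif_aead` line in modprobe.d only suppresses alias-based autoloading, whereas an `install algif_aead /bin/false` line also blocks explicit loads.

```python
from pathlib import Path


def loaded_modules(proc_modules_text: str) -> set[str]:
    """Return module names from /proc/modules text (first field of each line)."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def algif_aead_loaded(proc_modules_text: str) -> bool:
    """True if the AF_ALG AEAD interface module is currently loaded."""
    return "algif_aead" in loaded_modules(proc_modules_text)


if __name__ == "__main__":
    proc = Path("/proc/modules")
    if proc.exists():
        print("algif_aead loaded:", algif_aead_loaded(proc.read_text()))
    else:
        print("/proc/modules not available on this system")
```

If the module is reported as loaded, the modprobe.d entry only takes effect for future loads. The running module also has to be unloaded, or the host rebooted.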

The Incident

On April 29, 2026, Xint Code publicly disclosed CVE-2026-31431, named Copy Fail. A single 732-byte Python script, using only the standard library, reliably produced a root shell on every major Linux distribution shipped since 2017 that was tested: Ubuntu, Amazon Linux, RHEL, and SUSE. No race condition. No per-distribution kernel offsets. No recompilation. The same script, unchanged, obtained root on four different kernels across three kernel lineages (6.12, 6.17, 6.18) in a single uncut demo.

The Root Cause

Three separately reasonable engineering decisions — made in 2011, 2015, and 2017 — converged into a nine-year-old exploitable hole.

2011 — authencesn's scratch write. The authencesn cryptographic wrapper was added to support IPsec Extended Sequence Numbers. For HMAC computation, it needs to rearrange 8 bytes of associated data. It does this by using the caller's destination scatterlist as temporary scratch space, including a 4-byte write at offset assoclen + cryptlen, past the intended output boundary. At the time, authencesn was called only by the kernel's internal IPsec stack, on kernel-owned buffers, so the out-of-bounds scratch write was invisible.
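To see where that scratch write lands, it helps to model a scatterlist as a list of segments addressed by one logical offset. This is a toy model under stated assumptions. The segment labels, the sizes, and the `locate` helper are hypothetical, and a real scatterlist holds page/offset/length entries rather than Python tuples. The model shows only that offset assoclen + cryptlen is the first byte past the declared output region, so whatever follows the output in the list receives the 4 bytes.

```python
# Illustrative sizes only; they are not taken from the kernel.
ASSOCLEN = 8
CRYPTLEN = 16


def locate(sgl, offset):
    """Map a logical byte offset to (segment label, offset within segment).

    `sgl` is a toy scatterlist: a list of (label, length) segments that the
    crypto code treats as one contiguous buffer.
    """
    pos = 0
    for label, length in sgl:
        if offset < pos + length:
            return label, offset - pos
        pos += length
    raise IndexError("offset past the end of the scatterlist")


# Toy IPsec-era destination: the declared output region, followed by
# trailing space that is also kernel-owned and never observed by anyone else.
ipsec_dst = [("output", ASSOCLEN + CRYPTLEN), ("kernel_trailer", 16)]

# authencesn's 4-byte scratch write starts exactly at assoclen + cryptlen.
scratch = ASSOCLEN + CRYPTLEN
print(locate(ipsec_dst, scratch))      # ('kernel_trailer', 0)
print(locate(ipsec_dst, scratch - 1))  # ('output', 23)
```

In the IPsec-only world, the segment after the output was always kernel memory, which is why the overrun had no visible effect.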

2015 — AF_ALG gains AEAD support with splice(). AF_ALG, the kernel crypto API's socket interface for userspace, is available to unprivileged users. In 2015 it gained AEAD support, with a splice() path that delivers page-cache pages by reference into the crypto subsystem's input scatterlist. The pages backing any readable file (including a setuid binary) could therefore be routed directly into kernel crypto operations without copying.
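The distinction between copying a page and passing it by reference is the crux here. Consider a loose Python analogy, offered as an illustration and not as a description of how splice() is implemented. A `bytearray` stands in for a page-cache page, and `bytearray(page)` for a read() that copies it. A `memoryview` stands in for splice() handing over a reference to the same memory.

```python
# A bytearray stands in for one kernel-owned page-cache page of a readable file.
page_cache = bytearray(b"\x7fELF" + bytes(60))


def read_copy(page: bytearray) -> bytearray:
    """read()-like: the consumer receives its own private copy."""
    return bytearray(page)


def splice_ref(page: bytearray) -> memoryview:
    """splice()-like: the consumer receives a reference to the same memory."""
    return memoryview(page)


copy = read_copy(page_cache)
ref = splice_ref(page_cache)

copy[0:4] = b"XXXX"            # writing to the copy leaves the cache intact
print(bytes(page_cache[:4]))   # b'\x7fELF'

ref[0:4] = b"YYYY"             # writing through the reference changes the cache
print(bytes(page_cache[:4]))   # b'YYYY'
```

Zero-copy delivery is only safe as long as every consumer of the reference treats it as read-only.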

2017 — in-place optimization closes the trap. An optimization in algif_aead.c made AEAD decryption operate in-place by chaining the splice()'d page-cache tag pages onto the writable destination scatterlist and setting req->src = req->dst. Now the live, kernel-owned page-cache pages of any readable file were sitting in the writable destination scatterlist. authencesn's scratch write at dst[assoclen + cryptlen] walked directly into those pages, writing 4 controlled bytes into the kernel's in-memory copy of the file.
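The same kind of toy model reproduces the collision. Names and sizes are hypothetical throughout, and Python lists of `bytearray` segments stand in for a kernel scatterlist. The destination is the user's output region with the spliced page-cache page chained after it, and the source is the same list. A 4-byte write at assoclen + cryptlen is then applied. Because the segments are held by reference, the write lands in the shared page rather than in any user buffer.

```python
ASSOCLEN, CRYPTLEN = 8, 16   # illustrative sizes, not kernel values


def sg_write(sgl, offset, data):
    """Write `data` at a logical offset into a toy scatterlist.

    `sgl` is a list of (label, bytearray) segments treated as one contiguous
    buffer. Segments are held by reference, like page pointers in a real
    scatterlist. Returns the labels of the segments that were modified.
    """
    touched, pos, i = [], 0, 0
    for label, buf in sgl:
        end = pos + len(buf)
        while i < len(data) and pos <= offset + i < end:
            buf[offset + i - pos] = data[i]
            if label not in touched:
                touched.append(label)
            i += 1
        pos = end
    if i != len(data):
        raise IndexError("write runs past the end of the scatterlist")
    return touched


page_cache = bytearray(b"\x7fELF" + bytes(12))   # spliced-in page of a readable file
rx_output = bytearray(ASSOCLEN + CRYPTLEN)         # the declared output region

# 2017-style in-place request: the page-cache tag page is chained onto the
# writable destination, and the same list serves as the source.
dst = [("rx_output", rx_output), ("page_cache", page_cache)]
src = dst   # models req->src = req->dst

# authencesn's 4-byte scratch write at dst[assoclen + cryptlen]
touched = sg_write(dst, ASSOCLEN + CRYPTLEN, b"\xde\xad\xbe\xef")
print(touched)                 # ['page_cache']
print(bytes(page_cache[:4]))   # b'\xde\xad\xbe\xef'
```

Neither step is wrong in isolation. The overrun only matters once the segment after the output region is a page the caller does not own.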

The 4-byte write persisted even though recvmsg() returned an error. The kernel never marked the page dirty, so the modification was never written back: the on-disk file remained untouched, and all checksum-based integrity tools were blind. But the page cache is what execve() reads. A setuid binary with injected shellcode, loaded from a corrupted page cache, runs as UID 0.
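Why the on-disk file stays untouched follows from how writeback decides what to flush. Here is a toy model that assumes one page per file and a single dirty flag; the class and method names are hypothetical. Only writes through the normal path set the flag, and writeback copies only flagged pages to disk. A checksum of the on-disk bytes therefore still matches, while the cached copy that execve() would load has changed.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class CachedFile:
    """Toy model of one file: its bytes on disk and its page-cache copy."""
    on_disk: bytes
    cached: bytearray
    dirty: bool = False

    def vfs_write(self, off: int, data: bytes) -> None:
        """Normal write path: update the cache and mark the page dirty."""
        self.cached[off:off + len(data)] = data
        self.dirty = True

    def writeback(self) -> None:
        """Flush to disk, but only if the page was marked dirty."""
        if self.dirty:
            self.on_disk = bytes(self.cached)
            self.dirty = False


original = b"\x7fELF" + bytes(60)
digest = hashlib.sha256(original).hexdigest()
f = CachedFile(on_disk=original, cached=bytearray(original))

# Out-of-band write: bytes change in the cache, but nothing sets `dirty`.
f.cached[0:4] = b"\xde\xad\xbe\xef"
f.writeback()   # no-op: the page is clean as far as writeback knows

print(hashlib.sha256(f.on_disk).hexdigest() == digest)   # True: disk unchanged
print(bytes(f.cached[:4]))                                 # b'\xde\xad\xbe\xef'

# Contrast: the regular write path marks the page dirty and reaches the disk.
g = CachedFile(on_disk=original, cached=bytearray(original))
g.vfs_write(0, b"XXXX")
g.writeback()
print(g.on_disk[:4])   # b'XXXX'
```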

The Exploit

The attacker controls three things: which file (any file readable by the current user, including /usr/bin/su), which 4-byte offset (determined by the splice offset, splice length, and assoclen), and which 4-byte value (bytes 4–7 of the attacker's AAD in sendmsg()). The exploit iterates over the shellcode payload in 4-byte chunks. Each iteration triggers one sendmsg() + splice() + recv() cycle and writes one chunk. After all chunks are written, execve("/usr/bin/su") loads the corrupted page-cache version and runs the injected shellcode as root.
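Composing a fixed-width write primitive into an arbitrary overwrite is plain chunking arithmetic. The toy sketch below operates on a local `bytearray` only, with no kernel or socket interaction. `write4` is a hypothetical stand-in for one sendmsg()/splice()/recv() round. Padding the last chunk with the bytes already present is an assumption of this sketch, chosen so that nothing past the payload changes.

```python
def write4(page: bytearray, offset: int, value: bytes) -> None:
    """Stand-in for the 4-byte write primitive: overwrite exactly 4 bytes."""
    if len(value) != 4:
        raise ValueError("primitive writes exactly 4 bytes")
    if offset < 0 or offset + 4 > len(page):
        raise IndexError("write outside the page")
    page[offset:offset + 4] = value


def write_in_chunks(page: bytearray, base: int, payload: bytes) -> int:
    """Overwrite page[base:base + len(payload)] via repeated 4-byte writes.

    The final chunk is padded with the bytes already in the page, so nothing
    past the payload changes. Returns the number of write cycles used.
    """
    end = base + len(payload)
    pad = -len(payload) % 4
    padded = bytes(payload) + bytes(page[end:end + pad])
    for i in range(0, len(padded), 4):
        write4(page, base + i, padded[i:i + 4])
    return len(padded) // 4


page = bytearray(range(16))
cycles = write_in_chunks(page, 4, b"ABCDEFG")
print(cycles)       # 2
print(page[4:12])   # bytearray(b'ABCDEFG\x0b')
```

An n-byte payload costs ceil(n / 4) cycles. Because each cycle is deterministic, the chunks compose without any retry logic.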

No compiled payloads. No kernel version checks. No timing windows. The total exploit payload: 732 bytes of Python.

The Stealth Problem

The write never passes through the VFS write path. The corrupted page is never marked dirty, so the kernel's writeback machinery never flushes it to disk. Standard on-disk file integrity systems (rpm -V, dpkg --verify, aide, tripwire, and any tool comparing on-disk checksums) all report the file as clean. Only the in-memory page cache is modified, and the page cache is what the running system actually uses.

The Container Escape

The Linux page cache is a host-global resource. Container boundaries do not partition it. A process inside a container that can read a setuid binary on the host (a common configuration) can corrupt its page-cache entry, affecting every other process on the node — inside and outside any container. Part 2 of the Xint disclosure details the Kubernetes cluster-level exploitation path.
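A toy model makes the sharing explicit. It assumes the container and the host resolve the binary to the same (device, inode), for example through a host path mounted into the container. All class names and key values are hypothetical. Neither view owns a private cache, so a modification made through one is visible through the other.

```python
class HostPageCache:
    """Toy host-global page cache keyed by (device, inode)."""

    def __init__(self) -> None:
        self.pages: dict[tuple[int, int], bytearray] = {}

    def lookup(self, dev: int, ino: int, disk_bytes: bytes) -> bytearray:
        """Return the cached page, populating it from disk on first access."""
        key = (dev, ino)
        if key not in self.pages:
            self.pages[key] = bytearray(disk_bytes)
        return self.pages[key]


class View:
    """A process view (container or host). It holds no private cache."""

    def __init__(self, name: str, cache: HostPageCache) -> None:
        self.name = name
        self.cache = cache

    def read_page(self, dev: int, ino: int, disk_bytes: bytes) -> bytearray:
        return self.cache.lookup(dev, ino, disk_bytes)


cache = HostPageCache()
SU_FILE = (8, 1234)                # hypothetical (device, inode) of a shared setuid binary
SU_DISK = b"\x7fELF" + bytes(12)   # its on-disk contents

tenant = View("tenant-container", cache)
host = View("host", cache)

# A write into the page obtained inside the container...
tenant.read_page(*SU_FILE, SU_DISK)[0:4] = b"\xde\xad\xbe\xef"
# ...is what the host sees on its next read of the same file.
print(bytes(host.read_page(*SU_FILE, SU_DISK)[:4]))   # b'\xde\xad\xbe\xef'
```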

The Fix

Commit a664bf3d603d reverts the 2017 in-place optimization entirely. req->src now points to the TX scatterlist (which may contain splice()'d page-cache pages) and req->dst points to the RX scatterlist (the user's recvmsg buffer). The sg_chain() call that linked page-cache pages into the writable destination is removed. As the commit message notes: "There is no benefit in operating in-place in algif_aead since the source and destination come from different mappings."
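The same toy scatterlist model shows why separating the lists is enough. The assumptions are as before (hypothetical names, illustrative sizes), plus one more. Here the RX region is sized with room for the scratch bytes, which is a simplification and not a claim about algif_aead's buffer accounting. The page-cache page is reachable only through `src`, which the scratch write never touches.

```python
ASSOCLEN, CRYPTLEN, AUTHSIZE = 8, 16, 16   # illustrative sizes, not kernel values


def sg_write(sgl, offset, data):
    """Write `data` at a logical offset into a toy scatterlist.

    `sgl` is a list of (label, bytearray) segments treated as one contiguous
    buffer. Returns the labels of the segments that were modified.
    """
    touched, pos, i = [], 0, 0
    for label, buf in sgl:
        end = pos + len(buf)
        while i < len(data) and pos <= offset + i < end:
            buf[offset + i - pos] = data[i]
            if label not in touched:
                touched.append(label)
            i += 1
        pos = end
    if i != len(data):
        raise IndexError("write runs past the end of the scatterlist")
    return touched


page_cache = bytearray(b"\x7fELF" + bytes(12))   # spliced-in page of a readable file

# After the fix: TX and RX are separate lists and nothing is chained.
tx = [("tx_input", bytearray(ASSOCLEN + CRYPTLEN)), ("page_cache", page_cache)]
rx = [("rx_output", bytearray(ASSOCLEN + CRYPTLEN + AUTHSIZE))]
src, dst = tx, rx   # models req->src = TX sgl, req->dst = RX sgl

touched = sg_write(dst, ASSOCLEN + CRYPTLEN, b"\xde\xad\xbe\xef")
print(touched)                 # ['rx_output']
print(bytes(page_cache[:4]))   # b'\x7fELF'
```

The scratch write still happens. It simply lands in memory the caller owns, which restores the pre-2017 situation where the overrun was harmless.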

Why It Matters

Copy Fail is a case study in latent architectural debt. No single commit introduced a vulnerability — each was locally defensible. The bug emerged from the composition of three changes over six years, exploitable only when all three were present simultaneously. The authencesn scratch write was a silent invariant violation: the AEAD API assumed every implementation would confine writes to the declared output region, but documented no such requirement and enforced none. The 2017 optimization was a performance improvement that changed the provenance of pages in the writable scatterlist — without auditing every registered algorithm for out-of-bounds writes.
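Here is a hypothetical sketch of what an enforced version of that invariant could look like. The kernel has no `BoundedDst` type, and real enforcement would presumably have to live in the crypto API's scatterlist handling. The point is only that a declared output length turns a silent out-of-bounds scratch write into an immediate, attributable error.

```python
class BoundedDst:
    """Toy destination buffer that enforces the unwritten AEAD contract:
    implementations may write only inside the declared output region."""

    def __init__(self, buf: bytearray, out_len: int) -> None:
        if out_len > len(buf):
            raise ValueError("output region larger than buffer")
        self.buf = buf
        self.out_len = out_len

    def write(self, offset: int, data: bytes) -> None:
        end = offset + len(data)
        if offset < 0 or end > self.out_len:
            raise ValueError(
                f"write [{offset}, {end}) outside output region [0, {self.out_len})"
            )
        self.buf[offset:end] = data


dst = BoundedDst(bytearray(40), out_len=24)
dst.write(0, bytes(24))                    # in-bounds output: accepted
try:
    dst.write(24, b"\xde\xad\xbe\xef")    # authencesn-style scratch write
except ValueError as err:
    print(err)   # write [24, 28) outside output region [0, 24)
```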

The nine-year window demonstrates that subsystem-level architectural assumptions can remain unaudited indefinitely when each individual component appears correct in isolation. The finding was surfaced in approximately one hour by an AI-assisted scanner given a single operator prompt describing the AF_ALG/splice()/page-cache attack surface — suggesting that entire classes of latent architectural violations may now be enumerable by machine at a scale no human review process has historically matched.

Techniques

page cache corruption, logic flaw, privilege escalation, scatterlist aliasing