A logic flaw hiding in the Linux kernel since 2017 just became public. For anyone running shared infrastructure with multiple users, the clock is ticking.
On April 29, 2026, security researchers at Theori publicly disclosed CVE-2026-31431, a vulnerability they named “Copy Fail.” The name understates the severity. A 732-byte Python script, using nothing but standard library modules, can escalate any unprivileged local user to full root on virtually every Linux distribution shipped in the last nine years. No race conditions. No per-distribution tuning. No compiled payloads. The same script, unmodified, delivers root on Ubuntu, RHEL, Amazon Linux, SUSE, and every derivative in between.
Within 24 hours of disclosure, six independent weaponized exploits appeared on GitHub, including a portable C reimplementation that compiles for any architecture. This is not a theoretical risk. This is an active, accessible, and trivially exploitable vulnerability.
At ZENDATA, we triggered our emergency response process within hours of the disclosure. Here is what we found, what we did, and what you need to do.
What Copy Fail Actually Does
The vulnerability sits at the intersection of three kernel components that were each reasonable in isolation but lethal in combination.
The Linux kernel exposes a cryptographic API to unprivileged users through a socket interface called AF_ALG. No special permissions are required to use it. When a user combines this interface with the splice() system call, the kernel passes file data by reference, not by copy. The kernel’s in-memory representation of the file (the “page cache”) is handed directly to the cryptographic subsystem.
A 2017 optimization made the cryptographic operation work “in-place,” meaning the same memory buffer served as both input and output. One specific algorithm, authencesn (used internally by IPsec), writes 4 bytes past its declared output boundary as a scratch operation. Because of the in-place design, that scratch write lands directly into the page cache of whatever file the attacker chose to splice.
The attacker controls three things: which file is targeted, which 4-byte offset within that file is overwritten, and what value is written. By repeating this operation approximately 40 times, an attacker can inject shellcode into the cached copy of a setuid binary like /usr/bin/su. When that binary is executed, the kernel loads the corrupted page cache version, and the shellcode runs as root.
The file on disk is never modified. The kernel never marks the corrupted page as “dirty.” Every file integrity monitoring tool that checks on-disk hashes, including AIDE, Tripwire, Wazuh, and OSSEC, sees nothing wrong. AuditD file watches see nothing. The attack is entirely memory-resident.
Why Multi-Tenant Infrastructure Is the Real Target
Mandiant rated this vulnerability LOW. That rating assumes a threat model where the attacker must first obtain local access to a single-user system. In that narrow scenario, local privilege escalation is a secondary concern.
That threat model does not reflect reality for most of our clients.
The environments where Copy Fail is genuinely critical are the ones where multiple users, tenants, or workloads share a Linux kernel. This includes shared development servers, jump hosts, and bastion systems where multiple employees or contractors have shell access. It includes CI/CD runners, where a malicious pull request executes code as a regular user on shared build infrastructure. It includes Kubernetes clusters, where the page cache is shared across all pods on a node, making Copy Fail a container escape primitive. It includes SaaS platforms, notebook hosts, and agent sandboxes that execute tenant-supplied code.
In these environments, the “local access” prerequisite is not a barrier. It is the normal operating condition. Every user on a shared system, every container on a Kubernetes node, every CI job on a runner already has the access Copy Fail requires. The jump from “valid user” to “root” takes less than a second.
Recent testing by independent researchers confirmed that Kubernetes Pod Security Standards set to “Restricted” with RuntimeDefault seccomp profiles do not block AF_ALG socket creation. A non-root pod, with all Linux capabilities dropped, admitted under the strictest standard Kubernetes offers out of the box, can still reach the vulnerable kernel path. Default container runtime seccomp profiles, whether from Docker, containerd, or CRI-O, do not block address family 38.
For organizations running mutualized infrastructure with untrusted but valid users, this is not a LOW. This is an emergency.
How to Protect Yourself
1. Patch the kernel
The upstream fix (commit a664bf3d603d) reverts the 2017 in-place optimization, permanently separating the page cache pages from the writable scatterlist. Apply your distribution’s kernel update and reboot. If zero downtime is required, kpatch (RHEL) and Canonical Livepatch (Ubuntu) support live kernel patching.
Check your running kernel version first. A common mistake is installing the patched kernel package but continuing to run the old kernel. After updating, verify with uname -r that the active kernel includes the fix.
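The "installed but not running" mistake can be caught with a small check. The following is a minimal sketch, not a definitive test: it assumes installed kernels appear as directories under /lib/modules (true on most distributions, but an assumption here), and the helper names are hypothetical. It only reports that a newer kernel is installed than the one running; it cannot tell whether a given release actually contains the fix, which still has to be checked against your distribution's advisory.

```python
import os
import platform
import re


def release_key(release):
    """Natural-sort key: numeric runs compare as integers, the rest as text."""
    parts = re.findall(r"\d+|\D+", release)
    return [(0, int(p), "") if p.isdecimal() else (1, 0, p) for p in parts]


def newer_installed_kernels(running, installed):
    """Return the installed releases that sort after the running release."""
    running_key = release_key(running)
    newer = [r for r in installed if release_key(r) > running_key]
    return sorted(newer, key=release_key)


if __name__ == "__main__":
    running = platform.release()   # same string that `uname -r` prints
    modules_dir = "/lib/modules"   # assumption: one subdirectory per installed kernel
    installed = os.listdir(modules_dir) if os.path.isdir(modules_dir) else []
    pending = newer_installed_kernels(running, installed)
    if pending:
        print(f"running {running}, but newer kernels are installed: {pending}")
    else:
        print(f"running {running}; no newer installed kernel found")
```

If the script reports a newer installed kernel, the host has most likely been updated but not rebooted.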
2. Deploy the module blacklist immediately (pre-patch)
If you cannot patch and reboot today, disable the vulnerable module:
echo "install algif_aead /bin/false" > /etc/modprobe.d/disable-algif-aead.conf
rmmod algif_aead 2>/dev/null || true
This prevents the kernel from loading the module that exposes the vulnerable code path. The impact on production workloads is negligible. dm-crypt, LUKS, kTLS, IPsec, and the kernel keyring call the kernel crypto API from inside the kernel, and OpenSSL, GnuTLS, NSS, and SSH perform their cryptography in user space by default; none of them depend on the algif_aead module. The only software affected is the rare application that explicitly enables the OpenSSL afalg engine or binds AF_ALG sockets directly. Verify exposure with:
lsof | grep AF_ALG
ss -xa | grep alg
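To confirm the workaround actually took effect, two facts can be checked: the module is no longer loaded, and an install override exists. The sketch below is illustrative only. It assumes the standard /proc/modules format (module name as the first field) and modprobe configuration files under /etc/modprobe.d; the function names are hypothetical. Note that modprobe rules only affect loadable modules, so a kernel with this code built in would not be covered by the override.

```python
import glob


def module_loaded(proc_modules_text, name):
    """True if `name` appears as a loaded module in /proc/modules content."""
    wanted = name.replace("-", "_")
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == wanted:
            return True
    return False


def has_install_override(conf_text, name):
    """True if the modprobe config text contains an `install <name> ...` line."""
    wanted = name.replace("-", "_")
    for raw in conf_text.splitlines():
        tokens = raw.split("#", 1)[0].split()   # drop comments, then tokenize
        if len(tokens) >= 3 and tokens[0] == "install":
            if tokens[1].replace("-", "_") == wanted:
                return True
    return False


def read_text(path):
    """Read a file, returning an empty string if it is missing or unreadable."""
    try:
        with open(path, encoding="utf-8", errors="replace") as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    loaded = module_loaded(read_text("/proc/modules"), "algif_aead")
    overridden = any(
        has_install_override(read_text(path), "algif_aead")
        for path in glob.glob("/etc/modprobe.d/*.conf")
    )
    print(f"algif_aead loaded: {loaded}; install override present: {overridden}")
```

A host is in the intended state when the module is not loaded and the override is present.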
3. Block AF_ALG at the container level
For Kubernetes clusters, container platforms, and CI runners, deploy a custom seccomp profile that blocks socket creation with address family 38, regardless of patch status:
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {
      "names": ["socket"],
      "action": "SCMP_ACT_ERRNO",
      "args": [{"index": 0, "value": 38, "op": "SCMP_CMP_EQ"}]
    }
  ]
}
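Enforce this cluster-wide through an admission controller such as Kyverno or OPA Gatekeeper. Do not rely on RuntimeDefault or PSS Restricted alone. Because the example above allows every other syscall, add the AF_ALG rule to the profile you already run rather than swapping a stricter default for this one.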
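Whether a profile like the one above is really in force can be tested from inside a workload with a harmless probe. The sketch below only attempts to create an AF_ALG socket and closes it immediately; it performs no cryptographic operation and touches no files. The function name and the three result labels are assumptions made for illustration. Be aware that the probe itself will trip any syscall monitoring that watches for AF_ALG socket creation.

```python
import socket

# AF_ALG is address family 38 on Linux; fall back to the raw number if this
# Python build does not expose the constant.
AF_ALG = getattr(socket, "AF_ALG", 38)


def probe_af_alg():
    """Try to create and immediately close an AF_ALG socket.

    Returns:
        "allowed"     - the socket was created, so the kernel path is reachable
        "blocked"     - creation was denied (EPERM/EACCES), e.g. by seccomp
        "unavailable" - the platform or kernel does not offer AF_ALG at all
    """
    try:
        sock = socket.socket(AF_ALG, socket.SOCK_SEQPACKET, 0)
    except PermissionError:
        return "blocked"
    except OSError:
        return "unavailable"
    sock.close()
    return "allowed"


if __name__ == "__main__":
    print(f"AF_ALG socket creation: {probe_af_alg()}")
```

Run inside a pod or CI job, a result of "blocked" suggests the seccomp rule is being applied; "allowed" means the workload can still reach the AF_ALG interface.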
4. Inventory your SUID binaries
Every setuid-root binary on your systems is a potential target for Copy Fail’s page cache corruption. Run the following across your fleet and evaluate whether each binary is necessary:
find / -perm -4000 -type f
Remove the setuid bit from anything that is not required. This reduces the attack surface not just for Copy Fail but for any future local privilege escalation vulnerability.
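For fleet tooling that needs structured output rather than raw find results, the same inventory can be sketched in Python. This is an illustrative equivalent, not a replacement for the command above: the function names are hypothetical, it filters to root-owned files by default (the find command does not), and it silently skips paths it cannot stat.

```python
import os
import stat


def is_setuid(mode):
    """True for a regular file whose mode has the setuid bit set."""
    return stat.S_ISREG(mode) and bool(mode & stat.S_ISUID)


def find_setuid(root, owner_uid=0):
    """List setuid regular files under `root`.

    owner_uid=0 keeps only root-owned files; pass None to keep every owner.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            path = os.path.join(dirpath, fname)
            try:
                st = os.lstat(path)   # lstat: do not follow symlinks
            except OSError:
                continue              # vanished or unreadable entry
            if is_setuid(st.st_mode) and (owner_uid is None or st.st_uid == owner_uid):
                hits.append(path)
    return sorted(hits)
```

Calling find_setuid("/usr/bin") returns a sorted list of paths that can be compared against an approved baseline.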
How ZENDATA’s SOC Detected and Responded
When CVE-2026-31431 was publicly disclosed, our SOC operations team immediately assessed the exposure across our managed client fleet and activated our emergency advisory process.
Detection rules deployed within hours
Our security stack, part of our ZEN360 managed SOC service, was updated with layered detection rules targeting every stage of the Copy Fail exploit chain.
Behavioral detection (highest fidelity). We deployed an EQL rule based on a kernel-level invariant. When a setuid binary is executed, the new process keeps the real user ID (ruid) of the calling process; only the effective user ID (euid) is raised to that of the file owner. Under normal circumstances, it is therefore impossible for the new process to start with ruid=0 if the parent process had ruid!=0. The only way this occurs is if the new process called setuid(0) after starting, which is exactly what Copy Fail’s injected shellcode does. This single rule catches every Copy Fail variant, regardless of payload, programming language, or target binary, with near-zero false positives.
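The invariant reduces to a single comparison per process event. The sketch below expresses that logic in Python rather than EQL; it is not the deployed rule. The event field names (parent_ruid, ruid, exe) are hypothetical, and it leaves out any allow-listing a production rule would need for legitimate identity-switching tools.

```python
def ruid_jump_to_root(parent_ruid, child_ruid):
    """True when a process has ruid 0 although its parent's ruid was not 0."""
    return parent_ruid != 0 and child_ruid == 0


def flag_events(events):
    """Return the process events that violate the ruid invariant.

    Each event is a dict with hypothetical keys:
      parent_ruid - real UID of the parent process
      ruid        - real UID of the new process
      exe         - path of the executed binary (carried through for triage)
    """
    return [e for e in events if ruid_jump_to_root(e["parent_ruid"], e["ruid"])]
```

For example, an event with parent_ruid 1000 and ruid 0 is flagged, while a setuid binary that starts with ruid 1000 (only its euid being 0) is not part of this comparison and passes.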
Syscall-level monitoring. We deployed AuditD rules across managed hosts to track AF_ALG socket creation (address family 38) and splice() calls by non-root users. AF_ALG usage by unprivileged processes is exceptionally rare in production environments. Any occurrence is treated as a high-confidence indicator of exploitation activity. Our correlation logic flags bursts of AF_ALG socket creation from the same user within a short time window, matching the exploit’s pattern of approximately 40 iterations to deliver a complete payload.
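The burst correlation can be illustrated with a sliding window per user. This is a minimal sketch of the idea, not our production logic: the 10-second window and the threshold of 20 events are assumptions chosen for the example, and the input is assumed to be (timestamp, uid) pairs already extracted from audit records and sorted by time.

```python
from collections import defaultdict, deque


def find_bursts(events, window=10.0, threshold=20):
    """Return the set of UIDs that created many AF_ALG sockets in a short span.

    events    - iterable of (timestamp_seconds, uid), sorted by timestamp
    window    - sliding window length in seconds (assumed value)
    threshold - events within one window that count as a burst (assumed value)
    """
    recent = defaultdict(deque)   # uid -> timestamps still inside the window
    flagged = set()
    for ts, uid in events:
        queue = recent[uid]
        queue.append(ts)
        # Drop timestamps that have fallen out of the window.
        while ts - queue[0] > window:
            queue.popleft()
        if len(queue) >= threshold:
            flagged.add(uid)
    return flagged
```

A user who opens roughly 40 AF_ALG sockets within a couple of seconds is flagged, while a user who opens a handful over several minutes is not. Lower thresholds catch slower attackers at the cost of more noise.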
Exploit artifact detection. Rules were deployed to detect the known proof-of-concept files by name and hash, as well as download attempts targeting the published exploit URLs. While these are lower-fidelity indicators (an attacker can rename files and change their contents to alter the hashes), they provide useful coverage against opportunistic exploitation using the public tooling.
Osquery fleet scanning. Scheduled queries were pushed to all managed endpoints to identify vulnerable kernel versions, check whether the algif_aead module is loaded, verify whether the module blacklist workaround has been applied, and inventory setuid binaries across the fleet. This gave us and our clients a complete exposure picture within hours.
The Bigger Picture
Copy Fail was discovered by an AI-assisted code auditing tool. Theori’s Xint Code platform scanned the Linux kernel’s crypto subsystem and surfaced this vulnerability in approximately one hour of automated analysis. The same scan found additional high-severity bugs that are still in coordinated disclosure.
This changes the economics of vulnerability discovery in a way that matters for every security program. The assumption that kernel-grade zero-days are expensive to find, and therefore rare, is no longer reliable. If automated tools can surface vulnerabilities of this caliber in an hour, the supply of new critical bugs will increase, and the window between discovery and exploitation will compress.
For defenders, the practical implications are straightforward. Patch cycles need to be faster. Detection engineering needs to cover behavioral indicators, not just known signatures. And shared-kernel multi-tenancy, which was never designed to be a security boundary, needs to be treated with the skepticism it deserves. Containers are a packaging and isolation convenience, not a trust boundary. If your security model depends on namespace isolation holding against a kernel-level primitive, Copy Fail is the proof that it does not.
At ZENDATA, we design our managed SOC services around exactly this reality: layered detection, rapid response, and the operational depth to turn a vulnerability disclosure into a protected client fleet within hours, not weeks.
If you are unsure whether your infrastructure is exposed, or if you need support deploying detection and mitigation for CVE-2026-31431, contact our team. We have the rules, the tooling, and the operational experience to close this gap.
