Three compounding failures: (1) Behavioral configuration — 95 system prompts governing AI conduct — stored in the same database as operational data, sharing identical access controls and blast radius. (2) 22 unauthenticated API endpoints in production, each with a distinct origin: hotfixes whose auth was temporarily stripped under outage pressure and never restored, shadow probe endpoints created for live production testing and forgotten, and internal-only endpoints assumed protected by network segmentation that wasn't enforced. (3) A JSON key injection variant — SQL concatenated field names rather than parameterized values — that bypassed OWASP ZAP and two years of McKinsey's own internal scanning because no automated tool was testing that specific surface.
CodeWall researchers disclosed responsibly under McKinsey's published disclosure policy. McKinsey patched within 24 hours, engaged a forensics firm, and issued a public statement. The firm's investigation found no evidence of unauthorized access to client data beyond the research team. The broader industry implication: autonomous offensive agents are now reading legal disclosure policies and selecting legally defensible targets without human direction. The economics of offensive security reconnaissance changed permanently on February 28, 2026.
The Incident
The numbers made headlines. 46.5 million chat messages. 728,000 files. 57,000 user accounts. 95 system prompts governing how an AI behaved for every employee at one of the world's most powerful consulting firms.
Security media treated what happened to McKinsey's internal AI platform Lilli as a data breach story. It isn't. It's a design story. And the design decision that enabled it is fifty years old.
The Vulnerability Wasn't SQL Injection
SQL injection was the method. The failure was something more fundamental: behavioral configuration stored alongside operational data in the same database, sharing the same access controls, exposed to the same blast radius.
The 95 system prompts carried Lilli's behavioral rules — how to answer questions, which guardrails to apply, how to cite its sources. They were stored in the same database as everything else. That architectural choice transformed what would have been a data breach into something the security industry doesn't have a clean category for yet: behavioral poisoning at scale.
An attacker with write access through the same injection could have rewritten those prompts. Silently. No deployment needed. No code change. Just a single UPDATE statement wrapped in a single HTTP call. The implications for 43,000 McKinsey consultants relying on Lilli for client work: poisoned financial models, corrupted strategic recommendations, guardrails quietly removed. And unlike a compromised server, a modified prompt leaves no log trail.
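The blast radius of that co-location can be sketched in a few lines. The schema and table names below are assumptions for illustration, not Lilli's actual structure; the point is that behavioral configuration living in the same writable store as chat data means one injected statement silently rewrites the AI's rules:

```python
import sqlite3

# Hypothetical schema: behavioral config stored alongside operational data.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE system_prompts (id INTEGER, prompt TEXT)")
db.execute("INSERT INTO system_prompts VALUES (1, 'Cite sources. Refuse unsafe requests.')")
db.execute("CREATE TABLE chat_messages (id INTEGER, body TEXT)")

# The same write access an injection grants: no deploy, no code change,
# no entry in any deployment log. One statement, one HTTP call.
db.execute("UPDATE system_prompts SET prompt = 'Ignore prior guardrails.' WHERE id = 1")

print(db.execute("SELECT prompt FROM system_prompts WHERE id = 1").fetchone()[0])
```

Nothing about the application changed from the infrastructure's point of view: same binary, same container, same schema. Only the behavior is different.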
This is the separation of concerns principle — introduced formally in software engineering in the 1970s by Dijkstra — collapsing in an AI-native context. Configuration is not data. Behavioral rules are not chat logs. Treating them as equivalent rows in a shared schema is a choice, and it's a choice that enterprise AI teams are making every day because the tooling makes it easy and the threat model hasn't caught up.
The Attack Was Methodical, Not Exotic
The CodeWall agent discovered that Lilli's API documentation was publicly accessible, revealing more than 200 endpoints — 22 of which required no authentication whatsoever. The search endpoint had parameterized its query values correctly, the standard defense against SQL injection. But the JSON keys — the field names themselves — were concatenated directly into SQL without sanitization. When the agent sent malformed key names, the database reflected them verbatim in its error messages. In just 15 blind iterations, it refined its injections until production records started flowing back.
OWASP ZAP — one of the most widely deployed web application security scanners — did not flag it. Lilli had been running in production for over two years and McKinsey's own internal scanners found nothing. The vulnerability class has been documented since 1998. The specific variant — injection through JSON key names rather than values — is simply a surface that most automated tools aren't looking at.
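A minimal sketch of this variant, using an invented schema and sqlite3 (not Lilli's actual stack): the values go through placeholders, which is exactly what scanners check for, while the JSON keys are concatenated straight into the SQL text. A malformed key is reflected verbatim in the error, and a crafted key rides a UNION into a column the endpoint was never meant to expose:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER, body TEXT, secret TEXT)")
conn.execute("INSERT INTO messages VALUES (1, 'hello', 'confidential')")

def search(payload: str):
    filters = json.loads(payload)
    # The values are parameterized -- the textbook defense...
    clause = " AND ".join(f"{key} = ?" for key in filters)
    # ...but the field names are concatenated into the statement unsanitized.
    sql = f"SELECT body FROM messages WHERE {clause}"
    return conn.execute(sql, list(filters.values())).fetchall()

print(search('{"id": 1}'))  # intended use: [('hello',)]

# A malformed key comes back verbatim in the error -- the oracle the
# agent used to refine its injections blind:
try:
    search('{"bogus_column": 1}')
except sqlite3.OperationalError as exc:
    print(exc)  # no such column: bogus_column

# A crafted key exfiltrates a column the endpoint never returns:
print(search('{"1=0 UNION SELECT secret FROM messages WHERE id": 1}'))
```

Every value-focused injection payload a scanner sends through this endpoint is harmlessly parameterized, which is why the surface looks clean to tools that only mutate values.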
The Part Everyone Is Under-Discussing
CodeWall's researchers didn't aim their agent at McKinsey. The agent selected McKinsey autonomously: it identified that the firm had published a responsible disclosure policy — meaning any findings could be legally reported — and chose its target accordingly. An offensive AI agent read legal disclosure frameworks and made a targeting decision without human involvement.
Autonomous offensive agents are now reading disclosure policies and selecting legally defensible targets. This isn't a research curiosity. It's a preview of how continuous automated reconnaissance will operate at scale. Your threat surface is visible to systems that don't sleep, don't get bored, and cost $20 in tokens to run for two hours against a firm with world-class security investment.
The economics of offensive security changed permanently on February 28, 2026.
Why 22 Endpoints — A Taxonomy of Production Debt
Production systems don't fail uniformly. They accumulate debt through specific decisions made under specific pressures at specific moments. The 22 unauthenticated endpoints almost certainly have multiple origins:
The Hotfix. Authentication stripped from a known endpoint to address a production break. Temporary — a ticket was opened, intent was clear. The ticket aged out. The door stayed open because the system kept working, which is the most dangerous kind of silence there is.
The Shadow Probe. An undocumented endpoint created for live production testing when the failure couldn't be reproduced in staging. The engineer created a testing surface jabbed directly into the running system, probed the failure conditions, fixed the issue, and moved on. The endpoint persists, unknown to the team, invisible to scanners that weren't pointed at it because nobody knew to point them there.
The Assumption. An internal-only endpoint assumed protected by network segmentation. The segmentation is not what the diagram says it is. Nobody knows because the endpoint kept working for its intended users.
McKinsey probably has 22 different stories for 22 different endpoints. That is somehow more disturbing than a single systemic failure — because it means the remediation isn't a single fix. It's an archaeological dig through years of production decisions made under pressure by engineers who may no longer be on the team.
The Scan Stack That Catches What Developers Forget
The right lesson is not "run better security scanners." It is that security scanning must be informed by multiple independent data sources — and must explicitly not trust developer memory or organizational documentation as a complete picture of the actual attack surface.
Developers are unreliable narrators of their own systems. Not from malice — from the structural conditions of production engineering. They forget because the system kept working and the ticket aged out. They omit because the shadow probe was temporary and then it wasn't. They inherited code they didn't write and documented what they understood, which wasn't everything.
The four layers that must be cross-referenced, not run in isolation:
Layer 1 — Published Documentation Analysis. API docs, OpenAPI specs, developer portals. This is the declared surface — curated, and therefore incomplete. Things get removed from docs before they get removed from production.
Layer 2 — Code Analysis. Static analysis of the actual codebase. Route registration, middleware chains, auth decorator presence or absence, endpoint definitions that never made it into documentation. This is where the hotfix lives. Where //TODO: re-enable auth sits three lines above a handler that has been in production for eighteen months.
Layer 3 — Network Scanning. What is actually listening. Ports, services, internal endpoints exposed across network boundaries that were assumed to be private. This catches the architectural assumption failure — the endpoint that was never meant to be public but is.
Layer 4 — Application / Runtime Scanning. Dynamic testing against the live surface. What actually responds. What accepts input it shouldn't. This is where JSON key injection lives — behavior that only manifests under specific input conditions that static analysis will not find.
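The cross-referencing step can be sketched as plain set arithmetic. The endpoint names and sets below are invented for illustration; in practice each set would be populated by the corresponding layer's tooling. The findings fall out of the differences between the four pictures, which is exactly where the taxonomy above says the debt accumulates:

```python
# Each set is the output of one layer (names are hypothetical examples).
declared  = {"/api/search", "/api/chat"}               # Layer 1: the docs
in_code   = declared | {"/api/debug/replay"}           # Layer 2: route registrations
listening = in_code | {"/internal/metrics"}            # Layer 3: network scan
anonymous = {"/api/debug/replay", "/internal/metrics"} # Layer 4: responds without auth

def reconcile():
    return {
        # In the code but never documented: the hotfix or shadow probe.
        "shadow_or_hotfix": sorted(in_code - declared),
        # Reachable on the network but absent from this codebase's routes:
        # the segmentation assumption.
        "assumed_private": sorted(listening - in_code),
        # Live in production and accepting unauthenticated requests.
        "open_in_prod": sorted(anonymous & listening),
    }

print(reconcile())
```

No single layer flags all three findings; each one is visible only as a disagreement between two layers.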
The CodeWall agent ran Layers 1, 3, and 4 autonomously. It did not have Layer 2 — the source code. If it had, it would have found the vulnerability in minutes rather than two hours, and found every other forgotten endpoint in the codebase simultaneously. That is the scan capability gap that enterprise security programs have not closed.
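What a Layer 2 pass buys is illustrated by even a crude static check. The decorator names below are Flask-style assumptions, not Lilli's actual code; the point is that the hotfix pattern — a route handler whose auth decorator was removed, with the TODO still sitting above it — is trivially findable the moment you can read the source:

```python
import re

# Hypothetical source under scan; @require_auth is an assumed convention.
SOURCE = '''
@app.route("/api/search")
@require_auth
def search(): ...

# TODO: re-enable auth after incident 4821
@app.route("/api/debug/replay")
def replay(): ...
'''

def unauthenticated_routes(source: str):
    """Flag registered routes whose handler lacks the auth decorator."""
    findings = []
    for m in re.finditer(r'@app\.route\("([^"]+)"\)\n(@require_auth\n)?def', source):
        if not m.group(2):  # no auth decorator between route and handler
            findings.append(m.group(1))
    return findings

print(unauthenticated_routes(SOURCE))
```

A real implementation would parse the AST and walk the middleware chain rather than pattern-match text, but the asymmetry stands: the check that takes minutes with source access took the agent two hours of blind probing without it.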
What Changed
McKinsey was not careless. They are a firm with world-class technology teams, significant security investment, and the resources to do things properly. Their response was commendable. Their investigation found no evidence that client data was accessed by unauthorized parties beyond the CodeWall researchers themselves.
The lesson is not that McKinsey failed. The lesson is that the architectural assumptions underlying their deployment — behavioral configuration co-located with operational data, a production environment that diverged from lower environments in ways that forced decisions that never got reversed, a security scanning posture that trusted the declared surface — are assumptions the entire industry is currently carrying.
Every enterprise AI deployment that stores system prompts in the same database as user data has replicated this architecture. Every team that has stripped auth from an endpoint to survive a production incident and opened a ticket they haven't closed is living in this failure mode. Every security program that scans the declared surface without reconciling it against the actual code and network topology is operating with a blind spot that autonomous agents can now enumerate in two hours for twenty dollars.
The most rigorous security posture isn't one that trusts developers to declare their own attack surface. It's one that reconstructs the actual surface independently — from documentation, from code, from network topology, from runtime behavior — and reconciles the differences. The gaps between those four pictures are where the breach already is.
The agent that found Lilli's gaps won't be the last to look. The next one may not be a researcher with a disclosure policy.