◆ Injection & InputDesign FlawEXP-018

The Instructed Hallucination

When the model couldn't tell the user's data from the user's instructions

2020s–Now · Python / TypeScript · 7 min read

Pattern Classification

Class

⧫ Boundary Collapse

Sub-pattern

Instruction Confusion

Invariant

When data crosses into a system that interprets structure, without being constrained or transformed, it becomes executable.

This Instance

Data within a prompt is interpreted as instructions by a language model

Detection Heuristic

If user input appears inside a query, command, template, object stream, or prompt without an intermediate representation that separates data from structure — you are not passing data. You are modifying structure.

Same Pattern Class

The Concatenated Query The Embedded Script The Unquoted CSV The Overflowed Return The Trusting Deserializer

Why It Persists

Every new execution context recreates this pattern. SQL gave way to NoSQL, HTML to templates, cookies to JWTs, forms to APIs, prompts to agents. The language changes. The failure does not.

Pattern Connections

→ AI Bridge

The Embedded Script

XSS injects script into HTML rendering. Prompt injection injects instructions into LLM reasoning. Same boundary collapse, new execution context

→ AI Bridge

The Concatenated Query

SQL injection concatenates data into queries. Prompt injection concatenates data into instructions. The language changed. The failure did not

→ AI Bridge

The Trusting Deserializer

Deserialization trusts object structure. LLMs trust prompt structure. Both execute attacker-controlled instructions disguised as data

Year

2022–present

Context

Large language models became application backends. Developers wrapped GPT, Claude, and Gemini in product interfaces — customer support bots, document summarizers, code assistants, search engines. The application constructed a prompt: system instructions first, then user input. The model received a single text stream and couldn't distinguish between the developer's instructions and the user's input. The user discovered they could override the system prompt — just as SQL injection overrode the query structure 25 years earlier.

Who Built This

AI application developers integrating LLMs via APIs. The model providers offered system prompts as a way to control behavior. The application developers relied on system prompts as a security boundary — and they weren't one.

Threat Model at Time

Hallucination — the model generating incorrect information. Bias in outputs. Cost optimization. Nobody initially modeled the prompt itself as an injection surface because prompts felt like configuration, not code.

Why It Made Sense

System prompts appeared to be authoritative. "You are a helpful customer support agent. Never discuss competitors. Only answer questions about our products." The model followed these instructions — until a user said "Ignore your previous instructions." The boundary between instruction and data was typographic, not structural.

Archaeologist's Note

This pattern has been found in applications built by talented developers at respected organizations across every decade of software history. Its presence in a codebase is not a reflection of the developer who wrote it — it is a reflection of what that developer was taught, what tools they had, and the path that was easiest given what they were taught. The goal is not to find fault. The goal is to find the pattern — before it finds you.

Katie's Law: The developers were not wrong. The shortcut was not wrong. The context changed and the shortcut didn't.

▧ The Frontier▸The AI Pavilion1 / 2

Museum Map Next Exhibit→