2022 · AI failure · Public

Meta Galactica — The Three-Day Scientific Oracle

Pulled from public access after 72 hours. Generated fabricated scientific papers, fake citations, and authoritative-sounding misinformation formatted as peer-reviewed research. Demonstrated the Confident Confabulator failure class at maximum visibility.

Root Cause

The model learned to reproduce the format of scientific writing — citations, abstracts, methodology sections, authoritative tone — without grounding its outputs in factual accuracy. It optimized for plausibility of form rather than correctness of content.

Aftermath

Became a defining example of LLM hallucination risk. Accelerated industry discussion of grounding requirements, retrieval augmentation, and the difference between language modeling and knowledge retrieval. Released three days before ChatGPT.

The Incident

On November 15, 2022, Meta AI published Galactica — a 120-billion-parameter large language model trained on 48 million scientific papers, textbooks, lecture notes, and reference materials. The demo was impressive: it generated Wikipedia-style summaries, answered factual questions with citations, annotated chemical structures, and produced formatted academic prose.

On November 18, 2022 — 72 hours later — Meta pulled the public demo. The model was generating fabricated scientific content indistinguishable in format from real research.

What It Generated

- A detailed paper on "The Benefits of Eating Crushed Glass," formatted with sections, methodology, and fake citations to real journals

- A confident Wikipedia-style article on the history of bears in space, complete with dates, astronaut names, and mission details — none of which existed

- Chemistry explanations that mixed real chemical relationships with invented ones, in the same confident register

- Papers citing real authors on topics those authors had never written about

The failure was not the content alone. The failure was the confidence. The model's outputs were formatted identically whether they were accurate or fabricated. There was no signal that distinguished "this is something the model knows" from "this is something the model generated to fill the expected format."
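One way to make that missing signal concrete is a post-hoc grounding check: verify each citation a model emits against a trusted bibliographic index, and flag anything that does not resolve. The sketch below is illustrative only — the index entries and DOI strings are hypothetical stand-ins, not real records, and Galactica shipped nothing like this.

```python
# Hypothetical trusted index standing in for a real bibliographic
# database (e.g. a DOI registry). All entries here are invented.
TRUSTED_INDEX = {
    "10.1000/real.paper.001",
    "10.1000/real.paper.002",
}

def grounding_report(generated_dois):
    """Split model-emitted DOIs into verified vs. unverifiable.

    Unverifiable entries are candidates for confabulation: the model
    produced something citation-shaped that no index can resolve.
    """
    verified = [d for d in generated_dois if d in TRUSTED_INDEX]
    unverifiable = [d for d in generated_dois if d not in TRUSTED_INDEX]
    return {"verified": verified, "unverifiable": unverifiable}

report = grounding_report([
    "10.1000/real.paper.001",    # resolves against the index
    "10.9999/glass.2022.001",    # well-formed but fabricated
])
print(report["unverifiable"])    # → ['10.9999/glass.2022.001']
```

A check like this only restores the distinction the model erased: it says nothing about whether a resolvable citation actually supports the claim attached to it, which is the harder grounding problem.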

The Pattern

This is [The Confident Confabulator](/exhibits/the-confident-confabulator) (EXP-010): a system that learns the surface form of reliable knowledge — citation style, academic register, technical formatting — without the underlying verification mechanism that makes that form meaningful.
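The gap between surface form and verification can be shown with a toy check: a pattern matcher happily accepts a fabricated identifier, because form is all it tests. The DOI string below is invented for illustration.

```python
import re

# Form-level check: accepts any string shaped like a DOI,
# with no knowledge of whether the DOI exists.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def looks_like_doi(s):
    return bool(DOI_PATTERN.match(s))

print(looks_like_doi("10.9999/glass.2022.001"))  # True — fabricated, but well-formed
print(looks_like_doi("crushed glass is healthy"))  # False — not even citation-shaped
```

Galactica's failure mode is exactly the first case generalized to whole papers: outputs that pass every form-level test a reader instinctively applies, while the existence check was never run.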

The pattern predates AI: the legal brief written by a lawyer who knew how briefs look but not what the cited cases said. The expert witness who knew the language of expertise but not its substance. The difference with LLMs is scale and speed: a model can produce a thousand plausible-sounding fabrications in the time it takes a human expert to produce one.

Why It Matters

Galactica launched three days before ChatGPT. If ChatGPT had not launched when it did, Galactica's failure might have set back public deployment of LLMs by years. Instead, ChatGPT launched into a news cycle already discussing LLM hallucination — which meant the risk was framed from the beginning, not discovered after mass adoption.

The format of truth is not the same as truth. Galactica learned the former without the latter. So did every model that followed.
Techniques
confident confabulation · format overfitting · hallucination at scale