Design Assumptions · Requirement Flaw · EXP-008

The Runaway Migration

It worked in dev. It locked prod for four hours.

2000s · SQL / Rails · 6 min read
Pattern Classification
Class: Temporal Coupling
Sub-pattern: Scale Blindness
Invariant

Code that assumes sequential execution, stable state, or consistent timing will fail the moment concurrency, scale, or latency proves the assumption wrong.

This Instance

Operations that work at small scale hit non-linear thresholds at production volume

Detection Heuristic

Three signals point to this pattern: a system checks a condition and then acts on it without holding a lock or using an atomic operation; code that works on small data fails on large data; behavior changes under load. Any one of them means the system is temporally coupled to assumptions about sequencing, scale, or speed.
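
The first signal, check-then-act without a lock, can be sketched in a few lines of Ruby. The Inventory class and its numbers are hypothetical, invented for illustration; the point is only that the check and the action share one critical section.

```ruby
# Hypothetical stock counter, for illustration only.
class Inventory
  def initialize(stock)
    @stock = stock
    @lock = Mutex.new
  end

  # The check (stock > 0) and the act (decrement) run under the same lock,
  # so no other thread can change @stock between them.
  def reserve
    @lock.synchronize do
      return false if @stock <= 0

      @stock -= 1
      true
    end
  end

  def stock
    @lock.synchronize { @stock }
  end
end

inventory = Inventory.new(100)

# Eight threads make 50 attempts each: 400 attempts against 100 units.
threads = Array.new(8) { Thread.new { Array.new(50) { inventory.reserve } } }
results = threads.flat_map(&:value)

puts "successful reservations: #{results.count(true)}"
puts "remaining stock: #{inventory.stock}"
```

Without the synchronize block, two threads could both pass the check before either decrements, which is the failure the heuristic describes.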

Why It Persists

Every system that operates across time — concurrent threads, distributed nodes, growing datasets, eventual consistency — contains temporal assumptions. The more distributed the system, the more assumptions it makes about time, and the more ways those assumptions can fail.

Pattern Connections
Cross-Domain Analog
The Unsynchronized Handshake

Both are temporal coupling failures — one at the microsecond scale (threads), one at the data-growth scale (migrations)

Cross-Domain Analog
The Hardwired Year

Both are temporal coupling failures — one assumes the century won't change, the other assumes the dataset won't grow. Same pattern: the code is correct for the present, fatal for the future

Enables
The Ouroboros Health Check

A runaway migration under load triggers health check failures, creating a cascading feedback loop

Year

2005–2018

Context

Ruby on Rails popularized database migrations, meaning versioned schema changes checked into source control. The pattern spread across ecosystems: Django, Laravel, Flyway, Alembic, Entity Framework. Developers wrote migrations on laptops with 10,000 rows. They tested on staging servers with 50,000 rows. Then they ran them in production against 80 million rows. The migration that took 200 milliseconds in dev ran for four hours in production, holding a table lock the entire time.
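
The figures above are worth a back-of-the-envelope check. The sketch below assumes, purely for illustration, that cost per row stays constant as the table grows; the exhibit gives only the row counts and the two timings, so the linear model is an assumption, not a measurement.

```ruby
# Assumption: runtime scales linearly with row count (constant cost per row).
def linear_estimate_seconds(dev_millis, dev_rows, prod_rows)
  dev_millis * prod_rows / dev_rows / 1000
end

dev_rows = 10_000                   # laptop dataset, from the exhibit
prod_rows = 80_000_000              # production dataset, from the exhibit
dev_millis = 200                    # observed dev runtime
observed_prod_seconds = 4 * 60 * 60 # four hours

estimate_seconds = linear_estimate_seconds(dev_millis, dev_rows, prod_rows)
slowdown = observed_prod_seconds.to_f / estimate_seconds

puts "linear estimate: #{estimate_seconds} s"
puts "observed:        #{observed_prod_seconds} s"
puts "observed / linear estimate: #{slowdown}"
```

Under that assumption the estimate is 1,600 seconds, a little under 27 minutes, while the observed four hours is 14,400 seconds: nine times worse than linear. Even the optimistic model predicts a lock held for many minutes, and the real runtime was well past it.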

Who Built This

Application developers writing schema changes. They understood the application perfectly. They understood databases well enough to write ALTER TABLE. They didn't understand lock escalation, online DDL, or how UPDATE on 80 million rows differs from UPDATE on 10,000 rows.
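
One common way to narrow the gap between an UPDATE on 10,000 rows and one on 80 million is to split the work into bounded batches, so each statement touches a fixed number of rows. A minimal sketch follows; the orders table, the status column, and the batch size are hypothetical names chosen for illustration, and the code only generates SQL text rather than talking to a database.

```ruby
# Yields [first_id, last_id] pairs covering min_id..max_id in fixed-size chunks.
def id_batches(min_id, max_id, batch_size)
  raise ArgumentError, "batch_size must be positive" unless batch_size.positive?
  return enum_for(:id_batches, min_id, max_id, batch_size) unless block_given?

  first = min_id
  while first <= max_id
    last = [first + batch_size - 1, max_id].min
    yield first, last
    first = last + 1
  end
end

# Hypothetical table and column names, for illustration only.
def backfill_statements(min_id, max_id, batch_size)
  id_batches(min_id, max_id, batch_size).map do |first, last|
    "UPDATE orders SET status = 'legacy' " \
      "WHERE id BETWEEN #{first} AND #{last} AND status IS NULL;"
  end
end

statements = backfill_statements(1, 25_000, 10_000)
puts statements

# The production-sized table from the exhibit, in dev-sized batches.
prod_batch_count = id_batches(1, 80_000_000, 10_000).count
puts "batches for 80 million rows: #{prod_batch_count}"
```

Each batch is roughly the size of the dataset the developers actually tested against. In a Rails codebase, Active Record's in_batches serves a similar purpose. Batching addresses large UPDATE backfills only; an ALTER TABLE that rewrites the table still needs an online DDL strategy.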

Threat Model at Time

Data loss. Would the migration corrupt data? Would the rollback work? Nobody modeled migration runtime as a risk because it was always fast — in every environment they tested.
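
Adding runtime to the threat model can start with something as small as a pre-flight check on table size. The sketch below is an assumption-laden illustration: the preflight helper, the one-million-row threshold, and the orders table are all hypothetical, and a real check would read row counts from production metadata rather than hard-coded values.

```ruby
# Flags a migration for review when the target table exceeds a row threshold.
# The default threshold is a hypothetical placeholder; a sensible value depends
# on the database engine, the hardware, and the operation being run.
def preflight(table, row_count, threshold: 1_000_000)
  verdict = row_count >= threshold ? :needs_review : :ok
  { table: table, row_count: row_count, verdict: verdict }
end

# Row counts taken from the exhibit: laptop, staging, production.
checks = [
  preflight("orders", 10_000),
  preflight("orders", 50_000),
  preflight("orders", 80_000_000)
]

checks.each do |check|
  puts "#{check[:table]} rows=#{check[:row_count]} -> #{check[:verdict]}"
end
```

The same migration passes in the two environments where it was tested and is flagged only against the production row count, which is the environment the original threat model never measured.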

Why It Made Sense

Migration frameworks made schema changes reproducible and reversible. rails db:migrate was vastly superior to emailing DDL scripts. The frameworks abstracted away the SQL, which made migrations accessible. They also abstracted away the performance characteristics, which made migrations dangerous.

Archaeologist's Note

This pattern has been found in applications built by talented developers at respected organizations across every decade of software history. Its presence in a codebase is not a reflection of the developer who wrote it — it is a reflection of what that developer was taught, what tools they had, and the path that was easiest given what they were taught. The goal is not to find fault. The goal is to find the pattern — before it finds you.

Katie's Law: The developers were not wrong. The shortcut was not wrong. The context changed and the shortcut didn't.
