A Prime Video quality monitoring service processed every frame through individual AWS Step Functions state transitions. Step Functions is designed for orchestration, not high-frequency data processing.
The team moved to a monolith and reduced costs by 90%, then published a blog post about it. ThePrimeagen's reaction video went viral, highlighting the irony of Amazon not understanding its own products.
The Incident
Amazon Prime Video's audio/video quality monitoring service was built on AWS Step Functions and Lambda. The service checked every video stream for quality defects — dropped frames, corruption, block artifacts.
The architecture processed every frame of every stream through individual Step Functions state transitions. Step Functions charges per state transition. At video frame rates of 24 to 30 frames per second, one hour of a single stream is 86,400 to 108,000 frames, so a single continuous stream passes two million state transitions within a day.
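To make the scale concrete, here is a rough back-of-the-envelope sketch. It assumes exactly one state transition per frame and a price of $0.025 per 1,000 transitions (an assumed list price for Standard Workflows; check current pricing for your region). Neither number comes from the Prime Video post, and the function names are made up for illustration.

```python
# Back-of-the-envelope estimate of Step Functions transition volume and cost.
# ASSUMPTIONS (not from the original post):
#   - one state transition per frame
#   - $0.025 per 1,000 transitions (assumed Standard Workflows list price)
PRICE_PER_TRANSITION_USD = 0.025 / 1000


def transitions_per_stream(fps: int, hours: float, transitions_per_frame: int = 1) -> int:
    """Number of state transitions for one stream of the given length."""
    return int(fps * 3600 * hours * transitions_per_frame)


def transition_cost_usd(transitions: int) -> float:
    """Transition charges only; Lambda and S3 costs are not included."""
    return transitions * PRICE_PER_TRANSITION_USD


if __name__ == "__main__":
    for fps in (24, 30):
        n = transitions_per_stream(fps, hours=1)
        print(f"{fps} fps, 1 hour: {n:,} transitions, ${transition_cost_usd(n):.2f}")
```

Under these assumptions, one hour of one 30 fps stream is 108,000 transitions, or about $2.70 in transition charges alone. Multiply by the number of concurrent streams and the hours in a month to see how fast this grows.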
The Architecture
```
Video stream → Step Function → Lambda (per frame) → S3 → Lambda → SNS
```
Each frame triggered a state machine transition. Each transition cost money. Each Lambda invocation could incur a cold start. Step Functions was designed for orchestration workflows (approve this order, route this ticket), not high-frequency data processing.
The "Fix"
The team collapsed the distributed architecture into a single monolithic process. Same logic. Same quality checks. 90% cost reduction.
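The shape of that change can be sketched in a few lines. This is a toy illustration of the general idea, not Prime Video's code: `Frame`, `Defect`, and both detectors are hypothetical names, and the detectors are trivial stand-ins for real quality checks. The point is that frames move between steps as in-memory function calls, with no per-frame state transition, function invocation, or storage round trip.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class Frame:
    index: int
    data: bytes  # raw frame payload (hypothetical representation)


@dataclass
class Defect:
    frame_index: int
    kind: str


Detector = Callable[[Frame], List[Defect]]


def detect_empty_frame(frame: Frame) -> List[Defect]:
    # Toy stand-in for a "dropped frame" check: flags frames with no payload.
    return [] if frame.data else [Defect(frame.index, "empty_frame")]


def detect_black_frame(frame: Frame) -> List[Defect]:
    # Toy stand-in for a corruption check: flags non-empty all-zero payloads.
    if frame.data and all(b == 0 for b in frame.data):
        return [Defect(frame.index, "black_frame")]
    return []


def analyze_stream(frames: Iterable[Frame], detectors: List[Detector]) -> Iterator[Defect]:
    # Every detector runs in the same process on the same in-memory frame.
    for frame in frames:
        for detect in detectors:
            yield from detect(frame)


if __name__ == "__main__":
    frames = [Frame(0, b"\x10\x20"), Frame(1, b""), Frame(2, b"\x00\x00")]
    for defect in analyze_stream(frames, [detect_empty_frame, detect_black_frame]):
        print(defect)
```

The detectors stay separate, testable units. What disappears is the billing and latency boundary between them.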
They published this as a blog post titled "Scaling up the Prime Video audio/video monitoring service and reducing costs by 90%." The framing was: we discovered monoliths can be better than microservices for some workloads.
The Reaction
ThePrimeagen's response captured what the blog post didn't say: this wasn't a discovery about microservices vs monoliths. This was Amazon, the company that built and sells AWS, not understanding which of its own products was appropriate for this workload. Step Functions is for state machines with infrequent transitions, not per-frame video processing.
Why It Matters
The "microservices for everything" best practice of 2015 was the design assumption that created this disaster. The architecture made sense on a whiteboard. It made sense in a design review. It didn't make sense when applied to a data flow that generates a billable event for every frame of every stream. Right-size your architecture to your data flow.