Museum Wire
2026 · software · Public

macOS TCP Freeze — The 49-Day Clock

Any macOS system running continuously for 49 days, 17 hours, 2 minutes, and 47 seconds silently loses the ability to establish new TCP connections. Existing connections remain alive. Ping works. The machine appears healthy. No error is logged. The symptom — "the internet stopped working" — is functionally indistinguishable from a network outage, a misconfigured firewall, or a bad DNS resolver. Most consumer Macs never hit the threshold because OS updates force reboots. The machines that do hit it — developer workstations, Mac Minis used as servers, CI runners, studio machines, any Mac treated as infrastructure — fail silently and are almost never correctly diagnosed.

Root Cause

The XNU kernel's TCP subsystem maintains an internal clock called `tcp_now` — a 32-bit unsigned integer (`uint32_t`) that increments once per millisecond since boot. The value is used throughout the TCP stack to timestamp connection state, manage retransmit timers, and determine when connections in the TIME_WAIT state are safe to reap. A `uint32_t` can hold a maximum of 4,294,967,295. Divided by 1,000 (milliseconds per second), that's 4,294,967 seconds — 49 days, 17 hours, 2 minutes, and 47 seconds. At that precise moment of uptime, `tcp_now` reaches its ceiling. A monotonicity guard in the kernel is intended to handle wraparound, but it fails: instead of allowing the counter to wrap and continue, it freezes the clock permanently at its maximum value. With the timer frozen, the kernel's TIME_WAIT garbage collector can no longer determine that any connection is old enough to be reaped. TIME_WAIT connections — which normally persist for 30 seconds and are then discarded — accumulate indefinitely. The ephemeral port range (49152–65535, roughly 16,000 ports) fills with ghost connections. Once all ports are exhausted, no new outbound TCP connection can be established. The system continues to pass ICMP (ping). It continues to serve any established long-lived connection. It simply cannot open a new socket.

Aftermath

Discovered and documented in April 2026 by researchers at Photon (photon.codes), an AI agent connectivity company whose long-running macOS processes encountered the failure in production. The bug was confirmed to affect macOS from at least version 10.15 (Catalina) through current releases at time of discovery. Apple acknowledged the report. The only known mitigation is rebooting before the 49.7-day threshold — setting a periodic restart calendar event, a launchd timer, or a system policy. A kernel-level fix requires either widening `tcp_now` to a 64-bit integer (eliminating the overflow for practical purposes) or correctly implementing RFC 7323 timestamp wraparound handling so the monotonicity guard allows graceful rollover.

The Incident

In April 2026, engineers at Photon discovered that their macOS processes were silently losing network connectivity after extended uptime. The failure was repeatable, precise, and deeply unintuitive: at exactly 49 days, 17 hours, 2 minutes, and 47 seconds of continuous operation, a Mac stops being able to open new TCP connections.

Ping still works. Existing connections stay alive. The system reports itself healthy. Nothing in the system log points at the cause. The machine has simply, silently, run out of network sockets — because a 32-bit millisecond counter in the kernel has hit its ceiling and frozen.

How the Clock Works

The XNU kernel's TCP stack uses an internal variable called tcp_now to track time. It is defined as a uint32_t — a 32-bit unsigned integer — that increments once per millisecond since boot. Every TCP connection timestamp, every retransmit timer, every TIME_WAIT expiration calculation references this counter.

```c
#include <stdint.h> // uint32_t

// Simplified from XNU source: bsd/netinet/tcp_timer.c
uint32_t tcp_now; // milliseconds since boot
```

uint32_t has a maximum value of 4,294,967,295. At one increment per millisecond, it takes exactly 49 days, 17 hours, 2 minutes, and 47 seconds to reach that ceiling.
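The arithmetic can be checked mechanically. The short C sketch below divides `UINT32_MAX` milliseconds into whole days, hours, minutes, and seconds. It also uses a C11 `static_assert` so the compiler confirms the 4,294,967-second figure. The helper names `u32_ms_ceiling_seconds` and `split_seconds` are illustrative and do not come from XNU.

```c
#include <assert.h>  // static_assert (C11)
#include <stdint.h>

// Whole-second uptime broken into calendar-style parts.
struct uptime_parts {
    uint32_t days, hours, minutes, seconds;
};

// The ceiling in whole seconds: UINT32_MAX ms / 1000, truncated.
static_assert((uint64_t)UINT32_MAX / 1000 == 4294967u,
              "a uint32_t millisecond counter tops out after 4,294,967 s");

static uint64_t u32_ms_ceiling_seconds(void)
{
    return (uint64_t)UINT32_MAX / 1000;
}

// Split a count of seconds into days, hours, minutes, and seconds.
static struct uptime_parts split_seconds(uint64_t total)
{
    struct uptime_parts p;
    p.days    = (uint32_t)(total / 86400);
    p.hours   = (uint32_t)(total % 86400 / 3600);
    p.minutes = (uint32_t)(total % 3600 / 60);
    p.seconds = (uint32_t)(total % 60);
    return p;
}
```

Calling `split_seconds(u32_ms_ceiling_seconds())` yields 49 days, 17 hours, 2 minutes, and 47 seconds. The integer division drops the leftover 0.295 seconds.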

The Freeze

When tcp_now reaches its maximum, the kernel's monotonicity guard — code designed to handle timer wraparound — fails. Rather than allowing the counter to roll over from 0xFFFFFFFF back to 0x00000000 and continue, it freezes at the maximum value. The clock stops.
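The difference between rolling over and freezing fits in a few lines. This is a user-space model under stated assumptions, not XNU's actual guard. `tick_wrapping` shows the rollover that C already defines for unsigned types. `tick_frozen` is a hypothetical guard that reads the wrap as time running backwards and pins the value at its maximum, which reproduces the behavior described above.

```c
#include <assert.h>  // static_assert (C11)
#include <stdint.h>

// Unsigned arithmetic in C is defined modulo 2^N, so this rollover is legal.
static_assert((uint32_t)(UINT32_MAX + 1u) == 0u, "uint32_t wraps to zero");

// Healthy clock: 0xFFFFFFFF + 1 becomes 0x00000000 and counting continues.
static uint32_t tick_wrapping(uint32_t now, uint32_t delta_ms)
{
    return now + delta_ms;
}

// Hypothetical broken guard: a result smaller than the current value is
// treated as the clock going backwards, so the update is refused and the
// value is pinned at the ceiling. Once there, every later tick also
// "wraps", so the clock never moves again.
static uint32_t tick_frozen(uint32_t now, uint32_t delta_ms)
{
    uint32_t next = now + delta_ms;
    if (next < now)
        return UINT32_MAX;
    return next;
}
```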

This is not a crash. There is no panic. The kernel continues running. Every subsystem that doesn't reference tcp_now is unaffected.

But the TCP garbage collector checks tcp_now to decide when connections in TIME_WAIT state are old enough to be reaped. TIME_WAIT is the holding state a connection enters on the side that closes it first. On macOS it lasts 30 seconds, twice the default 15-second maximum segment lifetime. It is a safety buffer that prevents late-arriving packets from corrupting a new connection on the same port. Normally, the kernel checks the timestamp, finds it 30+ seconds old, and discards the entry.

With the clock frozen, the timestamp comparison always returns "not yet." Every TIME_WAIT connection becomes permanent. They accumulate.
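The reaper's test can be modeled as an elapsed-time comparison. In this sketch, `time_wait_expired` is a hypothetical stand-in for the kernel's check. The 30,000 ms constant assumes macOS's default of twice a 15-second MSL. Unsigned subtraction keeps the comparison correct across a normal wrap. What it cannot survive is a clock that stops: a connection stamped at or just before the frozen value never accumulates 30 seconds of elapsed time.

```c
#include <stdbool.h>
#include <stdint.h>

// Assumption: TIME_WAIT lasts 2 x MSL, with macOS's default 15 s MSL.
#define TIME_WAIT_MS (2u * 15000u)

// Hypothetical reaper check: may the TIME_WAIT entry stamped at `closed_at`
// be discarded when tcp_now == `now`? Unsigned subtraction yields the true
// elapsed milliseconds even if the counter wrapped once in between.
static bool time_wait_expired(uint32_t now, uint32_t closed_at)
{
    return (uint32_t)(now - closed_at) >= TIME_WAIT_MS;
}
```

With `now` frozen at `0xFFFFFFFF`, `time_wait_expired(now, now)` stays false no matter how much wall-clock time passes. That is the permanent "not yet" that keeps every entry alive.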

The Cascade

macOS allocates ephemeral ports, the source ports for outbound connections, from the range 49152–65535, which is 16,384 ports in total. Under normal operation, TIME_WAIT connections drain within 30 seconds. Under the frozen clock, they never drain.

As the ghost connections accumulate, the ephemeral port pool shrinks. When it hits zero, connect() fails with EADDRNOTAVAIL ("Can't assign requested address"). Every new TCP connection attempt returns an error, whether it is a browser request, API call, database query, or git push.

The machine has not crashed. The network interface is up. DNS resolves. ping succeeds. From the outside, nothing is wrong. From the inside, no new socket can open.
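A toy model makes the exhaustion arithmetic concrete. It is not XNU's port allocator: `struct port_pool` and `connect_and_close` are hypothetical. The model assumes that each short-lived outbound connection parks its source port in TIME_WAIT and that, with the clock frozen, nothing ever leaves that state. Under those assumptions, the 16,384-port range is consumed after exactly 16,384 connections. The next attempt fails, and the model returns `EADDRNOTAVAIL`, the errno whose message is "Can't assign requested address".

```c
#include <errno.h>   // EADDRNOTAVAIL (POSIX)
#include <stdint.h>

#define EPHEMERAL_FIRST 49152u
#define EPHEMERAL_LAST  65535u
#define EPHEMERAL_COUNT (EPHEMERAL_LAST - EPHEMERAL_FIRST + 1u) // 16,384 ports

// Toy pool: counts ports held by TIME_WAIT entries. With tcp_now frozen,
// the reaper never frees one, so the count only goes up.
struct port_pool {
    uint32_t in_time_wait;
};

// Open a connection, close it immediately, and leave its source port in
// TIME_WAIT. Returns 0 on success, or EADDRNOTAVAIL once no port is free.
static int connect_and_close(struct port_pool *pool)
{
    if (pool->in_time_wait >= EPHEMERAL_COUNT)
        return EADDRNOTAVAIL;
    pool->in_time_wait++; // never decremented: the reaper always says "not yet"
    return 0;
}
```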

Why It's Hard to Diagnose

The failure has no obvious cause visible to the user or most diagnostic tools:

- ping 8.8.8.8 — succeeds (ICMP, not TCP)

- Browser shows "ERR_CONNECTION_REFUSED" or "Network Error"

- curl hangs or fails with "Can't assign requested address"

- netstat -an | grep TIME_WAIT | wc -l — returns thousands

- System uptime — 49+ days

The last two checks reveal the actual state, but most users and engineers never run them. The diagnosis path runs through "bad WiFi" → "reboot the router" → "call support" → a reboot of the Mac that happens to fix it, after which the true cause is never identified.
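For fleets of long-running Macs, the two telling signals can be combined into a triage check. The sketch below is a hypothetical helper, not an Apple tool. It takes the uptime in seconds and the TIME_WAIT count, as gathered with `uptime` and the `netstat` pipeline above. It also takes a caller-chosen TIME_WAIT threshold, which is an assumption to tune rather than a documented limit.

```c
#include <stdbool.h>
#include <stdint.h>

// Uptime in whole seconds at which a uint32_t millisecond counter tops out.
#define TCP_NOW_CEILING_S ((uint64_t)UINT32_MAX / 1000u) // 4,294,967 s

// Hypothetical triage helper: flag a host whose uptime is past the ceiling
// and whose TIME_WAIT count is at or above a caller-chosen threshold.
static bool suspect_tcp_now_freeze(uint64_t uptime_s, uint32_t time_wait_count,
                                   uint32_t tw_threshold)
{
    return uptime_s >= TCP_NOW_CEILING_S && time_wait_count >= tw_threshold;
}

// Seconds of uptime left before the ceiling; 0 once it has been passed.
// Useful for scheduling a reboot with margin to spare.
static uint64_t seconds_until_ceiling(uint64_t uptime_s)
{
    return uptime_s >= TCP_NOW_CEILING_S ? 0 : TCP_NOW_CEILING_S - uptime_s;
}
```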

The Pattern It Belongs To

This is a timer rollover failure — a member of a recurring class of bugs where a finite integer is used to track something that grows without bound. The pattern has a long history:

- Y2K (2000): two-digit year fields that couldn't represent the year 2000

- Unix Timestamp 2038: signed 32-bit seconds-since-epoch overflows on January 19, 2038

- GPS Week Rollover (1999, 2019): 10-bit week counter rolls over every 1,024 weeks, about 19.6 years

- Boeing 787 generator control units (2015): an internal counter overflowed after 248 days of continuous power (consistent with a signed 32-bit count of 10 ms ticks); the interim workaround was to power-cycle at least every 120 days

- Windows TCP (historical): similar TIME_WAIT accumulation bugs in early NT-era networking stacks

The shared DNA is always the same: an engineer chose a data type based on what seemed like more than enough range for the expected use case. The system outlived the assumption.

The macOS variant is particularly instructive because the failure is mediated through resource exhaustion rather than direct corruption. The counter doesn't crash anything when it overflows — it freezes, and that freezing causes a completely different subsystem (port allocation) to fail. The indirection makes it almost impossible to diagnose without knowing to look for it.
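Every counter in the list reduces to the same formula: a counter with N bits of usable range that advances once per tick overflows after 2^N ticks. The helper below is an illustrative sketch. For the Boeing case, the usage note assumes the commonly cited interpretation of a signed 32-bit count of 10 ms ticks.

```c
#include <assert.h>
#include <stdint.h>

// Seconds until a counter with `value_bits` bits of usable range overflows,
// if it advances once every `tick_seconds`. Use 32 for an unsigned 32-bit
// type and 31 for a signed one (the sign bit is not usable range).
static double overflow_horizon_seconds(unsigned value_bits, double tick_seconds)
{
    assert(value_bits < 64); // keep the shift below the width of uint64_t
    return (double)((uint64_t)1 << value_bits) * tick_seconds;
}
```

Here is how the formula applies to each case:

- `overflow_horizon_seconds(31, 1.0)` is 2,147,483,648 seconds after 1970, which lands on January 19, 2038.
- `overflow_horizon_seconds(10, 604800.0)` is 7,168 days (1,024 weeks).
- `overflow_horizon_seconds(32, 0.001)` is about 4,294,967 seconds, the 49.7-day `tcp_now` ceiling.
- `overflow_horizon_seconds(31, 0.01)` is about 248.5 days, under the Boeing assumption above.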

Why Macs Specifically

macOS has long had a reputation for reliability that leads users and administrators to treat Macs as infrastructure — as machines that don't need reboots. CI runners, build servers, studio workstations, home servers, Mac Minis running as always-on nodes: all of these comfortably exceed 49-day uptimes. The machines that are most likely to be treated as reliable long-running infrastructure are exactly the machines most likely to hit the threshold.

The irony is structural. The reputation for reliability creates the conditions for this specific class of failure.

The Fix

A correct solution requires one of:

1. Widen tcp_now to uint64_t — a 64-bit millisecond counter overflows in approximately 584 million years, which is sufficient

2. Implement RFC 7323 wraparound handling — the TCP spec anticipates timestamp rollover; the monotonicity guard needs to correctly handle the 0xFFFFFFFF → 0x00000000 transition
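Both fixes are small in code. The sketch below assumes nothing about XNU's real guard beyond what the text describes. `ts_after` is a serial-number comparison in the style of the BSD `TSTMP_GT` macro and RFC 7323's modular timestamp arithmetic. `advance_clock` is a hypothetical monotonicity guard built on it that accepts the 0xFFFFFFFF → 0x00000000 transition. `advance_clock64` illustrates option 1.

```c
#include <stdbool.h>
#include <stdint.h>

// "a is later than b" for 32-bit timestamps using serial-number arithmetic,
// in the style of BSD's TSTMP_GT macro: the difference is read as signed,
// so it stays positive when a leads b by fewer than 2^31 ticks, even across
// a wrap. (Converting an out-of-range unsigned value to int32_t is
// implementation-defined in ISO C; mainstream compilers use two's complement.)
static bool ts_after(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}

// Option 2: a monotonicity guard built on ts_after. It refuses values that
// are genuinely older, but accepts 0x00000005 as newer than 0xFFFFFFF0,
// so the clock rolls over instead of freezing.
static uint32_t advance_clock(uint32_t current, uint32_t candidate)
{
    return ts_after(candidate, current) ? candidate : current;
}

// Option 1: a 64-bit millisecond counter. 2^64 ms is about 584 million
// years, so plain addition never reaches the ceiling in practice.
static uint64_t advance_clock64(uint64_t current, uint64_t delta_ms)
{
    return current + delta_ms;
}
```

The modular comparison only orders timestamps less than 2^31 ms (about 24.8 days) apart, which is ample for TCP timers. The 64-bit counter removes even that limit, at the cost of four more bytes.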

Until a kernel patch ships, the practical mitigation is a scheduled reboot. A monthly restart (every 28–30 days) stays well below the threshold. macOS launchd can automate this. The irony of needing to schedule reboots on the platform most celebrated for not needing them is not lost.

What Developers Should Watch For

Any codebase that uses a fixed-width integer to track elapsed time should ask:

1. What is the maximum value of this type?

2. What happens at or after that value is reached?

3. What is the realistic maximum uptime or run duration of the system using this counter?

4. Is the answer to question 3 plausibly larger than the answer to question 1?

For a millisecond counter: `uint32_t` gives you 49.7 days. `uint64_t` gives you 584 million years. The cost of the wider type is four additional bytes per instance. The cost of choosing wrong is every new TCP connection on a production machine silently failing after seven weeks.
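The checklist can be enforced at build time. This sketch rests on one explicit assumption: `EXPECTED_MAX_UPTIME_DAYS` is a hypothetical budget (365 days here) that each project would set for itself. `MS_COUNTER_DAYS` and `uptime_ms_t` are illustrative names. A C11 `static_assert` turns questions 1, 3, and 4 into a compile error instead of a production incident.

```c
#include <assert.h>  // static_assert (C11)
#include <stdint.h>

// Hypothetical budget: the longest uptime this code is expected to see.
#define EXPECTED_MAX_UPTIME_DAYS 365ull

// Whole days until a millisecond counter with maximum value `max` tops out.
#define MS_COUNTER_DAYS(max) ((unsigned long long)(max) / 1000ull / 86400ull)

// Change this to uint32_t (49 days) and the build fails below.
typedef uint64_t uptime_ms_t;

// (uptime_ms_t)-1 is the maximum value of any unsigned type. For uint64_t
// it gives roughly 213 billion days, so the check passes.
static_assert(MS_COUNTER_DAYS((uptime_ms_t)-1) > EXPECTED_MAX_UPTIME_DAYS,
              "millisecond counter would overflow within the expected uptime");
```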

Techniques
integer overflow · timer rollover · resource exhaustion · silent failure