Heroku’s deep dive into the weird behavior between Router 2.0 and Puma is one of those posts you want to bookmark and carry into every performance-triage conversation. If you’ve ever watched a Rails app spike in tail response times after flipping a platform flag and wondered “why is this slower when it’s supposed to be better,” this is exactly the kind of investigation that cuts through the noise. The short version: what looked like a router issue turned out to be Puma’s keepalive handling interacting badly with Router 2.0 under specific conditions. The longer version shows why the digging matters.
The Setup and the Surprise
Router 2.0 launched with HTTP/2 support and a bunch of improvements, and in most cases it was faster and more stable. But a small subset of Puma-based Rails apps started showing brutal long-tail latency behind Router 2.0, especially when the app itself had even mild internal latency. The symptom was obvious: median response times stayed fine, but the 95th and 99th percentiles ballooned and throughput cratered. An app handling 30 requests per second on the legacy router would start timing out and falling over with Router 2.0 enabled. Heroku’s engineers chased it hard, trying to reproduce it, only to hit a wall until they added artificial server-side delay. That delay exposed the bizarre behavior and got the ball rolling.
The Investigation
The team went in deep. They compared connection patterns between the legacy router and Router 2.0 and noticed that Router 2.0 keeps connections alive by default, while the old router closed them after each request. Normally keepalives are a win because you avoid TCP setup and teardown overhead. But Puma’s internal request-handling logic had a subtle interaction with keepalives that, under certain load shapes, effectively serialized requests and stalled them.
Puma can process multiple requests off a single connection inline, up to a threshold (max_fast_inline, default 10). Combined with the way keepalive clients were handed back to the reactor, that could leave requests sitting behind others for a full second or more. The math was maddeningly clear: 10 requests at 200ms each add up to two seconds of apparent tail latency, even though the system was technically still processing things in order. Race conditions and the timing of internal flags made the behavior nondeterministic and confusing unless you understood the Puma code path.
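To make that arithmetic concrete, here is a tiny Ruby sketch of how inlined keepalive requests can stack into apparent tail latency. It is purely illustrative, and the per-request timing is made up for the example:

```ruby
# Purely illustrative: worst-case wait for the last request on a keepalive
# connection if up to `max_fast_inline` requests are processed inline, back to back.
max_fast_inline  = 10   # Puma's default threshold
per_request_time = 0.2  # hypothetical 200 ms of app time per request

worst_case_wait = max_fast_inline * per_request_time
puts "Last inlined request can wait ~#{worst_case_wait}s"  # => ~2.0s
```

The median stays healthy because most requests land at the front of that queue; only the unlucky ones at the back show up in the 95th and 99th percentiles.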
The Solution (For Now)
After digging into the Puma server code, the team concluded that the cleanest short-term fix was to disable keepalives between the router and the dyno. That bypasses the problematic inline-request behavior by forcing Puma to hand client connections back immediately instead of trying to batch them. Heroku worked with the Puma maintainers and opened an issue and pull request to add an enable_keep_alives option to the Puma DSL. Setting that toggle to false explicitly disables keepalives, and in their tests it immediately restored sane response times.
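For context, here is a minimal config/puma.rb sketch of what that toggle could look like. It assumes a Puma release that actually ships the enable_keep_alives option under that name, so verify it exists in your installed Puma version before relying on it:

```ruby
# config/puma.rb (sketch; assumes a Puma release that includes enable_keep_alives)
enable_keep_alives false  # hand connections back after each request instead of batching

# The rest of a typical Heroku-style Puma config stays the same:
workers Integer(ENV.fetch("WEB_CONCURRENCY", 2))
threads_count = Integer(ENV.fetch("RAILS_MAX_THREADS", 5))
threads threads_count, threads_count
port ENV.fetch("PORT", 3000)
```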
If you are running into the same issue on Router 2.0 today, there is also a platform-level stopgap: you can enable a labs flag that disables keepalives from the router side:
heroku labs:enable http-disable-keepalive-to-dyno -a your-app-name
That has no effect on the legacy router because it did not support keepalives in the first place. It is a temporary mitigation while the upstream fix in Puma lands in a future release.
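If you do flip the flag, it is worth knowing how to check it and how to back it out later; the standard labs subcommands should cover both (the app name is a placeholder):

```bash
# List labs features and their state for the app
heroku labs -a your-app-name

# Revert the mitigation once the upstream Puma fix is in place
heroku labs:disable http-disable-keepalive-to-dyno -a your-app-name
```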
What You Can Do Right Now
If you have a Rails app on Heroku using Puma and you are seeing inconsistent tail latency or timeouts under moderate load with Router 2.0 enabled, here’s the quick checklist:
- Try disabling keepalives via the labs flag to confirm you are seeing the same behavior Heroku saw.
- Upgrade to the Puma version that exposes the new enable_keep_alives configuration once it is released, and set it to false if you need immediate stability.
- Tune Puma itself: more threads, more workers, or a lower max_fast_inline can help in cases where disabling keepalives is not desirable (see the puma.rb sketch after this list).
- Test with controlled internal latency to surface long-tail behavior early instead of discovering it in production. Inject synthetic delay in staging to see how your setup behaves (a minimal middleware example also follows below).
The Bigger Picture
What this whole story highlights is how performance problems are rarely one-dimensional. A new router that should improve things exposed a deep interaction between keepalive connection reuse and Puma internals. If you treat your infrastructure as a black box, that kind of issue becomes a production nightmare. The way Heroku documented the reproduction, the code reading, and the measured experiments is exactly how you build institutional knowledge and system resilience.
If You Need Help
All of this is great if you enjoy plumbing into C code, Ruby internals, and TCP state charts. If you want someone who already has the experience and context to move fast without guessing, reach out to Rails Agency. Whether it’s tuning Puma for tail latency, safely rolling out Router 2.0, or architecting a resilient Ruby deployment, having a team that knows where to look and what to change saves time and avoids those late nights.
Wrap Up
Heroku’s write up on the Puma keepalive issue is more than a postmortem. It’s a reminder that performance is an emergent property of many moving parts, and that understanding the shape of those parts and how they interact matters. The current workaround is solid for stability, and the upstream fix will let keepalives behave the way they should in the future. In the meantime, if you are running Rails on Heroku and you want someone to vet your setup or help you execute the appropriate configuration safely, Rails Agency is a practical call to make.
Happy tuning. 🚀