Understanding Tail Latency in Distributed Systems

January 15, 2026

When we talk about service performance, we often focus on average response time. But in distributed systems, the tail of the latency distribution — the p99 and p999 — often matters much more than the median.

Consider a service that makes 50 parallel backend calls to serve a single user request. Because the request must wait for all 50 calls to complete, its latency is set by the slowest call. Even if each backend call has a p99 latency of 100ms, the probability that at least one of the 50 calls exceeds that 100ms threshold is 1 - (0.99)^50 ≈ 39%. That means roughly 4 out of 10 user requests will experience tail latency.
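The arithmetic above generalizes to any fan-out width. A small sketch (the function name is mine, not from any library):

```python
# Probability that at least one of n parallel backend calls lands in
# the slowest 1% (i.e., exceeds the per-call p99 latency).
def tail_hit_probability(n: int, quantile: float = 0.99) -> float:
    # Each call independently stays under the p99 with probability
    # `quantile`; all n must stay under it for the request to avoid
    # the tail entirely.
    return 1 - quantile ** n

print(f"{tail_hit_probability(50):.1%}")   # ~39.5%
print(f"{tail_hit_probability(100):.1%}")  # ~63.4%
```

Doubling the fan-out to 100 pushes the hit rate past 60%, which is why wide fan-outs amplify tail effects so sharply.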

This is why Dean and Barroso's famous paper "The Tail at Scale" recommends techniques like hedged requests and tied requests: you can't just optimize the average.

In practice, I've found that the most common causes of tail latency in production are garbage collection pauses, lock contention, and network retransmissions. Each requires a different mitigation strategy.