Six short reads on the math and intuition behind queues. No PhD required. When you're ready to play, head back to the simulator.
Even when your average arrival rate is manageable, random clustering means bursts of demand hit without warning. Five calm minutes, then eight people walk in at once. The math of randomness guarantees it.
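You can watch that clustering in a few lines of simulation. All numbers here are invented for illustration -- a shop averaging 2 arrivals per minute, tracked over one hour of Poisson arrivals:

```python
import random

random.seed(42)

# Hypothetical shop: on average 2 arrivals per minute.
rate_per_min = 2.0
minutes = 60

# Poisson arrivals: independent, exponentially distributed gaps.
t, arrivals = 0.0, []
while t < minutes:
    t += random.expovariate(rate_per_min)
    arrivals.append(t)

# Count arrivals in each 1-minute bucket.
counts = [0] * minutes
for a in arrivals:
    if a < minutes:
        counts[int(a)] += 1

print(f"average per minute: {sum(counts) / minutes:.1f}")
print(f"busiest minute:     {max(counts)} arrivals")
print(f"empty minutes:      {counts.count(0)}")
```

Even with a steady average, some minutes are empty and the busiest minute sees several times the mean -- the bursts are baked into the randomness, not a sign of anything unusual.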
At 70% utilization, things feel fine. At 85%, small bursts create queues that take minutes to clear. At 95%, the system never catches up. Wait time does not grow linearly with utilization -- it grows roughly like 1/(1 - utilization), exploding as you approach 100%.
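A sketch of that curve using the textbook single-server (M/M/1) waiting-time formula -- the service rate of 20 customers/hour is an assumed example, not a simulator value:

```python
# M/M/1 average time waiting in queue: Wq = rho / (mu * (1 - rho)).
mu = 20.0  # assumed service rate, customers per hour

for rho in (0.70, 0.85, 0.95, 0.99):
    wq_hours = rho / (mu * (1 - rho))
    print(f"utilization {rho:.0%}: avg wait {wq_hours * 60:.1f} min")
```

Going from 70% to 85% roughly doubles the wait; going from 95% to 99% multiplies it by five. The last few points of utilization are by far the most expensive.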
Formulas give you steady-state averages. Simulation gives you the full picture: worst-case waits, queue buildup during rush hours, and exactly how many servers keep your line short without overspending on idle capacity.
A highway at 99% capacity jams. At 80%, it flows fine.
Your coffee shop has 2 baristas, each able to serve 20 customers per hour. Capacity: 40/hr. So 38 customers/hour should be easy, right? That is 95% utilization.
But customers do not arrive evenly spaced. Random clustering means both baristas are occasionally busy at the same time, creating a queue. At 95% utilization, that queue rarely has time to empty before the next burst. The wait time curve goes vertical.
Drop to 80% utilization -- 32 customers/hour -- and the queue clears between bursts. The system breathes. That gap between 80% and 95% is where operational decisions live.
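The Erlang C formula makes the gap concrete. A minimal sketch, plugging in the coffee-shop numbers above (2 baristas, 20 customers/hour each):

```python
from math import factorial

def erlang_c_wait(lam, mu, c):
    """Average queue wait (hours) in an M/M/c queue, via the Erlang C formula.

    Requires lam < c * mu (a stable system).
    """
    a = lam / mu            # offered load
    rho = a / c             # utilization per server
    tail = (a**c / factorial(c)) / (1 - rho)
    p_wait = tail / (sum(a**k / factorial(k) for k in range(c)) + tail)
    return p_wait / (c * mu - lam)

mu, c = 20.0, 2  # two baristas, 20 customers/hour each
for lam in (32.0, 38.0):  # 80% vs 95% utilization
    print(f"{lam:.0f}/hr: avg wait {erlang_c_wait(lam, mu, c) * 60:.1f} min")
```

At 80% utilization the average wait is around five minutes; at 95% it is closer to half an hour. Same shop, same baristas -- six extra customers an hour.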
The name looks intimidating. It is just three facts about how your system works.
The first M stands for "Markovian." That just means memoryless -- customers show up randomly, with no pattern connecting one arrival to the next. A Poisson process. It is a remarkably good model for many real-world arrival streams, from walk-ins to phone calls to network packets.
The second M means service times are also random -- exponentially distributed. Some customers are fast, some slow. This is why "we serve 20 per hour, and 20 arrive per hour" does not mean zero wait. Variability creates queues even when averages match.
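A quick way to see variability at work: the standard formulas for random service times (M/M/1) and perfectly fixed service times (M/D/1) give the average queue wait at the same utilization -- the rates below are assumed examples. Removing service-time randomness alone cuts the average wait exactly in half:

```python
# Average queue wait at rho = 0.9, mu = 20 customers/hour (assumed numbers):
#   M/M/1 (exponential service): Wq = rho / (mu * (1 - rho))
#   M/D/1 (fixed service times): exactly half of that
mu, rho = 20.0, 0.9
wq_mm1 = rho / (mu * (1 - rho))
wq_md1 = wq_mm1 / 2
print(f"random service times: {wq_mm1 * 60:.1f} min")
print(f"fixed service times:  {wq_md1 * 60:.1f} min")
```

Arrival randomness still causes the remaining wait -- which is why matching averages is never enough.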
The lowercase c is your server count -- baristas, agents, checkout lanes. Adding servers has diminishing returns: going from 1 to 2 might cut wait by 80%. Going from 10 to 11? Maybe 3%. The curve flattens fast.
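You can see the flattening by computing average waits from the Erlang C formula as servers are added. A sketch with an assumed load of 19 customers/hour and 20/hour per server:

```python
from math import factorial

def erlang_c_avg_wait(lam, mu, c):
    """Average queue wait (hours) in an M/M/c queue (Erlang C). Needs lam < c * mu."""
    a = lam / mu
    rho = a / c
    tail = (a**c / factorial(c)) / (1 - rho)
    p_wait = tail / (sum(a**k / factorial(k) for k in range(c)) + tail)
    return p_wait / (c * mu - lam)

# Assumed load: 19 customers/hour; each server handles 20/hour.
lam, mu = 19.0, 20.0
prev = None
for c in range(1, 6):
    w = erlang_c_avg_wait(lam, mu, c) * 60  # minutes
    note = "" if prev is None else f"  ({1 - w / prev:.0%} less than c={c - 1})"
    print(f"c={c}: avg wait {w:.2f} min{note}")
    prev = w
```

The second server does almost all the work; by the fourth or fifth, each addition shaves fractions of a second.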
One equation that holds for any stable queuing system, regardless of arrival distribution, service distribution, or number of servers.
The equation is Little's Law: L = λW. The average number of customers in the system (L) equals the arrival rate (λ) times the average time each one spends there (W). If you know any two, you can derive the third. It works for coffee shops, emergency rooms, and packet networks alike.
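A quick sanity check of Little's Law, L = λW, with assumed numbers (not simulator output):

```python
# Little's Law: L = lam * W. All figures below are assumed for illustration.
lam = 38.0    # arrival rate: customers per hour
W = 12 / 60   # average time in system: 12 minutes, in hours

L = lam * W   # average number of customers in the system
print(f"L = {L:.1f} customers in the system on average")

# Rearranged: measure L and lam, derive W.
print(f"W = {L / lam * 60:.0f} minutes")
```

If you can count heads and clock arrivals, you get average time in system for free -- no stopwatch on individual customers required.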
How hard you're pushing the system — the ratio of work arriving to work the servers can finish.
In steady state, ρ also equals the fraction of time a server is busy — the utilization you see in results. ρ is the dial you set; utilization is what you measure. In a stable system, the two numbers agree.
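As a formula: ρ = λ / (c·μ), where λ is the arrival rate, μ is one server's service rate, and c is the server count. With the assumed coffee-shop numbers:

```python
# rho = lam / (c * mu): work arriving divided by work the servers can finish.
# Assumed example: 38 customers/hour, 2 servers at 20 customers/hour each.
lam, mu, c = 38.0, 20.0, 2
rho = lam / (c * mu)
print(f"rho = {rho:.2f}")  # at 0.95, each server is busy 95% of the time
# Stability requires rho < 1; at rho >= 1 the queue grows without bound.
```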
Erlang-C gets you a number. Simulation gets you the truth.
How many agents do you need per shift to keep hold times under 2 minutes?
Model ticket arrivals and technician capacity to predict resolution backlogs.
Find the right number of cashiers for peak hours without overstaffing off-peak.
Balance customer wait times against staffing costs across branch hours.
DMV, permit offices, passport agencies -- optimize counter staffing for citizen satisfaction.
Restaurants, cafeterias, drive-throughs -- model customer flow through ordering and pickup.
The M/M/c model assumes first-come, first-served. Real systems are more complex.
The standard. Customers served in order of arrival.
Customers with higher urgency jump ahead.
Most recent arrival served first. Rare in customer service, common in stack-based systems.
Customer with shortest expected service time goes next. Minimizes average wait.
Each server gets customers in rotation regardless of availability.
Customers leave the queue (renege) or refuse to join (balk) when it is too long.
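A toy discrete-event simulation makes the discipline trade-off visible: run the same randomly generated jobs through one server under first-come-first-served and under shortest-job-first, then compare average waits. Every number here is invented for illustration:

```python
import heapq
import random

random.seed(7)

# Assumed workload: 200 jobs, Poisson arrivals, exponential service times.
lam, mu = 0.9, 1.0
jobs, t = [], 0.0
for _ in range(200):
    t += random.expovariate(lam)
    jobs.append((t, random.expovariate(mu)))  # (arrival time, service time)

def avg_wait(jobs, shortest_first):
    """Single server, non-preemptive; pick the next job FIFO or shortest-service-first."""
    clock, i, waiting, total_wait, done = 0.0, 0, [], 0.0, 0
    n = len(jobs)
    while done < n:
        # Admit everyone who has arrived by `clock`.
        while i < n and jobs[i][0] <= clock:
            arr, svc = jobs[i]
            key = svc if shortest_first else arr
            heapq.heappush(waiting, (key, arr, svc))
            i += 1
        if not waiting:
            clock = jobs[i][0]  # server idle: jump to the next arrival
            continue
        _, arr, svc = heapq.heappop(waiting)
        total_wait += clock - arr  # time this job spent in the queue
        clock += svc
        done += 1
    return total_wait / n

print(f"FIFO avg wait: {avg_wait(jobs, False):.2f}")
print(f"SJF  avg wait: {avg_wait(jobs, True):.2f}")
```

Shortest-job-first lowers the average wait on the same workload -- the catch is fairness: long jobs can be starved behind a stream of short ones, which is why FIFO remains the default where customers can see the line.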
Head back to the simulator and try the scenarios. Adjust shifts. Watch the animation. See the math come to life.
Open the simulator