Plain-English queueing theory

Why lines form, why averages lie, and what to do about it.

Six short reads on the math and intuition behind queues. No PhD required. When you're ready to play, head back to the simulator.

01 / The Problem

Not all customers arrive on a schedule

Even when your average arrival rate is manageable, random clustering means bursts of demand hit without warning. Five calm minutes, then eight people walk in at once. The math of randomness guarantees it.

02 / The Insight

Near-full capacity is dangerous

At 70% utilization, things feel fine. At 85%, small bursts create queues that take minutes to clear. At 95%, the system never catches up. The relationship between utilization and wait time is sharply nonlinear: waits grow roughly like 1 / (1 - utilization), so each extra point of load costs more than the last.

03 / The Answer

Simulation finds the sweet spot

Formulas give you steady-state averages. Simulation gives you the full picture: worst-case waits, queue buildup during rush hours, and exactly how many servers keep your line short without overspending on idle capacity.

A highway at 99% capacity jams. At 80%, it flows fine.

Your coffee shop has 2 baristas, each able to serve 20 customers per hour. Capacity: 40/hr. So 38 customers/hour should be easy, right? That is 95% utilization.

But customers do not arrive evenly spaced. Random clustering means both baristas are occasionally busy at the same time, creating a queue. At 95% utilization, that queue rarely has time to empty before the next burst. The wait time curve goes vertical.

Drop to 80% utilization -- 32 customers/hour -- and the queue clears between bursts. The system breathes. That gap between 80% and 95% is where operational decisions live.
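You can sketch the cliff in a few lines. The snippet below pools both baristas into one fast server and uses the single-server (M/M/1) mean-queue-wait formula W_q = ρ / (μ − λ) -- an approximation, not the exact two-server math, but the shape of the curve is the same:

```python
# Pool both baristas into one fast server (mu = 40/hr) and apply the
# M/M/1 mean-queue-wait formula W_q = rho / (mu - lam).
def mean_wait_minutes(lam, mu):
    rho = lam / mu
    assert rho < 1, "unstable: demand exceeds capacity"
    return 60 * rho / (mu - lam)   # convert hours to minutes

for lam in (28, 32, 38):           # 70%, 80%, 95% of the 40/hr capacity
    print(f"{lam}/hr ({lam/40:.0%} loaded): {mean_wait_minutes(lam, 40):4.1f} min")
    # prints 3.5, 6.0, then 28.5 minutes
```

Going from 80% to 95% load does not add 19% to the wait -- it almost quintuples it.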

Coffee shop simulation
2 baristas, 70% utilization -- smooth
Customers flow through with minimal wait. Baristas have breathing room between orders.
2 baristas, 90% utilization -- building up
Queue forms during clusters. Wait times creep up. Some customers glance at their watches.
2 baristas, 98% utilization -- out of control
Queue never clears. Wait times spiral. Customers walk out. Revenue lost every minute.
3 baristas, 65% utilization -- back to smooth
One additional server. The queue vanishes. Customers are happy. That is the power of the math.
Under 75%: flowing
75-90%: caution
Over 90%: danger
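The four scenarios above are easy to reproduce. Here is a toy FIFO M/M/c simulation -- not QueueSim's engine, and the function name is ours -- that replays them with μ = 20 customers per hour per barista:

```python
import random, statistics

def simulate(lam, mu, servers, hours=1000, seed=1):
    """Toy FIFO M/M/c: exponential arrivals (rate lam) and service (rate mu).
    Returns the average wait in minutes."""
    rng = random.Random(seed)
    t, free_at, waits = 0.0, [0.0] * servers, []
    while t < hours:
        t += rng.expovariate(lam)                  # next arrival
        soonest = min(range(servers), key=free_at.__getitem__)
        start = max(t, free_at[soonest])           # wait only if all are busy
        waits.append(start - t)
        free_at[soonest] = start + rng.expovariate(mu)
    return 60 * statistics.mean(waits)

for c, lam in ((2, 28), (2, 36), (2, 39.2), (3, 26)):
    util = lam / (c * 20)
    print(f"{c} baristas at {util:.0%}: {simulate(lam, 20, c):5.1f} min avg wait")
```

The 98% row dwarfs the others, and the three-barista row collapses back to seconds -- the same story the scenarios tell.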

What M/M/c actually means

The name looks intimidating. It is just three facts about how your system works.

M

Random arrivals

The first M stands for "Markovian." That just means memoryless -- customers show up randomly, with no pattern connecting one arrival to the next. A Poisson process. It is a good model for how many real-world arrival streams actually behave.

"Markovian just means memoryless. Random."
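Memoryless arrivals are easy to generate: the gaps between them follow an exponential distribution, which the Python standard library provides as expovariate. This sketch shows how a steady average hides wild swings between individual gaps:

```python
import random, statistics

rng = random.Random(42)
rate = 2.0                                     # average 2 arrivals per minute
gaps = [rng.expovariate(rate) for _ in range(10_000)]

# The long-run average is rock steady...
print(f"mean gap:     {statistics.mean(gaps):.3f} min (theory says 0.500)")
# ...but individual gaps range from near-zero (a burst) to many times the mean.
print(f"shortest gap: {min(gaps) * 60:.2f} seconds")
print(f"longest gap:  {max(gaps):.1f} minutes")
```

That spread between the shortest and longest gap is the "eight people walk in at once" effect from the top of the page.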
M

Variable service times

The second M means service times are also random -- exponentially distributed. Some customers are fast, some slow. This is why "we serve 20 per hour, and 20 arrive per hour" does not mean zero wait. Variability creates queues even when averages match.

"Why 'capacity = demand' is dangerously wrong."
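This point is worth proving to yourself. The sketch below runs the same single-server queue twice at 95% utilization -- once with perfectly even, clockwork arrivals and service, once with exponential ones. Capacity and demand are identical in both runs; only the variability differs:

```python
import random

def avg_wait_hours(next_gap, next_service, n=50_000):
    """Single-server FIFO queue driven by two gap/service samplers."""
    t = free = total = 0.0
    for _ in range(n):
        t += next_gap()
        start = max(t, free)        # wait only if the server is still busy
        total += start - t
        free = start + next_service()
    return total / n

rng = random.Random(7)
lam, mu = 19.0, 20.0                # 95% utilization either way

clockwork = avg_wait_hours(lambda: 1 / lam, lambda: 1 / mu)
random_mm1 = avg_wait_hours(lambda: rng.expovariate(lam),
                            lambda: rng.expovariate(mu))
print(f"evenly spaced: {clockwork * 60:5.2f} min average wait")
print(f"random:        {random_mm1 * 60:5.1f} min average wait")
```

With clockwork timing the wait is exactly zero; with randomness it climbs toward the theoretical M/M/1 average of ρ / (μ − λ) ≈ 57 minutes. The variability is the queue.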
c

Multiple parallel servers

The lowercase c is your server count -- baristas, agents, checkout lanes. Adding servers has diminishing returns: going from 1 to 2 might cut wait by 80%. Going from 10 to 11? Maybe 3%. The curve flattens fast.

"1 to 2 cuts wait by 80%. 10 to 11? Maybe 3%."
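The diminishing returns are easy to put numbers on with the standard Erlang C formula (the function name here is ours). With λ = 15/hr and μ = 20/hr per server:

```python
from math import factorial

def erlang_c_wait_hours(lam, mu, c):
    """Mean queue wait for an M/M/c queue, via the Erlang C formula."""
    a = lam / mu                              # offered load in Erlangs
    if a >= c:
        return float("inf")                   # unstable: the queue grows forever
    top = a ** c / factorial(c)
    p_wait = top / ((1 - a / c) * sum(a ** k / factorial(k)
                                      for k in range(c)) + top)
    return p_wait / (c * mu - lam)            # mean wait of an average customer

for c in (1, 2, 3, 4):
    print(f"c={c}: {erlang_c_wait_hours(15, 20, c) * 60:7.3f} min average wait")
```

The second server removes roughly 95% of the wait at this load; the fourth barely registers. The curve flattens fast.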

Little's Law

One equation that holds for any stable queueing system, regardless of arrival distribution, service distribution, or number of servers.

L = λ · W
  • L In System — average number of people in the system
  • λ Arrivals/hr — arrival rate (customers per unit time)
  • W Time in System — average time each customer spends in the system

If you know any two, you can derive the third. It works for coffee shops, emergency rooms, and packet networks alike.

λ (Arrivals/hr) = 60 customers / hour
W (Time in System) = 5 min = 0.083 hr
L (In System) = 60 × 0.083 = 5 people in the system at any time
If you want L under 3, you need W under 3 min -- or fewer arrivals.
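The arithmetic above fits in a few lines, and it runs in any direction -- know two quantities, derive the third:

```python
lam = 60          # arrival rate, customers per hour
W = 5 / 60        # time in system: 5 minutes, expressed in hours
L = lam * W       # Little's Law: average number in the system
print(f"L = {L:.1f} people in the system on average")

# Working backwards: to hold L under 3 at the same arrival rate,
# W must stay under 3 / lam hours = 3 minutes.
W_max = 3 / lam
print(f"keep W under {W_max * 60:.0f} minutes")
```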

Traffic Intensity

How hard you're pushing the system — the ratio of work arriving to work the servers can finish.

ρ = λ / μ
ρ = λ / (c · μ)  for M/M/c
  • ρ Traffic intensity — offered load per server (0 to 1)
  • λ Arrival rate (customers per unit time)
  • μ Service rate per server (customers per unit time)
  • c Number of parallel servers

In steady state, ρ also equals the fraction of time a server is busy — the utilization you see in results. ρ is the dial you set; utilization is what you measure. They coincide numerically, but they are two sides of the same coin.

λ = 40 customers / hour
μ = 25 customers / hour per barista
c = 2 baristas
ρ = 40 / (2 × 25) = 0.80
80% loaded. The queue clears between bursts. Push it to 0.95 and the wait curve goes vertical.
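The "goes vertical" claim is visible in the single-server version of the math, where mean wait scales with ρ / (1 − ρ). A quick sketch of that scaling:

```python
# Relative wait as traffic intensity approaches 1 (M/M/1 scaling):
# mean wait is proportional to rho / (1 - rho).
for rho in (0.70, 0.80, 0.90, 0.95, 0.99):
    print(f"rho = {rho:.2f}  ->  relative wait {rho / (1 - rho):6.1f}")
```

Going from ρ = 0.80 to ρ = 0.95 almost quintuples the wait (4.0 → 19.0), and 0.99 is off the chart. The last few points of load do most of the damage.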

Why simulate what you can calculate?

Erlang-C gets you a number. Simulation gets you the truth.

Erlang-C Formula Analytical

  • Instant calculation, no waiting
  • Steady-state averages for M/M/c systems
  • Good for quick back-of-envelope checks
  • Breaks with rush hours or time-varying demand
  • Exponential service times only
  • Only gives averages, not variability
  • Cannot model shifts, breaks, or priority queues

QueueSim DES Simulation

  • Models every individual customer through the system
  • Rush hours, shift changes, variable demand
  • Multiple service time distributions (Normal, Constant, Exponential)
  • Shows variability -- not just averages
  • Up to 168 simulated hours per run
  • Takes a moment to compute (server-side DES engine)
"The formula gets you through the exam. The simulation gets you through the real decision."

Works for Any Queueing System

Call Centers

How many agents do you need per shift to keep hold times under 2 minutes?

IT Help Desks

Model ticket arrivals and technician capacity to predict resolution backlogs.

Retail Checkout

Find the right number of cashiers for peak hours without overstaffing off-peak.

Bank Tellers

Balance customer wait times against staffing costs across branch hours.

Government Offices

DMV, permit offices, passport agencies -- optimize counter staffing for citizen satisfaction.

Food Service

Restaurants, cafeterias, drive-throughs -- model customer flow through ordering and pickup.

Not all queues are created equal.

The M/M/c model assumes first-come, first-served. Real systems are more complex.

FIFO (First In, First Out)

The standard. Customers served in order of arrival.

Used in: checkout lines, call centers, most queues.

Priority / Acuity

Customers with higher urgency jump ahead.

Used in: emergency departments (triage), airline boarding, VIP support lines.

LIFO (Last In, First Out)

Most recent arrival served first. Rare in customer service, common in stack-based systems.

Used in: warehouse picking, undo operations.

Shortest Job First

Customer with shortest expected service time goes next. Minimizes average wait.

Used in: some scheduling systems, CPU task scheduling.

Round Robin

Each server gets customers in rotation regardless of availability.

Used in: load balancers, help desk ticket assignment.

Reneging & Balking

Customers leave the queue (renege) or refuse to join (balk) when it is too long.

Used in: modeling real-world behavior, call center abandonment.
QueueSim currently models FIFO queues. Priority, reneging, and other disciplines are available in the underlying DES engine.
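For a taste of how a non-FIFO discipline changes the pick order, here is a hypothetical priority (acuity) queue built on Python's heapq -- a sketch of the idea, not QueueSim's actual engine. Lower acuity number means more urgent; ties fall back to arrival order, so equal-priority customers are still served FIFO:

```python
import heapq

# (acuity, arrival_order, name): heapq pops the smallest tuple first,
# so acuity dominates and arrival order breaks ties.
waiting = []
for order, (name, acuity) in enumerate(
        [("sprained ankle", 3), ("chest pain", 1), ("flu", 2)]):
    heapq.heappush(waiting, (acuity, order, name))

served = []
while waiting:
    _, _, name = heapq.heappop(waiting)
    served.append(name)
print(served)   # most urgent first, regardless of who arrived first
```

The chest-pain patient who arrived second is served first -- exactly the triage behavior a FIFO model cannot capture.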

Ready to play with it?

Head back to the simulator and try the scenarios. Adjust shifts. Watch the animation. See the math come to life.

Open the simulator