Queuing theory basics part 2: multiple servers
These are my notes for the following three lectures on queueing theory:
Multiple servers, infinite queue (M/M/c)
Consider the case with c servers and infinite queue capacity.
Assume people form a common line and are served by the first available server.
Additionally, assume all servers have identical service rate μ.
The effective service rate depends on the number of people in the system.
If there are n≤c people are in the system,
then the service rate is nμ.
If there are n≥c people in the system,
then all servers are busy, so the service rate is cμ.
Distribution of queue lengths
We can now compute a recurrence similar to the single server case:
pn(t+h)⟹hpn(t+h)−pn(t)=P(1 arrival, 0 service)⋅pn−1(t)+P(0 arrival, 1 service)⋅pn+1(t)+P(0 arrival, 0 service)⋅pn(t)=λh⋅(1−μn−1h)⋅pn−1(t)+(1−λh)⋅μn+1h⋅pn+1(t)+(1−λh)⋅(1−μnh)⋅pn(t)≈λhpn−1(t)+μn+1hpn+1(t)+(1−λh−μnh)pn(t)=λpn−1(t)+μn+1pn+1(t)−(λ+μn)pn(t).
At steady-state, the distribution is time-independent, so
hpn(t+h)−pn(t)=0.
Thus
λpnn+1+μn+1pn+1=(λ+μn)pn.
Similarly, we find μ1p1=λp0 (or p1=μ1λp0).
Now
λp0+μ2p2=(λ+μ1)p1=λp1+μ1p1=λp1+λp0⟹μ2p2=λp1⟹p2=μ2λp1.
The recurrence continues similarly:
pn=∏i=1nμiλnp0={n!ρnp0c!cn−cρnp0 if n≤c if n≥c.
Now we can compute p0:
1=n≥0∑pn=n=0∑c−1n!ρnp0+n≥c∑c!cn−cρnp0=p0(n=0∑c−1n!ρn+c!ρc1−ρ/c1)⟹p0=(n=0∑c−1n!ρn+c!ρc1−ρ/c1)−1.
Expected queue length and wait time
There are only people waiting if there are n≥c people in the system,
so the expected number of people waiting is
Lq=n≥c∑(n−c)pn=n≥0∑npn+c=n≥1∑nc!cnρn+cp0=c!cp0ρc+1n≥1∑ncn−1ρn−1=c!cp0ρc+1d(ρ/c)dn≥1∑cnρn=c!cp0ρc+1d(ρ/c)d(1−ρ/c1−1)=c!cp0ρc+1((1−ρ/c)21)=(c−1)!(c−ρ)2p0ρc+1.
A similar calculation can be carried out for
the expected number of people in the system, Ls=Lq+ρ.
With these, can can again calculate the expected wait time and service time
using Little's law:
Lq=λWq,Ls=λWs.
In the case of multiple servers,
the calculation of utilization is more complicated.
It can be thought of as the average probability that any one server is busy,
assuming people are uniformly distributed across servers.
That is,
U=n=0∑c−1cnpn+n≥c∑pn=cρ.
As before, we can plot the latency with respect to utilization:
import matplotlib.pyplot as plt
import numpy as np
from scipy.special import factorial
def plot_latency(c, **kwargs):
rho = np.linspace(0, c, 1000)
p0 = 1/(sum(rho**n/factorial(n) for n in range(0, c)) +
rho**c/factorial(c)/(1-rho/c))
utilization = rho/c
latency = p0*rho**c/(factorial(c-1)*(c-rho)**2) + 1
plt.plot(utilization, latency, label=f'{c} servers', **kwargs)
plt.figure(figsize=(9, 6))
plt.ylim(0, 10)
plt.xlabel('utilization')
plt.ylabel('latency')
for c, alpha in [(2, 0.25), (8, 0.5), (32, 0.75), (128, 1.0)]:
plot_latency(c, alpha=alpha, color='blue')
plt.legend()
plt.savefig('mmc-latency.png')
plt.show()
We can also plot the latency with respect to the arrival rate,
holding the service rate fixed.
So long as λ≤cμ,
the arrival rate is equal to the throughput of the system
(if λ>cμ, the queue will grow indefinitely).
This demonstrates that adding more servers reduces the latency:
import matplotlib.pyplot as plt
import numpy as np
from scipy.special import factorial
rho = np.linspace(0, 5, 100)
def plot_latency(c, **kwargs):
p0 = 1/(sum(rho**n/factorial(n) for n in range(0, c)) +
rho**c/factorial(c)/(1-rho/c))
latency = p0*rho**c/(factorial(c-1)*(c-rho)**2) + 1
plt.plot(rho, latency, label=f'{c} servers', **kwargs)
plt.figure(figsize=(9, 6))
plt.ylim(0, 10)
plt.xlabel('throughput')
plt.ylabel('latency')
plot_latency(5)
plot_latency(6)
plt.legend()
plt.savefig('mmc-throughput.png')
plt.show()
Multiple servers, finite queue (M/M/c/N)
Consider a finite queue of capacity N feeding into c servers,
where people arrive at rate λ and each server has rate μ.
In the following, assume N>c.
Distribution of queue lengths
The recurrence for the distribution over the number of people in the system
is the same as the M/M/1/N queue:
pn={n!ρnp0c!cn−cρnp0 if n≤c if n≥c.
We can now compute the probability that there are 0 people in the system:
1=n=0∑Npn⟹p0=n=0∑c−1n!ρn+n=c∑Nc!cn−cρnp0=p0(n=0∑c−1n!ρn+c!ρc⋅1−ρ/c1−(ρ/c)N−c+1)=(n=0∑c−1n!ρn+c!ρc⋅1−ρ/c1−(ρ/c)N−c+1)−1.
Expected queue length and wait times
We can compute the expected queue length similar to the M/M/c case:
Lq=n=c∑N(n−c)pn=n=0∑N−cnpn+c=n=0∑N−cnc!cnρn+cp0=(c−1)!ρc+1p0⋅(c−ρ)21−(ρ/c)N−c+1−(N−c+1)(1−ρ/c)(ρ/c)N−c.
And the expected number of people in the system:
Ls=n=0∑Nnpn=p0ρn=0∑c−2n!ρn+c!p0ρc(c1−ρ/c1−(ρ/c)N−c+1+cρ(1−ρ/c)21−(ρ/c)N−c+1−(N−c+1)(1−ρ/c)(ρ/c)N−c).
The probability that the queue is full and thus someone is rejected is
pN=c!cN−cρNp0.
As with the M/M/1/N queue, there is an effective arrival rate
λeff=λ(1−pN),
from which we can derive the service and wait times
Ws=Ls/λeff and Wq=Lq/λeff.
Simulation
Here is a Jupyter notebook simulating a single- and multiple-server infinite queue: