Distributions
Introduction
Salabim can be used with the standard random module, but it is usually easier to use the salabim distributions.
Internally, salabim uses the random module. There is always a seed associated with each distribution, which is normally random.random.
When a new environment is created, the random seed 1234567 will be set by default. However, it is possible to override this behaviour with the random_seed parameter:
any hashable value, to set another seed
null string (“”): no reseeding
“*”: true random, non reproducible (based on current time)
None: equivalent to 1234567
It is possible to (re)set the random seed also with the Environment.random_seed()
method.
As a distribution is an instance of a class, it can be used in assignment, parameters, etc. E.g.
inter_arrival_time = sim.Uniform(10,15)
And then, to wait for a time sampled from this distribution
self.hold(inter_arrival_time.sample())
or
self.hold(inter_arrival_time())
or
self.hold(sim.Uniform(10,15).sample())
or
self.hold(sim.Uniform(10,15)())
Furthermore, all time related parameters of component methods accept the distribition itself instead of a float (from a sample). That means that in the above setup, we can also say
self.hold(inter_arrival_time)
or
self.hold(sim.Uniform(10,15))
All distributions are a subclass of _Distribution which supports the following methods:
mean()
sample()
direct calling as an alternative to sample, like Uniform(12,15)()
bounded_sample() # see below
Expressions with distributions
It is possible to build up a distribution with an expression containing one or more distributions. Examples
d0 = 5 - sim.Uniform(1, 2) # equivalent to Uniform (3, 4)
d1 = sim.Normal(4, 1) // 1 # integer samples of a normal distribution
arrival_dis = sim.Pdf((0, 1, 2, 3, 4, 5, 6), (18, 18, 18, 18, 18, 8,2), 'days') + sim.Cdf((0,0, 8,10, 17, 90, 24, 100), 'hours')
# this generates an arrival moment during the week, with emphasis on day 0-4. The distribution over the day concentrates between hour 8 and 17.
These will make an instance of the class _Expresssion, which can be used as any other distribution
arrival_dis.sample()
(sim.IntUniform(1,5) * 10).sample() # this will return 10, 20, 30, 40 or 50.
(1 / sim.Uniform(1, 2))() # this will return values between 0.5 and 1 (not uniform!)
Like all distributions, the _Expression class supports the mean(), sample(), bounded_sample() and print_info() methods. If the mean can’t be calculated, nan will be returned
(sim.Uniform(1, 2) / 10).mean() # 0.15
(sim.Uniform(1, 2) + sim.Uniform(1, 2)).mean() # 3
(10 / sim.Uniform(1, 2)).mean() # nan
(sim.Uniform(1, 2) / sim.Uniform(1, 2)).mean() # nan
Note that the expression may contain only the operator +, -, , /, // and *.
Functions are not allowed. However the int
function can be emulated with floor division (\\
), as is in
d1 = sim.Normal(4, 1) // 1 # integer samples of a normal distribution
Bounded sampling
The class Bounded can be used to force a sampled value from a distribution to be within given bounds.
This is realized by checking if the sampled value is within these bounds. If not, another value is sampled, until the sample meets the requirements. If after, 100 retries (customizable) the sampled value does still not meet the requirements, a fail_value will be returned.
Examples
dis = sim.Bounded(sim.Normal(3, 1), lowerbound=0)
sample = dis.sample() # normal distribution, non negative
sim.Bounded(sim.Exponential(6, upperbound=20).sample() # exponential distribution <= 20
sim.Bounded(sim.Exponential(6, upperbound=20)() # exponential distribution <= 20
It is alo possible to use the bounded_sample() method, with similar functionality. However, the Bounded class is prefered.
Use of time units in a distribution specification
All distributions apart from Poisson and Beta have an additional parameter, time_unit. If the time_unit is specified at initialization of Environment(), the time_unit of the distribution can now be specified.
As an example, suppose env has been initialized with env = sim.Environment(time_unit='hours')
.
If we then define a duration distribution as
duration_dis = sim.Uniform(10, 20, 'days')
, the distribution is effectively uniform between 240 and 480 (hours).
This facility makes specification of duration distribution easy and intuitive.
Available distributions
Beta
Beta distribution with a given
alpha (shape)
beta (shape)
E.g.
processing_time = sim.Beta(2,4) # Beta with alpha=2, beta=4`
Constant
No sampling is required for this distribution, as it always returns the same value. E.g.
processing_time = sim.Constant(10)
Erlang
Erlang distribution with a given
shape (k)
rate (lambda) or scale (mu)
E.g.
inter_arrival_time = sim.Erlang(2, rate=2) # Erlang-2, with lambda = 2
Exponential
Exponential distribution with a given
mean or rate (lambda)
E.g.
inter_arrival_time = sim.Exponential(10) # on an average every 10 time units
Gamma
Gamma distribution with given
shape (k)
scale (teta) or rate (beta)
E.g.
processing_time = sim.Gamma(2,3) # Gamma with k=2, teta=3
IntUniform
Integer uniform distribution between a given
lowerbound (inclusive)
upperbound (inclusive)
E.g.
die = sim.IntUniform(1, 6)
Normal
Normal distribution with a given
mean
standard deviation
E.g.
processing_time = sim.Normal(10, 2) # Normal with mean=10, standard deviation=2
Note that this might result in negative values, which might not correct if it is a duration. In that case, use the Bound class to force a non negative value, like
self.hold(sim.Bounded(sim.Normal(10, 2), 0)())
self.hold(sim.Bounded(sim.Normal(10, 2), 0))
Normally, sampling is done with the random.normalvariate method. Alternatively, the random.gauss method can be used, by specifying use_gauss=True.
Poisson
Poisson distribution with a given lambda
E.g.
occurences_in_one_hour = sim.Poisson(10) # Poisson distribution with lambda (and thus mean) = 10
Note that sampling from a Poisson distribution for large means can be rather time consuming. Therefore, salabim offers a flag to use the much faster Numpy algortihm
occurences_in_one_hour = sim.Poisson(10, prefer_numpy=True)
If numpy is not installed, the ‘normal’ algorithm will be used.
Triangular
Triangular distribution with a given
lowerbound
upperbound
mode
E.g.
processing_time = sim.Triangular(5, 15, 8)
Uniform
Uniform distribution between a given
lowerbound
upperbound
E.g.
processing_time = sim.Uniform(5, 15)
Weibull
Weibull distribution with given
scale (alpha or k)
shape (beta or lambda)
E.g.
time_between_failure = sim.Weibull(2, 5) # Weibull with k=2. lambda=5
Cdf
Cumulative distribution function, specified as a list or tuple with x[i],p[i] values, where p[i] is the cumulative probability that xn<=pn. E.g.
processingtime = sim.Cdf((5, 0, 10, 50, 15, 90, 30, 95, 60, 100))
This means that 0% is <5, 50% is < 10, 90% is < 15, 95% is < 30 and 100% is <60.
Note
It is required that p[0] is 0 and that p[i]<=p[i+1] and that x[i]<=x[i+1].
It is not required that the last p[] is 100, as all p[]’s are automatically scaled. This means that the two distributions below are identical to the first example
processingtime = sim.Cdf((5, 0.00, 10, 0.50, 15, 0.90, 30, 0.95, 60, 1.00))
processingtime = sim.Cdf((5, 0, 10, 10, 15, 18, 30, 19, 60, 20))
Pdf
Probability density function, specified as:
list or tuple of x[i], p[i] where p[i] is the probability (density)
list or tuple of x[i] followed by a list or tuple p[i]
list or tuple of x[i] followed by a scalar (value not important)
Note
It is required that the sum of p[i]’s is greater than 0.
E.g.
processingtime = sim.Pdf((5, 10, 10, 50, 15, 40))
This means that 10% is 5, 50% is 10 and 40% is 15.
It is not required that the sum of the p[i]’s is 100, as all p[]’s are automatically scaled. This means that the two distributions below are identical to the first example
processingtime = sim.Pdf((5, 0.10, 10, 0.50, 15, 0.40))
processingtime = sim.Pdf((5, 2, 10, 10, 15, 8))
And the same with the second form
processingtime = sim.Pdf((5, 10, 15), (10, 50, 40))
If all x[i]’s have the same probability, the third form is very useful
dice = sim.Pdf((1,2,3,4,5,6),1) # the distribution IntUniform(1,6) does the job as well
dice = sim.Pdf(range(1,7), 1) # same as above
x[i] may be of any type, so it possible to use
color = sim.Pdf(('Green', 45, 'Yellow', 10, 'Red', 45))
cartype = sim.Pdf(ordertypes,1)
If the x-value is a salabim distribution, not the distribution but a sample of that distribution is returned when sampling
processingtime = sim.Pdf((sim.Uniform(5, 10), 50, sim.Uniform(10, 15), 40, sim.Uniform(15, 20), 10))
proctime=processingtime.sample()
Here proctime will have a probability of 50% being between 5 and 10, 40% between 10 and 15 and 10% between 15 and 20.
Pdf supports also sampling a number of items from a pdf without replacement. In that case, the probabilities for all items have to be the same. If that is the case, multiple sampling can be done by specifying the number of items to sampled as a parameters to sample.
Examples
colors_dis = sim.Pdf(("red", "green", "blue", "yellow"), 1)
colors_dis.sample(4) # e.g. ["yellow", "green", "blue", "red"]
colors_dis.sample(2) # e.g. ["green", "blue"]
colors_dis,sample(1) # e.g. ["blue"], so not "blue" !
CumPdf
Probability density function, specified as:
list or tuple of x[i], p[i] where p[i] is the cumulative probability (density)
list or tuple of x[i] followed by a list or tuple of probabilities p[i]
Note
It is required that p[i]<=p[i+1].
E.g.
processingtime = sim.CumPdf((5, 10, 10, 60, 15, 100))
urgent = sim.CumPdf(True, 0.9, False, 1.0)
This means that 10% is 5, 50% is 10 and 40% is 15.
It is not required that the sum of the p[i]’s is 100, as all p[]’s are automatically scaled. This means that the two distributions below are identical to the first example
processingtime = sim.CumPdf((5, 0.10, 10, 0.60, 15, 1.00))
processingtime = sim.CumPdf((5, 2, 10, 12, 15, 20))
And the same with the second form
processingtime = sim.CumPdf((5, 10, 15), (10, 60, 100))
x[i] may be of any type, so it possible to use
color = sim.CumPdf(('Green', 45, 'Red', 100))
If the x-value is a salabim distribution, not the distribution but a sample of that distribution is returned when sampling
processingtime = sim.CumPdf((sim.Uniform(5, 10), 50, sim.Uniform(10, 15), 90, sim.Uniform(15, 20), 100))
proctime=processingtime.sample()
Here proctime will have a probability of 50% being between 5 and 10, 40% between 10 and 15 and 10% between 15 and 20.
External
The External distribution makes it possible to use a distribution from the modules
random
numpy.random
scipy.stats
as were it a salabim distribution.
The class takes at least one parameter (dis) that specifies the distribution to use, like
random.uniform
numpy.random.uniform
scipy.stats.uniform
Next, all postional and keyword parameters are used for sampling. In case of random and numpy.random by calling the method itself, in case of a scipy.stats distribution by calling rvs().
- Examples ::
import random d = sim.External(random.lognormvariate, mu=5, sigma=1)
import numpy d = sim.External(numpy.random.laplace, loc=5, scale=1)
import scipy.stats as st d = sim.External(st.stats.beta, a=1, b=2 loc=4, scale=1)
Then sampling with
d.sample()
or
d()
If the size is given (only for numpy.random and scipy.stats distributions), salabim will return the sampled values successively. This can be useful to increase performance.
The mean() method returns the proper mean for scipy.stats distributions, nan otherwise.
If a time_unit parameter is given, the sampled values will be multiplied by the applicable factor, e.g.
env = sim.Environment(time_unit='seconds')
d = sim.External(st.norm, loc=2, time_unit='minutes')
print(d.mean())
will print out
120
Distribution
A special distribution is the Distribution class. Here, a string will contain the specification of the distribution. This is particularly useful when the distributions are specified in an external file. E.g.
with open('experiment.txt', 'r') as f:
interarrivaltime = sim.Distribution(read(f))
processingtime = sim.Distribution(read(f))
numberofparcels = sim.Distribution(read(f))
delay = sim.Distribution(read(f))
With a file experiment.txt
Uniform(10, 15)
Triangular(1, 5, 2, time_unit='minutes')
IntUniform(10, 20)
External(random.uniform, 2, 6)
or with abbreviation
Uni(10,15)
Tri(1, 5, 2, time_unit='minutes')
Int(10, 20)
Extern(random.uniform, 2, 6)
or even
U(10, 15)
T(1, 5, 2, time_unit='minutes')
I(10, 20)
Ext(random.uniform, 2, 6)
The specification of sim.Distribution also accepts a time_unit parameter. Note that if the specification string contains a time_unit parameter as well, the time_unit parameter of Distribution is ignored, e.g.
d = sim.Distribution('uniform(1, 2)', time_unit='minutes')) # 1-2 minutes
d = sim.Distribution('uniform(1, 2, time_unit='hours')')) # 1-2 hours, same as before
d = sim.Distribution('uniform(1, 2, time_unit='hours')', time_unit='minutes')) # 1-2 hours, ignore minutes
Map
With the Map distribution is it possible to map the sampled values of a distribution to a given function. E.g.
round_normal = sim.Map(sim.Normal(10, 4), lambda x: round(x))
or, equivalently
round_normal = sim.Map(sim.Normal(10, 4), round)
The sampled values from the normal distribution will be rounded in either case.
Another example
positive_normal = sim.Map(sim.Normal(10, 4), lambda x: x if x > 0 else 0)
In this case, negative sampled values will be set to zero, positive values are not affected.
Note that this is different from
sim.Bounded(sim.Normal(10, 4), lowerbound=0)
as in the latter case resampling will be done when a negative value is sampled, in other words, effectively no zero samples.