Distributions

Introduction

Salabim can be used with the standard random module, but it is usually easier to use the salabim distributions.

Internally, salabim uses the random module. There is always a seed associated with each distribution, which is normally random.random.

When a new environment is created, the random seed 1234567 will be set by default. However, it is possible to override this behaviour with the random_seed parameter:

  • any hashable value, to set another seed

  • null string (“”): no reseeding

  • “*”: true random, non reproducible (based on current time)

  • None: equivalent to 1234567

It is possible to (re)set the random seed also with the Environment.random_seed() method.

As a distribution is an instance of a class, it can be used in assignment, parameters, etc. E.g.

inter_arrival_time = sim.Uniform(10,15)

And then, to wait for a time sampled from this distribution

yield self.hold(inter_arrival_time.sample())

or

yield self.hold(inter_arrival_time())

or

yield self.hold(sim.Uniform(10,15).sample())

or

yield self.hold(sim.Uniform(10,15)())

Furthermore, all time related parameters of component methods accept the distribition itself instead of a float (from a sample). That means that in the above setup, we can also say

yield self.hold(inter_arrival_time)

or

yield self.hold(sim.Uniform(10,15))

All distributions are a subclass of _Distribution which supports the following methods:

  • mean()

  • sample()

  • direct calling as an alternative to sample, like Uniform(12,15)()

  • bounded_sample() # see below

Expressions with distributions

It is possible to build up a distribution with an expression containing one or more distributions. Examples

d0 = 5 - sim.Uniform(1, 2)  # equivalent to Uniform (3, 4)
d1 = sim.Normal(4, 1) // 1  # integer samples of a normal distribution
arrival_dis = sim.Pdf((0, 1, 2, 3, 4, 5, 6), (18, 18, 18, 18, 18, 8,2), 'days') + sim.Cdf((0,0, 8,10, 17, 90, 24, 100), 'hours')
  # this generates an arrival moment during the week, with emphasis on day 0-4. The distribution over the day concentrates between hour 8 and 17.

These will make an instance of the class _Expresssion, which can be used as any other distribution

arrival_dis.sample()
(sim.IntUniform(1,5) * 10).sample()  # this will return 10, 20, 30, 40 or 50.
(1 / sim.Uniform(1, 2))()  # this will return values between 0.5 and 1 (not uniform!)

Like all distributions, the _Expression class supports the mean(), sample(), bounded_sample() and print_info() methods. If the mean can’t be calculated, nan will be returned

(sim.Uniform(1, 2) / 10).mean()  # 0.15
(sim.Uniform(1, 2) + sim.Uniform(1, 2)).mean()  # 3
(10 / sim.Uniform(1, 2)).mean()  # nan
(sim.Uniform(1, 2) / sim.Uniform(1, 2)).mean()  # nan

Note that the expression may contain only the operator +, -, , /, // and *. Functions are not allowed. However the int function can be emulated with floor division (\\), as is in

d1 = sim.Normal(4, 1) // 1  # integer samples of a normal distribution

Bounded sampling

The class Bounded can be used to force a sampled value from a distribution to be within given bounds.

This is realized by checking if the sampled value is within these bounds. If not, another value is sampled, until the sample meets the requirements. If after, 100 retries (customizable) the sampled value does still not meet the requirements, a fail_value will be returned.

Examples

dis = sim.Bounded(sim.Normal(3, 1), lowerbound=0)
sample = dis.sample()  # normal distribution, non negative
sim.Bounded(sim.Exponential(6, upperbound=20).sample()  # exponential distribution <= 20
sim.Bounded(sim.Exponential(6, upperbound=20)()  # exponential distribution <= 20

It is alo possible to use the bounded_sample() method, with similar functionality. However, the Bounded class is prefered.

Use of time units in a distribution specification

All distributions apart from Poisson and Beta have an additional parameter, time_unit. If the time_unit is specified at initialization of Environment(), the time_unit of the distribution can now be specified.

As an example, suppose env has been initialized with env = sim.Environment(time_unit='hours'). If we then define a duration distribution as

duration_dis = sim.Uniform(10, 20, 'days')

, the distribution is effectively uniform between 240 and 480 (hours).

This facility makes specification of duration distribution easy and intuitive.

Available distributions

Beta

Beta distribution with a given

  • alpha (shape)

  • beta (shape)

E.g.

processing_time = sim.Beta(2,4)  # Beta with alpha=2, beta=4`

Constant

No sampling is required for this distribution, as it always returns the same value. E.g.

processing_time = sim.Constant(10)

Erlang

Erlang distribution with a given

  • shape (k)

  • rate (lambda) or scale (mu)

E.g.

inter_arrival_time = sim.Erlang(2, rate=2)  # Erlang-2, with lambda = 2

Exponential

Exponential distribution with a given

  • mean or rate (lambda)

E.g.

inter_arrival_time = sim.Exponential(10)  # on an average every 10 time units

Gamma

Gamma distribution with given

  • shape (k)

  • scale (teta) or rate (beta)

E.g.

processing_time = sim.Gamma(2,3)  # Gamma with k=2, teta=3

IntUniform

Integer uniform distribution between a given

  • lowerbound (inclusive)

  • upperbound (inclusive)

E.g.

die = sim.IntUniform(1, 6)

Normal

Normal distribution with a given

  • mean

  • standard deviation

E.g.

processing_time = sim.Normal(10, 2)  # Normal with mean=10, standard deviation=2

Note that this might result in negative values, which might not correct if it is a duration. In that case, use the Bound class to force a non negative value, like

yield self.hold(sim.Bounded(sim.Normal(10, 2), 0)())
yield self.hold(sim.Bounded(sim.Normal(10, 2), 0))

Normally, sampling is done with the random.normalvariate method. Alternatively, the random.gauss method can be used, by specifying use_gauss=True.

Poisson

Poisson distribution with a given lambda

E.g.

occurences_in_one_hour = sim.Poisson(10)  # Poisson distribution with lambda (and thus mean) = 10

Note that sampling from a Poisson distribution for large means can be rather time consuming. Therefore, salabim offers a flag to use the much faster Numpy algortihm

occurences_in_one_hour = sim.Poisson(10, prefer_numpy=True)

If numpy is not installed, the ‘normal’ algorithm will be used.

Triangular

Triangular distribution with a given

  • lowerbound

  • upperbound

  • mode

E.g.

processing_time = sim.Triangular(5, 15, 8)

Uniform

Uniform distribution between a given

  • lowerbound

  • upperbound

E.g.

processing_time = sim.Uniform(5, 15)

Weibull

Weibull distribution with given

  • scale (alpha or k)

  • shape (beta or lambda)

E.g.

time_between_failure = sim.Weibull(2, 5)  # Weibull with k=2. lambda=5

Cdf

Cumulative distribution function, specified as a list or tuple with x[i],p[i] values, where p[i] is the cumulative probability that xn<=pn. E.g.

processingtime = sim.Cdf((5, 0, 10, 50, 15, 90, 30, 95, 60, 100))

This means that 0% is <5, 50% is < 10, 90% is < 15, 95% is < 30 and 100% is <60.

Note

It is required that p[0] is 0 and that p[i]<=p[i+1] and that x[i]<=x[i+1].

It is not required that the last p[] is 100, as all p[]’s are automatically scaled. This means that the two distributions below are identical to the first example

processingtime = sim.Cdf((5, 0.00, 10, 0.50, 15, 0.90, 30, 0.95, 60, 1.00))
processingtime = sim.Cdf((5, 0, 10, 10, 15, 18, 30, 19, 60, 20))

Pdf

Probability density function, specified as:

  1. list or tuple of x[i], p[i] where p[i] is the probability (density)

  2. list or tuple of x[i] followed by a list or tuple p[i]

  3. list or tuple of x[i] followed by a scalar (value not important)

Note

It is required that the sum of p[i]’s is greater than 0.

E.g.

processingtime = sim.Pdf((5, 10, 10, 50, 15, 40))

This means that 10% is 5, 50% is 10 and 40% is 15.

It is not required that the sum of the p[i]’s is 100, as all p[]’s are automatically scaled. This means that the two distributions below are identical to the first example

processingtime = sim.Pdf((5, 0.10, 10, 0.50, 15, 0.40))
processingtime = sim.Pdf((5, 2, 10, 10, 15, 8))

And the same with the second form

processingtime = sim.Pdf((5, 10, 15), (10, 50, 40))

If all x[i]’s have the same probability, the third form is very useful

dice = sim.Pdf((1,2,3,4,5,6),1)  # the distribution IntUniform(1,6) does the job as well
dice = sim.Pdf(range(1,7), 1)  # same as above

x[i] may be of any type, so it possible to use

color = sim.Pdf(('Green', 45, 'Yellow', 10, 'Red', 45))
cartype = sim.Pdf(ordertypes,1)

If the x-value is a salabim distribution, not the distribution but a sample of that distribution is returned when sampling

processingtime = sim.Pdf((sim.Uniform(5, 10), 50, sim.Uniform(10, 15), 40, sim.Uniform(15, 20), 10))
proctime=processingtime.sample()

Here proctime will have a probability of 50% being between 5 and 10, 40% between 10 and 15 and 10% between 15 and 20.

Pdf supports also sampling a number of items from a pdf without replacement. In that case, the probabilities for all items have to be the same. If that is the case, multiple sampling can be done by specifying the number of items to sampled as a parameters to sample.

Examples

colors_dis = sim.Pdf(("red", "green", "blue", "yellow"), 1)
colors_dis.sample(4)  # e.g. ["yellow", "green", "blue", "red"]
colors_dis.sample(2)  # e.g. ["green", "blue"]
colors_dis,sample(1)  # e.g. ["blue"], so not "blue" !

CumPdf

Probability density function, specified as:

  1. list or tuple of x[i], p[i] where p[i] is the cumulative probability (density)

  2. list or tuple of x[i] followed by a list or tuple of probabilities p[i]

Note

It is required that p[i]<=p[i+1].

E.g.

processingtime = sim.CumPdf((5, 10, 10, 60, 15, 100))
urgent =  sim.CumPdf(True, 0.9, False, 1.0)

This means that 10% is 5, 50% is 10 and 40% is 15.

It is not required that the sum of the p[i]’s is 100, as all p[]’s are automatically scaled. This means that the two distributions below are identical to the first example

processingtime = sim.CumPdf((5, 0.10, 10, 0.60, 15, 1.00))
processingtime = sim.CumPdf((5, 2, 10, 12, 15, 20))

And the same with the second form

processingtime = sim.CumPdf((5, 10, 15), (10, 60, 100))

x[i] may be of any type, so it possible to use

color = sim.CumPdf(('Green', 45, 'Red', 100))

If the x-value is a salabim distribution, not the distribution but a sample of that distribution is returned when sampling

processingtime = sim.CumPdf((sim.Uniform(5, 10), 50, sim.Uniform(10, 15), 90, sim.Uniform(15, 20), 100))
proctime=processingtime.sample()

Here proctime will have a probability of 50% being between 5 and 10, 40% between 10 and 15 and 10% between 15 and 20.

External

The External distribution makes it possible to use a distribution from the modules

  • random

  • numpy.random

  • scipy.stats

as were it a salabim distribution.

The class takes at least one parameter (dis) that specifies the distribution to use, like

  • random.uniform

  • numpy.random.uniform

  • scipy.stats.uniform

Next, all postional and keyword parameters are used for sampling. In case of random and numpy.random by calling the method itself, in case of a scipy.stats distribution by calling rvs().

Examples ::

import random d = sim.External(random.lognormvariate, mu=5, sigma=1)

import numpy d = sim.External(numpy.random.laplace, loc=5, scale=1)

import scipy.stats as st d = sim.External(st.stats.beta, a=1, b=2 loc=4, scale=1)

Then sampling with

d.sample()

or

d()

If the size is given (only for numpy.random and scipy.stats distributions), salabim will return the sampled values successively. This can be useful to increase performance.

The mean() method returns the proper mean for scipy.stats distributions, nan otherwise.

If a time_unit parameter is given, the sampled values will be multiplied by the applicable factor, e.g.

env = sim.Environment(time_unit='seconds')
d = sim.External(st.norm, loc=2, time_unit='minutes')
print(d.mean())

will print out

120

Distribution

A special distribution is the Distribution class. Here, a string will contain the specification of the distribution. This is particularly useful when the distributions are specified in an external file. E.g.

with open('experiment.txt', 'r') as f:
    interarrivaltime = sim.Distribution(read(f))
    processingtime = sim.Distribution(read(f))
    numberofparcels = sim.Distribution(read(f))
    delay = sim.Distribution(read(f))

With a file experiment.txt

Uniform(10, 15)
Triangular(1, 5, 2, time_unit='minutes')
IntUniform(10, 20)
External(random.uniform, 2, 6)

or with abbreviation

Uni(10,15)
Tri(1, 5, 2, time_unit='minutes')
Int(10, 20)
Extern(random.uniform, 2, 6)

or even

U(10, 15)
T(1, 5, 2, time_unit='minutes')
I(10, 20)
Ext(random.uniform, 2, 6)

The specification of sim.Distribution also accepts a time_unit parameter. Note that if the specification string contains a time_unit parameter as well, the time_unit parameter of Distribution is ignored, e.g.

d = sim.Distribution('uniform(1, 2)', time_unit='minutes'))  # 1-2 minutes
d = sim.Distribution('uniform(1, 2, time_unit='hours')'))  # 1-2 hours, same as before
d = sim.Distribution('uniform(1, 2, time_unit='hours')', time_unit='minutes'))  # 1-2 hours, ignore minutes

Map

With the Map distribution is it possible to map the sampled values of a distribution to a given function. E.g.

round_normal = sim.Map(sim.Normal(10, 4), lambda x: round(x))

or, equivalently

round_normal = sim.Map(sim.Normal(10, 4), round)

The sampled values from the normal distribution will be rounded in either case.

Another example

positive_normal = sim.Map(sim.Normal(10, 4), lambda x: x if x > 0 else 0)

In this case, negative sampled values will be set to zero, positive values are not affected.

Note that this is different from

sim.Bounded(sim.Normal(10, 4), lowerbound=0)

as in the latter case resampling will be done when a negative value is sampled, in other words, effectively no zero samples.