# Distributions

## Introduction

Salabim can be used with the standard random module, but it is usually easier to use the salabim distributions.

Internally, salabim uses the random module. There is always a seed associated with each distribution, which is normally random.random.

When a new environment is created, the random seed 1234567 will be set by default. However, it is possible to override this behaviour with the random_seed parameter:

• any hashable value, to set another seed

• null string (“”): no reseeding

• “*”: true random, non reproducible (based on current time)

• None: equivalent to 1234567

It is possible to (re)set the random seed also with the `Environment.random_seed()` method.

As a distribution is an instance of a class, it can be used in assignment, parameters, etc. E.g.

```inter_arrival_time = sim.Uniform(10,15)
```

And then, to wait for a time sampled from this distribution

```self.hold(inter_arrival_time.sample())
```

or

```self.hold(inter_arrival_time())
```

or

```self.hold(sim.Uniform(10,15).sample())
```

or

```self.hold(sim.Uniform(10,15)())
```

Furthermore, all time related parameters of component methods accept the distribition itself instead of a float (from a sample). That means that in the above setup, we can also say

```self.hold(inter_arrival_time)
```

or

```self.hold(sim.Uniform(10,15))
```

All distributions are a subclass of _Distribution which supports the following methods:

• mean()

• sample()

• direct calling as an alternative to sample, like Uniform(12,15)()

• bounded_sample() # see below

## Expressions with distributions

It is possible to build up a distribution with an expression containing one or more distributions. Examples

```d0 = 5 - sim.Uniform(1, 2)  # equivalent to Uniform (3, 4)
d1 = sim.Normal(4, 1) // 1  # integer samples of a normal distribution
arrival_dis = sim.Pdf((0, 1, 2, 3, 4, 5, 6), (18, 18, 18, 18, 18, 8,2), 'days') + sim.Cdf((0,0, 8,10, 17, 90, 24, 100), 'hours')
# this generates an arrival moment during the week, with emphasis on day 0-4. The distribution over the day concentrates between hour 8 and 17.
```

These will make an instance of the class _Expresssion, which can be used as any other distribution

```arrival_dis.sample()
(sim.IntUniform(1,5) * 10).sample()  # this will return 10, 20, 30, 40 or 50.
(1 / sim.Uniform(1, 2))()  # this will return values between 0.5 and 1 (not uniform!)
```

Like all distributions, the _Expression class supports the mean(), sample(), bounded_sample() and print_info() methods. If the mean can’t be calculated, nan will be returned

```(sim.Uniform(1, 2) / 10).mean()  # 0.15
(sim.Uniform(1, 2) + sim.Uniform(1, 2)).mean()  # 3
(10 / sim.Uniform(1, 2)).mean()  # nan
(sim.Uniform(1, 2) / sim.Uniform(1, 2)).mean()  # nan
```

Note that the expression may contain only the operator +, -, , /, // and *. Functions are not allowed. However the `int` function can be emulated with floor division (`\\`), as is in

```d1 = sim.Normal(4, 1) // 1  # integer samples of a normal distribution
```

## Bounded sampling

The class Bounded can be used to force a sampled value from a distribution to be within given bounds.

This is realized by checking if the sampled value is within these bounds. If not, another value is sampled, until the sample meets the requirements. If after, 100 retries (customizable) the sampled value does still not meet the requirements, a fail_value will be returned.

Examples

```dis = sim.Bounded(sim.Normal(3, 1), lowerbound=0)
sample = dis.sample()  # normal distribution, non negative
sim.Bounded(sim.Exponential(6, upperbound=20).sample()  # exponential distribution <= 20
sim.Bounded(sim.Exponential(6, upperbound=20)()  # exponential distribution <= 20
```

It is alo possible to use the bounded_sample() method, with similar functionality. However, the Bounded class is prefered.

## Use of time units in a distribution specification

All distributions apart from Poisson and Beta have an additional parameter, time_unit. If the time_unit is specified at initialization of Environment(), the time_unit of the distribution can now be specified.

As an example, suppose env has been initialized with `env = sim.Environment(time_unit='hours')`. If we then define a duration distribution as

```duration_dis = sim.Uniform(10, 20, 'days')
```

, the distribution is effectively uniform between 240 and 480 (hours).

This facility makes specification of duration distribution easy and intuitive.

## Available distributions

### Beta

Beta distribution with a given

• alpha (shape)

• beta (shape)

E.g.

```processing_time = sim.Beta(2,4)  # Beta with alpha=2, beta=4`
```

### Constant

No sampling is required for this distribution, as it always returns the same value. E.g.

```processing_time = sim.Constant(10)
```

### Erlang

Erlang distribution with a given

• shape (k)

• rate (lambda) or scale (mu)

E.g.

```inter_arrival_time = sim.Erlang(2, rate=2)  # Erlang-2, with lambda = 2
```

### Exponential

Exponential distribution with a given

• mean or rate (lambda)

E.g.

```inter_arrival_time = sim.Exponential(10)  # on an average every 10 time units
```

### Gamma

Gamma distribution with given

• shape (k)

• scale (teta) or rate (beta)

E.g.

```processing_time = sim.Gamma(2,3)  # Gamma with k=2, teta=3
```

### IntUniform

Integer uniform distribution between a given

• lowerbound (inclusive)

• upperbound (inclusive)

E.g.

```die = sim.IntUniform(1, 6)
```

### Normal

Normal distribution with a given

• mean

• standard deviation

E.g.

```processing_time = sim.Normal(10, 2)  # Normal with mean=10, standard deviation=2
```

Note that this might result in negative values, which might not correct if it is a duration. In that case, use the Bound class to force a non negative value, like

```self.hold(sim.Bounded(sim.Normal(10, 2), 0)())
self.hold(sim.Bounded(sim.Normal(10, 2), 0))
```

Normally, sampling is done with the random.normalvariate method. Alternatively, the random.gauss method can be used, by specifying use_gauss=True.

### Poisson

Poisson distribution with a given lambda

E.g.

```occurences_in_one_hour = sim.Poisson(10)  # Poisson distribution with lambda (and thus mean) = 10
```

Note that sampling from a Poisson distribution for large means can be rather time consuming. Therefore, salabim offers a flag to use the much faster Numpy algortihm

```occurences_in_one_hour = sim.Poisson(10, prefer_numpy=True)
```

If numpy is not installed, the ‘normal’ algorithm will be used.

### Triangular

Triangular distribution with a given

• lowerbound

• upperbound

• mode

E.g.

```processing_time = sim.Triangular(5, 15, 8)
```

### Uniform

Uniform distribution between a given

• lowerbound

• upperbound

E.g.

```processing_time = sim.Uniform(5, 15)
```

### Weibull

Weibull distribution with given

• scale (alpha or k)

• shape (beta or lambda)

E.g.

```time_between_failure = sim.Weibull(2, 5)  # Weibull with k=2. lambda=5
```

### Cdf

Cumulative distribution function, specified as a list or tuple with x[i],p[i] values, where p[i] is the cumulative probability that xn<=pn. E.g.

```processingtime = sim.Cdf((5, 0, 10, 50, 15, 90, 30, 95, 60, 100))
```

This means that 0% is <5, 50% is < 10, 90% is < 15, 95% is < 30 and 100% is <60.

Note

It is required that p[0] is 0 and that p[i]<=p[i+1] and that x[i]<=x[i+1].

It is not required that the last p[] is 100, as all p[]’s are automatically scaled. This means that the two distributions below are identical to the first example

```processingtime = sim.Cdf((5, 0.00, 10, 0.50, 15, 0.90, 30, 0.95, 60, 1.00))
processingtime = sim.Cdf((5, 0, 10, 10, 15, 18, 30, 19, 60, 20))
```

### Pdf

Probability density function, specified as:

1. list or tuple of x[i], p[i] where p[i] is the probability (density)

2. list or tuple of x[i] followed by a list or tuple p[i]

3. list or tuple of x[i] followed by a scalar (value not important)

Note

It is required that the sum of p[i]’s is greater than 0.

E.g.

```processingtime = sim.Pdf((5, 10, 10, 50, 15, 40))
```

This means that 10% is 5, 50% is 10 and 40% is 15.

It is not required that the sum of the p[i]’s is 100, as all p[]’s are automatically scaled. This means that the two distributions below are identical to the first example

```processingtime = sim.Pdf((5, 0.10, 10, 0.50, 15, 0.40))
processingtime = sim.Pdf((5, 2, 10, 10, 15, 8))
```

And the same with the second form

```processingtime = sim.Pdf((5, 10, 15), (10, 50, 40))
```

If all x[i]’s have the same probability, the third form is very useful

```dice = sim.Pdf((1,2,3,4,5,6),1)  # the distribution IntUniform(1,6) does the job as well
dice = sim.Pdf(range(1,7), 1)  # same as above
```

x[i] may be of any type, so it possible to use

```color = sim.Pdf(('Green', 45, 'Yellow', 10, 'Red', 45))
cartype = sim.Pdf(ordertypes,1)
```

If the x-value is a salabim distribution, not the distribution but a sample of that distribution is returned when sampling

```processingtime = sim.Pdf((sim.Uniform(5, 10), 50, sim.Uniform(10, 15), 40, sim.Uniform(15, 20), 10))
proctime=processingtime.sample()
```

Here proctime will have a probability of 50% being between 5 and 10, 40% between 10 and 15 and 10% between 15 and 20.

Pdf supports also sampling a number of items from a pdf without replacement. In that case, the probabilities for all items have to be the same. If that is the case, multiple sampling can be done by specifying the number of items to sampled as a parameters to sample.

Examples

```colors_dis = sim.Pdf(("red", "green", "blue", "yellow"), 1)
colors_dis.sample(4)  # e.g. ["yellow", "green", "blue", "red"]
colors_dis.sample(2)  # e.g. ["green", "blue"]
colors_dis,sample(1)  # e.g. ["blue"], so not "blue" !
```

### CumPdf

Probability density function, specified as:

1. list or tuple of x[i], p[i] where p[i] is the cumulative probability (density)

2. list or tuple of x[i] followed by a list or tuple of probabilities p[i]

Note

It is required that p[i]<=p[i+1].

E.g.

```processingtime = sim.CumPdf((5, 10, 10, 60, 15, 100))
urgent =  sim.CumPdf(True, 0.9, False, 1.0)
```

This means that 10% is 5, 50% is 10 and 40% is 15.

It is not required that the sum of the p[i]’s is 100, as all p[]’s are automatically scaled. This means that the two distributions below are identical to the first example

```processingtime = sim.CumPdf((5, 0.10, 10, 0.60, 15, 1.00))
processingtime = sim.CumPdf((5, 2, 10, 12, 15, 20))
```

And the same with the second form

```processingtime = sim.CumPdf((5, 10, 15), (10, 60, 100))
```

x[i] may be of any type, so it possible to use

```color = sim.CumPdf(('Green', 45, 'Red', 100))
```

If the x-value is a salabim distribution, not the distribution but a sample of that distribution is returned when sampling

```processingtime = sim.CumPdf((sim.Uniform(5, 10), 50, sim.Uniform(10, 15), 90, sim.Uniform(15, 20), 100))
proctime=processingtime.sample()
```

Here proctime will have a probability of 50% being between 5 and 10, 40% between 10 and 15 and 10% between 15 and 20.

### External

The External distribution makes it possible to use a distribution from the modules

• random

• numpy.random

• scipy.stats

as were it a salabim distribution.

The class takes at least one parameter (dis) that specifies the distribution to use, like

• random.uniform

• numpy.random.uniform

• scipy.stats.uniform

Next, all postional and keyword parameters are used for sampling. In case of random and numpy.random by calling the method itself, in case of a scipy.stats distribution by calling rvs().

Examples ::

import random d = sim.External(random.lognormvariate, mu=5, sigma=1)

import numpy d = sim.External(numpy.random.laplace, loc=5, scale=1)

import scipy.stats as st d = sim.External(st.stats.beta, a=1, b=2 loc=4, scale=1)

Then sampling with

```d.sample()
```

or

```d()
```

If the size is given (only for numpy.random and scipy.stats distributions), salabim will return the sampled values successively. This can be useful to increase performance.

The mean() method returns the proper mean for scipy.stats distributions, nan otherwise.

If a time_unit parameter is given, the sampled values will be multiplied by the applicable factor, e.g.

```env = sim.Environment(time_unit='seconds')
d = sim.External(st.norm, loc=2, time_unit='minutes')
print(d.mean())
```

will print out

```120
```

### Distribution

A special distribution is the Distribution class. Here, a string will contain the specification of the distribution. This is particularly useful when the distributions are specified in an external file. E.g.

```with open('experiment.txt', 'r') as f:
```

With a file experiment.txt

```Uniform(10, 15)
Triangular(1, 5, 2, time_unit='minutes')
IntUniform(10, 20)
External(random.uniform, 2, 6)
```

or with abbreviation

```Uni(10,15)
Tri(1, 5, 2, time_unit='minutes')
Int(10, 20)
Extern(random.uniform, 2, 6)
```

or even

```U(10, 15)
T(1, 5, 2, time_unit='minutes')
I(10, 20)
Ext(random.uniform, 2, 6)
```

The specification of sim.Distribution also accepts a time_unit parameter. Note that if the specification string contains a time_unit parameter as well, the time_unit parameter of Distribution is ignored, e.g.

```d = sim.Distribution('uniform(1, 2)', time_unit='minutes'))  # 1-2 minutes
d = sim.Distribution('uniform(1, 2, time_unit='hours')'))  # 1-2 hours, same as before
d = sim.Distribution('uniform(1, 2, time_unit='hours')', time_unit='minutes'))  # 1-2 hours, ignore minutes
```

### Map

With the Map distribution is it possible to map the sampled values of a distribution to a given function. E.g.

```round_normal = sim.Map(sim.Normal(10, 4), lambda x: round(x))
```

or, equivalently

```round_normal = sim.Map(sim.Normal(10, 4), round)
```

The sampled values from the normal distribution will be rounded in either case.

Another example

```positive_normal = sim.Map(sim.Normal(10, 4), lambda x: x if x > 0 else 0)
```

In this case, negative sampled values will be set to zero, positive values are not affected.

Note that this is different from

```sim.Bounded(sim.Normal(10, 4), lowerbound=0)
```

as in the latter case resampling will be done when a negative value is sampled, in other words, effectively no zero samples.