tobi tobi - 1 year ago 82
Python Question

How to generate random numbers in specyfic range using pareto distribution in Python

Hi I wanted to generate some random numbers with pareto distribution. I've found that this is possible using numpy. But I don't know hot to shape the outcome. For example i want to have results in range: 10-20, but how can i achieve this?

I know the syntax for using pareto from numpy

numpy.random.pareto(m, s)

I can't understand what m is for (I've been looking in wikipedia, but i don't understand one bit)? I know that s i size of generated tuple.

Answer Source

The documentation seems to have a mistake which might be confusing you.

Normally the parameter names in the call signature:

numpy.random.pareto(a, size=None)

Match the parameter names with the given details:

shape : float, > 0.
    Shape of the distribution.
size : tuple of ints
    Output shape.  If the given shape is, e.g., ``(m, n, k)``, then
    ``m * n * k`` samples are drawn.

But you see that the first parameter is called both a and shape. Pass your desired shape as the first argument to the function to get a distribution of size numbers (they're not a tuple, but a numpy array).

If you need to change the second parameter (called xm on wikipedia), then just add it to all values, as in the example from the docs:

Draw samples from the distribution:

>>> a, m = 3., 1. # shape and mode
>>> s = np.random.pareto(a, 1000) + m

So, it is trivial to implement a lower bound: just use your lower bound for m:

lower = 10  # the lower bound for your values
shape = 1   # the distribution shape parameter, also known as `a` or `alpha`
size = 1000 # the size of your sample (number of random values)

And create the distribution with the lower bound:

x = np.random.pareto(shape, size) + lower

However, the Pareto distribution is not bounded from above, so if you try to cut it off it will really be a truncated version of the distribution, which is not quite the same thing, so be careful. If the shape parameter much bigger than 1, the distribution decays algebraically, as x – (a+1), so you won't see very many large values anyway.

If you choose to implement the upper bound, a simple way is to generate the ordinary sample then remove any values that exceed your limit:

upper = 20
x = x[x<upper]  # only values where x < upper

But now the size of your sample is (possibly) smaller. You could keep adding new ones (and filtering out the values that are too large) until the size is what you want, but it would be simpler to make it sufficiently large in the first place, then use only size of them:

x = np.random.pareto(shape, size*5/4) + lower
x = x[x<upper][:size]