tobi - 1 year ago 114

Python Question

Hi, I want to generate some random numbers with a Pareto distribution. I've found that this is possible using numpy, but I don't know how to shape the outcome. For example, I want results in the range 10-20; how can I achieve this?

I know the syntax for using pareto from numpy

`numpy.random.pareto(m, s)`

I can't understand what m is for (I've been looking at Wikipedia, but I don't understand one bit). I know that s is the size of the generated tuple.

Answer Source

The documentation seems to have a mistake which might be confusing you.

Normally the parameter names in the call signature:

```
numpy.random.pareto(a, size=None)
```

match the parameter names given in the details:

```
Parameters
----------
shape : float, > 0.
    Shape of the distribution.
size : tuple of ints
    Output shape. If the given shape is, e.g., ``(m, n, k)``, then
    ``m * n * k`` samples are drawn.
```

But you see that the first parameter is called both `a` and `shape`. Pass your desired *shape* as the first argument to the function to get a distribution of `size` numbers (they're not a `tuple`, but a numpy `array`).
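As a quick check of the two points above, here is a minimal sketch (the variable names are only illustrative): the first argument is the shape parameter, and what comes back is a numpy array whose dimensions follow `size`.

```python
import numpy as np

a = 3.0  # shape parameter of the distribution (`a` in the call signature)

flat = np.random.pareto(a, size=5)       # 1-D array holding 5 samples
grid = np.random.pareto(a, size=(2, 3))  # 2 x 3 array, 6 samples in total

print(type(flat), flat.shape)
print(grid.shape)
```

Passing a tuple as `size` only changes the layout of the returned array, not the distribution the values are drawn from.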

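If you need to change the second parameter (called x_{m} on Wikipedia), note that `numpy.random.pareto` actually samples the Lomax (Pareto II) distribution, which starts at 0. To get a classical Pareto distribution, add 1 and then multiply by x_{m}. With x_{m} = 1 this reduces to adding it to all values, as in the example from the docs: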

```
Examples
--------
Draw samples from the distribution:
>>> a, m = 3., 1. # shape and mode
>>> s = np.random.pareto(a, 1000) + m
```

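So, it is trivial to implement a lower bound: just add your lower bound to all values in place of `m`. (This shifts the distribution so that it starts at `lower`; it is not the same as a classical Pareto with x_{m} equal to `lower`, which would be `(np.random.pareto(shape, size) + 1) * lower`.) First set the parameters: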

```
import numpy as np

lower = 10 # the lower bound for your values
shape = 1 # the distribution shape parameter, also known as `a` or `alpha`
size = 1000 # the size of your sample (number of random values)
```

And create the distribution with the lower bound:

```
x = np.random.pareto(shape, size) + lower
```
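To make the effect of the lower bound concrete, here is a small sketch that builds the sample both ways: shifted by `lower`, as above, and rescaled into a classical Pareto with x_{m} equal to `lower` (the add-1-then-multiply relation described in the NumPy documentation). The names `raw`, `shifted` and `classical` are only illustrative.

```python
import numpy as np

lower = 10   # the lower bound for the values
shape = 1    # the shape parameter (`a` / alpha)
size = 1000  # number of random values

raw = np.random.pareto(shape, size)  # Lomax samples, all >= 0

shifted = raw + lower          # shifted sample, starts at `lower`
classical = (raw + 1) * lower  # classical Pareto with x_m = lower

# Neither sample ever falls below `lower`
print(shifted.min(), classical.min())
```

Both versions respect the lower bound; they differ in how quickly the values spread out above it, so pick the one that matches the distribution you actually need.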

However, the Pareto distribution is not bounded from above, so if you cut it off you really get a truncated version of the distribution, which is not quite the same thing, so be careful. If the shape parameter is much bigger than 1, the density decays algebraically, as x^{-(a+1)}, so you won't see very many large values anyway.
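To get a feel for how much of the sample a cutoff would discard, the sketch below compares the observed fraction of values above an upper limit with the exact tail probability. It assumes the shifted sample `np.random.pareto(shape, size) + lower`, for which P(x > upper) = (1 + upper - lower)^{-shape}; `tail_fraction` is a hypothetical helper, not a NumPy function.

```python
import numpy as np

def tail_fraction(shape, lower, upper):
    """Exact P(x > upper) for x = Lomax(shape) + lower (hypothetical helper)."""
    return (1.0 + (upper - lower)) ** (-shape)

lower, upper = 10, 20
size = 200_000

for shape in (1, 3, 10):
    x = np.random.pareto(shape, size) + lower
    observed = np.mean(x > upper)  # share of samples a cutoff would remove
    expected = tail_fraction(shape, lower, upper)
    print(f"shape={shape}: observed {observed:.5f}, expected {expected:.5f}")
```

With `shape = 1` roughly one value in eleven lies above the limit, while for larger shapes the discarded share quickly becomes negligible.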

If you choose to implement the upper bound, a simple way is to generate the ordinary sample then remove any values that exceed your limit:

```
upper = 20
x = x[x<upper] # only values where x < upper
```

But now the size of your sample is (possibly) smaller. You could keep adding new ones (and filtering out the values that are too large) until the size is what you want, but it would be simpler to make it sufficiently large in the first place, then use only `size` of them:

```
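# oversample by 25% (the size must be an integer), then keep the first `size` values below `upper`
x = np.random.pareto(shape, size * 5 // 4) + lower
x = x[x < upper][:size]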
```
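If a fixed oversampling factor feels too fragile, the "keep adding new ones" variant mentioned above can be sketched as a small loop. `bounded_pareto` is a hypothetical helper name, not part of NumPy, and the result follows the truncated distribution discussed earlier, not a true Pareto.

```python
import numpy as np

def bounded_pareto(shape, lower, upper, size):
    """Draw `size` values in [lower, upper) by repeated sampling and filtering.

    Hypothetical helper; assumes upper > lower so that some values are kept.
    """
    kept = np.empty(0)
    while kept.size < size:
        batch = np.random.pareto(shape, size) + lower        # fresh shifted samples
        kept = np.concatenate([kept, batch[batch < upper]])  # keep only values below the limit
    return kept[:size]

x = bounded_pareto(shape=1, lower=10, upper=20, size=1000)
print(x.size, x.min(), x.max())
```

The loop always returns exactly `size` values, at the cost of a few extra draws when many samples land above `upper`.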