 ThePredator - 3 years ago 105
Python Question

# Generate random numbers replicating arbitrary distribution

I have data with a variable `z` that contains around 4000 values (from 0.0 to 1.0), for which the histogram looks like this: [histogram image not preserved]. Now I need to generate a random variable, call it `random_z`, which should replicate that distribution.

What I have tried so far is to generate a normal distribution centered at 1.0 so that I can remove all values above 1.0 and get a similar distribution. I have been using `numpy.random.normal`, but the problem is that I cannot restrict its output to the range 0.0 to 1.0: its `loc` and `scale` arguments set the mean and standard deviation (0.0 and 1.0 by default), but a normal distribution is unbounded.
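For reference, the approach described above (draw from a normal centered at 1.0 and discard whatever falls outside [0.0, 1.0]) can be sketched as rejection sampling. This is a minimal sketch: the function name `truncated_normal` and the `scale=0.3` default are illustrative assumptions, not values taken from the data.

```python
import numpy as np


def truncated_normal(n, loc=1.0, scale=0.3, low=0.0, high=1.0, rng=None):
    """Draw n values from a normal(loc, scale) restricted to [low, high].

    Rejection sampling: generate normal draws, keep only those inside the
    interval, and repeat until n values have been kept. The defaults
    (including scale=0.3) are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    kept = np.empty(0)
    while kept.size < n:
        draws = rng.normal(loc=loc, scale=scale, size=n)
        inside = draws[(draws >= low) & (draws <= high)]
        kept = np.concatenate([kept, inside])
    return kept[:n]


random_z = truncated_normal(4000, rng=np.random.default_rng(0))
```

This only reproduces `z` if its histogram happens to look like one side of a bell curve; the answer below avoids committing to a parametric shape.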

Is there another way to go about generating this distribution in Python?

Paul

If you want to bootstrap, you could use `random.choice()` on your observed series.
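A minimal bootstrap sketch, assuming the observed values are in a NumPy array (a synthetic stand-in replaces the real `z` here). `random.choice()` returns one observed value per call; NumPy's `Generator.choice` draws many at once, with replacement.

```python
import random

import numpy as np

rng = np.random.default_rng(0)
z = rng.random(4000) ** 3  # synthetic stand-in for the observed data

# Standard library: one resampled value per call
one_value = random.choice(z)

# NumPy: 4000 resampled values in one call, drawn with replacement
random_z = rng.choice(z, size=4000, replace=True)
```

Every bootstrapped value is an exact copy of an observed value; the quantile approach below interpolates between observed values instead.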

Here I'll assume you'd like to smooth a bit more than that and you aren't concerned with generating new extreme values.

Use `pandas.Series.quantile()` and a uniform [0,1] random number generator, as follows.

Training

• Put your random sample into a pandas Series, call this series `S`

Production

1. Generate a random number `u` between 0.0 and 1.0 the usual way, e.g., `random.random()`.
2. Return `S.quantile(u)`.
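The two phases can be sketched with pandas as follows. The data is a synthetic stand-in for the real sample, and `draw_one` is a hypothetical helper name.

```python
import random

import numpy as np
import pandas as pd

# Training: put the sample into a pandas Series S (synthetic stand-in data here)
S = pd.Series(np.random.default_rng(1).random(4000) ** 3)


def draw_one(S):
    """Production: map one uniform number through the sample's quantile function."""
    u = random.random()   # uniform on [0.0, 1.0)
    return S.quantile(u)  # linear interpolation between sorted values (the default)


random_z = [draw_one(S) for _ in range(1000)]
```

Current pandas versions also accept a list or array of probabilities, e.g. `S.quantile(np.random.random(1000))`, which returns a Series of draws in one call.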

If you'd rather use `numpy` than `pandas`, from a quick reading it looks like you can substitute `numpy.percentile()` in step 2. Note that percentiles run from 0 to 100, so the call becomes `numpy.percentile(S, 100*u)`.
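A sketch of that substitution for a single draw, on a synthetic stand-in sample. With their default linear interpolation, `np.percentile(S, 100*u)` and `pd.Series(S).quantile(u)` should give the same value; NumPy 1.15 and later also offer `np.quantile`, which takes `u` on the 0-1 scale directly.

```python
import random

import numpy as np
import pandas as pd

S = np.random.default_rng(2).random(4000) ** 3  # synthetic stand-in sample

u = random.random()                    # step 1: uniform on [0.0, 1.0)
via_numpy = np.percentile(S, 100 * u)  # step 2: percentiles run from 0 to 100
via_quantile = np.quantile(S, u)       # same thing on the 0-1 scale
via_pandas = pd.Series(S).quantile(u)  # the pandas version, for comparison
```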

Principle of Operation:

From the sample S, `pandas.Series.quantile()` or `numpy.percentile()` is used to calculate the (empirical) inverse cumulative distribution function for the method of inverse transform sampling. The quantile or percentile function of S transforms a uniform [0,1] pseudorandom number into a pseudorandom number with the range and distribution of the sample S.
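To make the principle concrete, here is a sketch that builds the piecewise-linear inverse of the empirical CDF by hand with `np.interp` instead of calling a quantile function. The helper name `empirical_inverse_cdf` is hypothetical. The k-th sorted value is assigned probability k/(n-1), which is the convention NumPy's default (linear) percentile uses, so the result should agree with `np.quantile`.

```python
import numpy as np


def empirical_inverse_cdf(sample, u):
    """Map uniform numbers u in [0, 1] through the sample's inverse CDF."""
    xs = np.sort(np.asarray(sample, dtype=float))
    # Probability level of each order statistic: 0, 1/(n-1), ..., 1
    ps = np.linspace(0.0, 1.0, xs.size)
    # Linear interpolation between neighboring order statistics
    return np.interp(u, ps, xs)


rng = np.random.default_rng(3)
S = rng.random(1000) ** 3  # synthetic stand-in sample
u = rng.random(5000)       # uniform pseudorandom numbers
new_values = empirical_inverse_cdf(S, u)
```

Because the mapping only interpolates between sorted sample values, no new value can fall outside the observed minimum and maximum.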

# Simple Sample Code

If you need to minimize coding and don't want to write and call functions that return only a single realization, then `numpy.percentile` seems to beat `pandas.Series.quantile`.

Let `S` be a pre-existing sample, `u` the new uniform random numbers, and `newR` the new random numbers drawn from an S-like distribution.

```
>>> import numpy as np
```

Next, `S` needs to hold a sample of the kind of random numbers to be duplicated.

To create an example, I am going to raise some uniform [0,1] random numbers to the third power and call that the sample `S`. Generating the example sample this way means I know in advance that the mean of S should equal the definite integral of x^3 dx from 0 to 1, which is `1/(3+1)` = `1/4` = `0.25`.
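Written out, with U uniform on [0, 1] and S = U^3, the mean follows from that integral. The same substitution also gives the CDF of S, which shows that cubing uniform numbers is itself an instance of inverse transform sampling, with exact quantile function p^3.

```latex
\begin{aligned}
\mathbb{E}[S] &= \mathbb{E}[U^3] = \int_0^1 x^3\,dx = \left[\tfrac{x^4}{4}\right]_0^1 = \tfrac{1}{4},\\
F_S(s) &= P(U^3 \le s) = P\!\left(U \le s^{1/3}\right) = s^{1/3}, \quad 0 \le s \le 1,\\
F_S^{-1}(p) &= p^3 .
\end{aligned}
```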

In your application, you would need to do something else instead, perhaps read a file, to create a numpy array `S` containing the data sample whose distribution is to be duplicated.
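For example, if the values of `z` were stored one per line in a text file, `np.loadtxt` could read them into `S`. The file name `z.txt` and its contents are hypothetical; an in-memory `io.StringIO` stands in for the file so the sketch runs on its own.

```python
import io

import numpy as np

# Stand-in for a hypothetical file "z.txt" holding one value of z per line.
# With a real file you would write: S = np.loadtxt("z.txt")
fake_file = io.StringIO("0.91\n0.97\n0.42\n0.99\n0.88\n")
S = np.loadtxt(fake_file)  # 1-D float array, one element per line
```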

```
>>> S = pow(np.random.random(1000),3)  # S will be 1000 samples of a power distribution
```

Here I check that the mean of S is close to 0.25, as stated above.

```
>>> S.mean()
0.25296623781420458 # OK
```

Get the min and max, just to show how `np.percentile` works:

```
>>> S.min()
6.1091277680105382e-10
>>> S.max()
0.99608676594692624
```

The `numpy.percentile` function maps percentiles from 0 to 100 onto the range of S.

```
>>> np.percentile(S,0)  # this should match the min of S
6.1091277680105382e-10 # and it does

>>> np.percentile(S,100) # this should match the max of S
0.99608676594692624 # and it does

>>> np.percentile(S,[0,100])  # this should send back an array with both min, max
[6.1091277680105382e-10, 0.99608676594692624]  # and it does

>>> np.percentile(S,np.array([0,100])) # but this doesn't....
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/numpy/lib/function_base.py", line 2803, in percentile
    if q == 0:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```

This is a problem if we want to generate 100 new values, starting with uniforms:

```
>>> u = np.random.random(100)
```

because, with the NumPy version used above, passing the array `u` to `np.percentile` errors out, and `u` is on a 0-1 scale whereas `np.percentile` needs 0-100.

This will work:

```
>>> newR = np.percentile(S, (100*u).tolist())
```

This works, but with that NumPy version the result is a Python list, so convert it if you want a NumPy array back:

```
>>> type(newR)
<type 'list'>

>>> newR = np.array(newR)
```
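With current NumPy versions, `np.percentile` accepts an array of percentiles and returns a NumPy array, so the `.tolist()` round trip and the conversion are no longer needed. A sketch on a synthetic stand-in sample, assuming NumPy 1.17 or later for `default_rng`:

```python
import numpy as np

rng = np.random.default_rng(4)
S = rng.random(1000) ** 3  # synthetic stand-in sample
u = rng.random(100)        # 100 uniform numbers on [0, 1)

newR = np.percentile(S, 100 * u)  # array-valued q, returns an ndarray
newR_q = np.quantile(S, u)        # equivalent, with q on the 0-1 scale
```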

Now we have a numpy array. Let's check the mean of the new random values.

```
>>> newR.mean()
0.25549728059744525 # close enough
```
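Matching the mean is a fairly weak check. A sketch of a broader comparison on synthetic stand-in data looks at several quantiles and at histograms on common bins; the probability levels and the bin count chosen here are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
S = rng.random(1000) ** 3                   # synthetic stand-in sample
newR = np.quantile(S, rng.random(100_000))  # many new draws from the S-like distribution

# Compare a few quantiles of the original sample and of the new draws
probs = [0.1, 0.25, 0.5, 0.75, 0.9]
q_S = np.quantile(S, probs)
q_new = np.quantile(newR, probs)

# Compare normalized histograms on the same bins
bins = np.linspace(0.0, 1.0, 11)
h_S, _ = np.histogram(S, bins=bins, density=True)
h_new, _ = np.histogram(newR, bins=bins, density=True)

print(np.round(q_S, 3), np.round(q_new, 3))
print(np.round(h_S, 2), np.round(h_new, 2))
```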