ThePredator - 1 year ago 89

Python Question

I have data wherein I have a variable `z`. Now I need to generate a random variable, call it `random_z`.

What I have tried so far is to generate a normal distribution centered at 1.0 so that I can remove all those above 1.0 to get a distribution that will be similar. I have been using `numpy.random.normal`.

Is there another way to go about generating this distribution in Python?


Answer Source

If you want to bootstrap you could use `random.choice()` on your observed series.
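As a rough illustration of the bootstrap option, here is a minimal sketch. It assumes NumPy's `Generator.choice` as a vectorized stand-in for `random.choice()`, and the `observed` array is made-up example data, not the asker's `z`.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

# Hypothetical stand-in for the observed series (values in [0, 1)).
observed = rng.random(1000) ** 3

# Bootstrap: draw 100 values from the observed data, with replacement.
resampled = rng.choice(observed, size=100, replace=True)

# Every resampled value is one of the observed values; nothing new appears.
print(np.isin(resampled, observed).all())
```

Because a bootstrap only ever repeats values already seen, it does no smoothing, which is why the quantile approach may be preferable here.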

Here I'll assume you'd like to smooth a bit more than that and you aren't concerned with generating new extreme values.

Use `pandas.Series.quantile()` and a uniform [0,1] random number generator, as follows.

Training

- Put your random sample into a pandas Series; call this series `S`.

Production

- Generate a random number `u` between 0.0 and 1.0 the usual way, e.g., `random.random()`.
- Return `S.quantile(u)`.
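The Training and Production steps above might look roughly like this. It is only a sketch: the sample data is invented for the example, and `draw_one` is a hypothetical helper name, not part of pandas.

```python
import random

import pandas as pd

# Training: put the observed sample into a pandas Series.
# (Made-up stand-in data; replace with your own observations.)
random.seed(0)
S = pd.Series([random.random() ** 3 for _ in range(1000)])

def draw_one(sample):
    """Return one new value distributed like `sample` (illustrative helper)."""
    u = random.random()          # uniform on [0.0, 1.0)
    return sample.quantile(u)    # empirical quantile of the sample at u

# Production: each call yields one new realization.
new_value = draw_one(S)
print(S.min() <= new_value <= S.max())
```

The drawn value always falls between the sample's minimum and maximum, consistent with the assumption that no new extreme values are needed.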

If you'd rather use `numpy` than `pandas`, from a quick reading it looks like you can substitute `numpy.percentile()` in the second Production step, keeping in mind that it expects percentiles on a 0-100 scale rather than 0-1.

Principle of Operation:

From the sample S, `pandas.Series.quantile()` or `numpy.percentile()` is used to calculate the inverse cumulative distribution function for the method of inverse transform sampling. The quantile or percentile function (relative to S) transforms a uniform [0,1] pseudo-random number to a pseudo-random number having the range and distribution of the sample S.
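To make the principle concrete, here is a hand-rolled sketch of an empirical inverse CDF built with `np.sort` and `np.interp`. The linear interpolation between sorted sample points is an assumption chosen for illustration, and `inverse_ecdf` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.random(1000) ** 3            # made-up stand-in sample

sorted_S = np.sort(S)
# Cumulative probabilities assigned to the sorted points, from 0 to 1.
probs = np.linspace(0.0, 1.0, len(sorted_S))

def inverse_ecdf(u):
    """Map uniform values u in [0, 1] to values distributed like S."""
    return np.interp(u, probs, sorted_S)

u = rng.random(5000)                 # uniform pseudo-random numbers
new_values = inverse_ecdf(u)         # inverse transform sampling

# The two means should be close if the new values follow S's distribution.
print(S.mean(), new_values.mean())
```

Feeding uniform numbers through the inverse CDF is what gives the output the range and shape of the sample.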

If you need to minimize coding and don't want to write and use functions that only return a single realization, then it seems `numpy.percentile` bests `pandas.Series.quantile`.

Let `S` be a pre-existing sample.

`u` will be the new uniform random numbers.

`newR` will be the new random numbers drawn from an S-like distribution.

```
>>> import numpy as np
```

I need a sample of the kind of random numbers to be duplicated to put in `S`.

For the purposes of creating an example, I am going to raise some uniform [0,1] random numbers to the third power and call that the sample `S`. By choosing to generate the example sample in this way, I will know in advance -- from the mean being equal to the definite integral of (x^3)(dx) evaluated from 0 to 1 -- that the mean of S should be `1/(3+1)` = `1/4` = `0.25`.

In your application, you would need to do something else instead, perhaps read a file, to create a numpy array `S` containing the data sample whose distribution is to be duplicated.

```
>>> S = pow(np.random.random(1000),3) # S will be 1000 samples of a power distribution
```

Here I will check that the mean of S is 0.25 as stated above.

```
>>> S.mean()
0.25296623781420458 # OK
```

Get the min and max just to show how `np.percentile` works.

```
>>> S.min()
6.1091277680105382e-10
>>> S.max()
0.99608676594692624
```

The numpy.percentile function maps 0-100 to the range of S.

```
>>> np.percentile(S,0) # this should match the min of S
6.1091277680105382e-10 # and it does
>>> np.percentile(S,100) # this should match the max of S
0.99608676594692624 # and it does
>>> np.percentile(S,[0,100]) # this should send back an array with both min, max
[6.1091277680105382e-10, 0.99608676594692624] # and it does
>>> np.percentile(S,np.array([0,100])) # but this doesn't....
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/numpy/lib/function_base.py", line 2803, in percentile
    if q == 0:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```

This isn't so great if we generate 100 new values, starting with uniforms:

```
>>> u = np.random.random(100)
```

because passing the numpy array `u` directly will error out in the same way, and the scale of `u` is 0-1, whereas 0-100 is needed.

This will work:

```
>>> newR = np.percentile(S, (100*u).tolist())
```

This works fine, but the result might need its type adjusted if you want a numpy array back:

```
>>> type(newR)
<type 'list'>
>>> newR = np.array(newR)
```

Now we have a numpy array. Let's check the mean of the new random values.

```
>>> newR.mean()
0.25549728059744525 # close enough
```
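The session above ran on Python 2.7 with an older numpy. On more recent NumPy releases, `np.percentile` appears to accept a numpy array for the percentile argument and to return an array, so the `.tolist()` and `np.array()` steps may be unnecessary. A short sketch, assuming such a version:

```python
import numpy as np

rng = np.random.default_rng(0)      # seeded only for repeatability
S = rng.random(1000) ** 3           # example sample, built as above
u = rng.random(100)                 # uniforms on [0, 1)

# Rescale u from 0-1 to the 0-100 range that np.percentile expects.
newR = np.percentile(S, 100 * u)

print(type(newR), newR.mean())
```

If your installed version still raises the `ValueError` shown earlier, the list-based workaround remains the fallback.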
