I have data wherein I have a variable
If you want to bootstrap, you could use random.choice() on your observed series.
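A minimal bootstrap sketch, assuming the observed data is available as a plain Python list (the name `observed` and its values are hypothetical). Every resampled value is one of the observed values. `random.choices` draws several values with replacement in one call, and `numpy.random.choice` accepts a `size` argument for the same purpose.

```python
import random

# Hypothetical observed data; in practice this would be your series.
observed = [0.12, 0.55, 0.31, 0.98, 0.07, 0.44]

rng = random.Random(42)  # seeded so the example is reproducible

# Bootstrap: draw with replacement from the observed values themselves.
one_draw = rng.choice(observed)        # a single resampled value
resample = rng.choices(observed, k=10) # ten values, with replacement

print(one_draw, resample)
```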
Here I'll assume you'd like to smooth a bit more than that and you aren't concerned with generating new extreme values.
In that case, use pandas.Series.quantile() and a uniform [0,1] random number generator, as follows.

1. Draw a uniform random number u between 0.0 and 1.0 the usual way, e.g., random.random()
2. Return S.quantile(u)
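If you'd rather use numpy than pandas, from a quick reading it looks like you can substitute numpy.percentile() in step 2. Note that numpy.percentile() takes percentages from 0 to 100 rather than fractions, so pass 100*u instead of u.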
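The two steps can be sketched with numpy.percentile() standing in for pandas.Series.quantile(). This is a sketch under assumptions: the function name `draw_one` and the sample values are hypothetical, and with pandas step 2 would instead be `pd.Series(S).quantile(u)` on the 0-1 scale.

```python
import random
import numpy as np

# Hypothetical sample S; replace with your own data.
S = np.array([0.02, 0.10, 0.25, 0.40, 0.75, 0.90])

def draw_one(S, rng=random):
    # Step 1: a uniform random number u in [0.0, 1.0)
    u = rng.random()
    # Step 2: map u through the sample's quantile function.
    # numpy.percentile expects percentages, so u is scaled by 100.
    return float(np.percentile(S, 100 * u))

x = draw_one(S, random.Random(0))
print(x)
```

Each call returns one realization; the result always lies between the smallest and largest value of S.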
Principle of Operation:
From the sample S, numpy.percentile() is used to calculate the (empirical) inverse cumulative distribution function required by the method of Inverse transform sampling. The quantile or percentile function (relative to S) transforms a uniform [0,1] pseudo random number into a pseudo random number having the range and distribution of the sample S.
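To make the principle concrete, here is a plain-Python sketch of the empirical inverse CDF, assuming the same linear interpolation between sorted sample values that numpy.percentile uses by default. The function name `empirical_quantile` and the sample values are hypothetical.

```python
import math
import random

def empirical_quantile(sample, q):
    """Inverse of the sample's empirical CDF with linear interpolation
    between sorted values (q in [0, 1])."""
    xs = sorted(sample)
    pos = q * (len(xs) - 1)           # fractional position in the sorted sample
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)     # stay inside the array when q == 1
    frac = pos - lo
    return xs[lo] + (xs[hi] - xs[lo]) * frac

S = [0.9, 0.1, 0.5, 0.3]              # hypothetical sample
rng = random.Random(1)

# Inverse transform sampling: push uniforms through the quantile function.
new_values = [empirical_quantile(S, rng.random()) for _ in range(5)]
print(empirical_quantile(S, 0.0), empirical_quantile(S, 0.5), empirical_quantile(S, 1.0))
print(new_values)
```

q = 0 returns the minimum of S and q = 1 the maximum, so new values interpolate between observed ones but never fall outside the sample's range.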
If you need to minimize coding and don't want to write and use functions that only return a single realization, numpy.percentile() can instead be applied to a whole batch of uniforms at once, as the session below shows.
Let S be a pre-existing sample, u the new uniform random numbers, and newR the new random values drawn from an S-like distribution.
```python
>>> import numpy as np
```
I need a sample of the kind of random numbers to be duplicated, to put in S.
For the purposes of creating an example, I am going to raise some uniform [0,1] random numbers to the third power and call that the sample S. By generating the example sample this way, I know in advance that the mean of S should be 0.25, because the mean equals the definite integral of x^3 dx from 0 to 1, which is 1/4.
In your application, you would need to do something else instead, perhaps read a file, to
create a numpy array
S containing the data sample whose distribution is to be duplicated.
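One way to do this, assuming a text file with one number per line, is numpy.loadtxt. In this sketch an in-memory `io.StringIO` stands in for the file, and both its contents and the file name `my_data.txt` mentioned in the comment are hypothetical.

```python
import io
import numpy as np

# Stand-in for a real data file, one value per line (hypothetical contents).
# With a real file you would pass its path instead, e.g. np.loadtxt("my_data.txt").
data_file = io.StringIO("0.12\n0.55\n0.31\n0.98\n0.07\n")

S = np.loadtxt(data_file)   # 1-D float array holding the observed sample
print(S.shape, S.mean())
```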
```python
>>> S = pow(np.random.random(1000), 3)  # S will be 1000 samples of a power distribution
```
Here I will check that the mean of S is 0.25 as stated above.
```python
>>> S.mean()
0.25296623781420458  # OK
```
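For reference, here is the expected mean written out, with U uniform on [0,1] and S = U^3. The last two lines are a side observation: cubing uniforms is itself an instance of inverse transform sampling, because the inverse CDF of S is u^3.

```latex
\begin{aligned}
\mathbb{E}[S] &= \mathbb{E}[U^3] = \int_0^1 x^3\,dx = \left[\frac{x^4}{4}\right]_0^1 = \frac{1}{4},\\
F_S(s) &= P(U^3 \le s) = P\!\left(U \le s^{1/3}\right) = s^{1/3}, \qquad 0 \le s \le 1,\\
F_S^{-1}(u) &= u^3 .
\end{aligned}
```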
Get the min and max, just to show how np.percentile works:
```python
>>> S.min()
6.1091277680105382e-10
>>> S.max()
0.99608676594692624
```
The numpy.percentile function maps 0-100 onto the range of S: 0 gives the minimum of S and 100 gives the maximum.
```python
>>> np.percentile(S, 0)  # this should match the min of S
6.1091277680105382e-10  # and it does
>>> np.percentile(S, 100)  # this should match the max of S
0.99608676594692624  # and it does
>>> np.percentile(S, [0, 100])  # this should send back both min and max
[6.1091277680105382e-10, 0.99608676594692624]  # and it does
>>> np.percentile(S, np.array([0, 100]))  # but this doesn't....
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/numpy/lib/function_base.py", line 2803, in percentile
    if q == 0:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```
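The same mapping can be checked in a newer NumPy, assuming version 1.17 or later (for `np.random.default_rng`; `np.quantile` needs 1.15 or later). In such releases an array of percentages is accepted, and `np.quantile` is the same function on a 0-1 scale. The seed and sample are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.random(1000) ** 3           # same kind of example sample as above

# 0 -> minimum, 50 -> median, 100 -> maximum (linear interpolation by default)
lo, mid, hi = np.percentile(S, [0, 50, 100])

# np.quantile takes fractions instead of percentages
same = np.quantile(S, [0.0, 0.5, 1.0])
print(lo, mid, hi, same)
```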
With the older NumPy in this session (Python 2.7), this isn't so great if we generate 100 new values starting from uniforms:
```python
>>> u = np.random.random(100)
```
because passing the array u directly errors out as shown above, and u is on a 0-1 scale while np.percentile needs 0-100.
This will work:
```python
>>> newR = np.percentile(S, (100*u).tolist())
```
which works fine, but here it returns a list, so its type may need adjusting if you want a numpy array back:
```python
>>> type(newR)
<type 'list'>
>>> newR = np.array(newR)
```
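As an aside, newer NumPy releases accept an array of percentages directly and return an ndarray, so the list round trip is no longer needed there. A sketch, assuming NumPy 1.17 or later; the seed and sample are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
S = rng.random(1000) ** 3      # example sample, as in the session
u = rng.random(100)            # 100 new uniforms on [0, 1)

newR = np.percentile(S, 100 * u)   # array in, ndarray out (no .tolist() needed)
newR_q = np.quantile(S, u)         # same values, with u left on its 0-1 scale
print(type(newR), newR.shape)
```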
Now we have a numpy array. Let's check the mean of the new random values.
```python
>>> newR.mean()
0.25549728059744525  # close enough
```
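Putting the steps together, here is a minimal end-to-end sketch of inverse transform sampling from an empirical sample. The function name `sample_like` is hypothetical, and the code assumes NumPy 1.17 or later.

```python
import numpy as np

def sample_like(S, n, rng=None):
    """Draw n values whose distribution follows the sample S, by inverse
    transform sampling through S's empirical quantile function."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.random(n)                      # uniforms on [0, 1)
    return np.quantile(np.asarray(S), u)   # empirical inverse CDF of S

rng = np.random.default_rng(2)
S = rng.random(1000) ** 3                  # example sample with mean near 0.25
newR = sample_like(S, 10_000, rng)
print(S.mean(), newR.mean(), newR.min() >= S.min(), newR.max() <= S.max())
```

Because the quantile function only interpolates between sorted sample values, every new value stays within [S.min(), S.max()]; no new extreme values are generated.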