andy - 1 year ago 83

Python Question

Say I have a 10,000 pt vector that I want to take a slice of only 100 logarithmically spaced points. I want a function to give me integer values for the indices. Here's a simple solution that is simply using around + logspace, then getting rid of duplicates.

`def genLogSpace( array_size, num ):`

lspace = around(logspace(0,log10(array_size),num)).astype(uint64)

return array(sorted(set(lspace.tolist())))-1

ls=genLogspace(1e4,100)

print ls.size

>>84

print ls

array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,

11, 13, 14, 15, 17, 19, 21, 23, 25, 27, 30,

33, 37, 40, 44, 49, 54, 59, 65, 71, 78, 86,

94, 104, 114, 125, 137, 151, 166, 182, 200, 220, 241,

265, 291, 319, 350, 384, 422, 463, 508, 558, 613, 672,

738, 810, 889, 976, 1071, 1176, 1291, 1416, 1555, 1706, 1873,

2056, 2256, 2476, 2718, 2983, 3274, 3593, 3943, 4328, 4750, 5213,

5721, 6279, 6892, 7564, 8301, 9111, 9999], dtype=uint64)

Notice that there were 16 duplicates, so now I only have 84 points.

Does anyone have a solution that will efficiently ensure the number of output samples is num? For this specific example, input values for num of 121 and 122 give 100 output points.

Answer Source

This is a bit tricky. You can't always get logarithmically spaced numbers. As in your example, first part is rather linear. If you are OK with that, I have a solution. But for the solution, you should understand why you have duplicates.

Logarithmic scale satisfies the condition:

```
s[n+1]/s[n] = constant
```

Let's call this constant `r`

for `ratio`

. For `n`

of these numbers between range `1...size`

, you'll get:

```
1, r, r**2, r**3, ..., r**(n-1)=size
```

So this gives you:

```
r = size ** (1/(n-1))
```

In your case, `n=100`

and `size=10000`

, `r`

will be `~1.0974987654930561`

, which means, if you start with `1`

, your next number will be `1.0974987654930561`

which is then rounded to `1`

again. Thus your duplicates. This issue is present for small numbers. After a sufficiently large number, multiplying with ratio will result in a different rounded integer.

Keeping this in mind, your best bet is to add consecutive integers up to a certain point so that this multiplication with the ratio is no longer an issue. Then you can continue with the logarithmic scaling. The following function does that:

```
import numpy as np
def gen_log_space(limit, n):
result = [1]
if n>1: # just a check to avoid ZeroDivisionError
ratio = (float(limit)/result[-1]) ** (1.0/(n-len(result)))
while len(result)<n:
next_value = result[-1]*ratio
if next_value - result[-1] >= 1:
# safe zone. next_value will be a different integer
result.append(next_value)
else:
# problem! same integer. we need to find next_value by artificially incrementing previous value
result.append(result[-1]+1)
# recalculate the ratio so that the remaining values will scale correctly
ratio = (float(limit)/result[-1]) ** (1.0/(n-len(result)))
# round, re-adjust to 0 indexing (i.e. minus 1) and return np.uint64 array
return np.array(map(lambda x: round(x)-1, result), dtype=np.uint64)
```

Here are some examples using it:

```
In [157]: x = gen_log_space(10000, 100)
In [158]: x.size
Out[158]: 100
In [159]: len(set(x))
Out[159]: 100
In [160]: y = gen_log_space(2000, 50)
In [161]: y.size
Out[161]: 50
In [162]: len(set(y))
Out[162]: 50
In [163]: y
Out[163]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11,
13, 14, 17, 19, 22, 25, 29, 33, 38, 43, 49,
56, 65, 74, 84, 96, 110, 125, 143, 164, 187, 213,
243, 277, 316, 361, 412, 470, 536, 612, 698, 796, 908,
1035, 1181, 1347, 1537, 1753, 1999], dtype=uint64)
```

And just to show you how logarithmic the results are, here is a semilog plot of the output for `x = gen_log_scale(10000, 100)`

(as you can see, left part is not really logarithmic):