Apsed1976 - 4 months ago 22

C# Question

I have a number of some

`N`

What I'd like to is estimate the mathematical function for given outputs and inputs. Is it closer to the form of

`f(N)`

`f(N^2)`

`log(N)`

`N*lolg(N)`

`2^N`

Essentially, what I'd like to do is estimate big O. Thus, n is amount of input data, and the outputs is compute time. So basically I'd like to at least know to which of above mentioned functioned mine function is closer.

Answer

You can use the Least Squares method to find the function that has the least distance to your data.

Assuming you have a list of sample observations of your unknown function in the shape of some order pairs `(x,y)`

or `(x,f(x))`

. You can measure the distance of this unknown function to some known function *g* using Least Squares method.

```
distance = 0
for x,y in sample pairs
distance += ( y - g(x) )^2
```

As much as this *distance* gets smaller your unknown function is closer to the known function *g*.

Now If you want to find the closest function (from a pre-determined list of functions) to your unknown function, you just have to calculate each of those functions distance to your unknown function. Whichever had the least distance is more similar to your unknown function.

*Note that this method is an approximation and it isn't 100% accurate but you can increase its accuracy by providing a larger and more comprehensive sample data*

Here is a sample Python implementation:

```
import math
functions = []
functions.append( lambda n: n ) # y = n
functions.append( lambda n: n*n ) # y = n^2
functions.append( lambda n: math.log(n,2) ) # y = log(n)
functions.append( lambda n: n*math.log(n,2) ) # y = n*log(n)
functions.append( lambda n: 2**n ) # y = 2^n
# creating the sample data the unknown function is n + 10*n*log(n)
pairs = [ (n, n + 10*n*math.log(n,2) ) for n in range(1,200,1) ]
# calculating the distance of each function to the sample data
# and add them to a list
distances = [ ]
for func in functions:
d = 0
for n,y in pairs:
d += (func(n) - y)**2
distances.append(d)
# finding the minimum value and printing the index
print distances.index(min(distances))
```

Output

```
3
```

It means the 4th function is closest to our sample data which is `n*log(n)`

.

Note that if we reduce the size of the sample data like this (getting rid of half the sample data):

```
pairs = [ (n, n + 10*n*math.log(n,2) ) for n in range(1,100,1) ]
```

This program will print `1`

which means that the closest function is n^{2}. This shows the importance of the sample data.