Saul Garcia - 2 months ago

R Question

I am trying to estimate the constants for Heaps' law.

I have the following dataset, `novels_collection`:

```
  Number of novels DistinctWords WordOccurrences
1                1         13575          117795
2                1         34224          947652
3                1         40353         1146953
4                1         55392         1661664
5                1         60656         1968274
```
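For anyone who wants to reproduce this, the table can be typed in as an R data frame. This is a sketch: the column name `Number.of.novels` is an assumption (it is how `data.frame()` would name a "Number of novels" header), and the values are copied from the table exactly as printed.

```r
# Rebuild the table above as a data frame.
# Assumption: the first column is called Number.of.novels; its values
# are copied as shown (all 1).
novels_collection <- data.frame(
  Number.of.novels = c(1, 1, 1, 1, 1),
  DistinctWords    = c(13575, 34224, 40353, 55392, 60656),
  WordOccurrences  = c(117795, 947652, 1146953, 1661664, 1968274)
)

str(novels_collection)
```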

Then I wrote the following function:

```
# Function for Heaps' law
heaps <- function(K, n, B){
  K*n^B
}

heaps(2, 117795, .7) # Just to test that it works
```

Here `n` is the number of word occurrences (`WordOccurrences`), and `K` and `B` are the constants I want to estimate so that `K*n^B` predicts the number of distinct words (`DistinctWords`).

I tried fitting it with `nls()`, but it gives me an error:

```
fitHeaps <- nls(DistinctWords ~ heaps(K, WordOccurrences, B),
                data = novels_collection[, 2:3],
                start = list(K = .1, B = .1), trace = TRUE)
```

The error:

```
Error in numericDeriv(form[[3L]], names(ind), env) :
  Missing value or an infinity produced when evaluating the model
```

Any idea how I could fix this, or another method to fit the function and get the values of K and B?
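A plausible, but unverified, explanation of the error: with the starting values K = .1 and B = .1 the model predicts values below 1, while `DistinctWords` is in the tens of thousands, so `nls()` has to take large steps in B. With `n` close to two million, `n^B` exceeds the largest double (`.Machine$double.xmax`) once B is above roughly 49, and R returns `Inf`, which is exactly what the message complains about. The sketch below only demonstrates that overflow; it does not replay the actual `nls()` iterations.

```r
# Same model function as in the question
heaps <- function(K, n, B) {
  K * n^B
}

n_max <- 1968274  # largest WordOccurrences value in the table

heaps(0.1, n_max, 0.1)  # starting values: prediction is below 1
heaps(0.1, n_max, 40)   # still finite (on the order of 1e250)
heaps(0.1, n_max, 50)   # n^50 overflows double precision, so this is Inf
```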

Answer

If you take the logarithm of both sides of `y = K * n ^ B`, you get `log(y) = log(K) + B * log(n)`. This is a linear relationship between `log(y)` and `log(n)`, so you can fit a linear regression model to estimate `log(K)` (the intercept) and `B` (the slope).

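```
## log-transform both sides, taking the columns from the data frame
logy <- log(novels_collection$DistinctWords)
logn <- log(novels_collection$WordOccurrences)
fit <- lm(logy ~ logn)      ## log(y) = log(K) + B * log(n)
para <- coef(fit)           ## log(K) and B
para[1] <- exp(para[1])     ## K and B
names(para) <- c("K", "B")
para
```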
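If you still want the nonlinear least-squares fit the question asked for, one option (a sketch, not part of the original answer) is to use the log-log estimates as starting values for `nls()`, so the optimizer begins close to a solution instead of at K = .1, B = .1. The data frame is retyped from the question; `fitLog` and `start_vals` are illustrative names.

```r
# Data from the question (only the two columns the model uses)
novels_collection <- data.frame(
  DistinctWords   = c(13575, 34224, 40353, 55392, 60656),
  WordOccurrences = c(117795, 947652, 1146953, 1661664, 1968274)
)

# Model function from the question
heaps <- function(K, n, B) {
  K * n^B
}

# Step 1: log-log linear fit gives rough estimates of log(K) and B
fitLog <- lm(log(DistinctWords) ~ log(WordOccurrences), data = novels_collection)
start_vals <- list(K = unname(exp(coef(fitLog)[1])),
                   B = unname(coef(fitLog)[2]))

# Step 2: nonlinear least squares on the original scale,
# started from the log-log estimates
fitHeaps <- nls(DistinctWords ~ heaps(K, WordOccurrences, B),
                data = novels_collection,
                start = start_vals)

coef(fitHeaps)
```

The two fits need not agree exactly: `lm()` on the log scale minimizes errors in `log(DistinctWords)`, while `nls()` minimizes squared errors on the original scale, so the largest counts carry more weight in the `nls()` estimates.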