Saul Garcia Saul Garcia - 2 months ago 12
R Question

`nls` fails to estimate parameters of my model

I am trying to estimate the constants for Heaps law.
I have the following dataset

novels_colection
:

Number of novels DistinctWords WordOccurrences
1 1 13575 117795
2 1 34224 947652
3 1 40353 1146953
4 1 55392 1661664
5 1 60656 1968274


Then I build the next function:

# Function for Heaps law
heaps <- function(K, n, B){
K*n^B
}
heaps(2,117795,.7) #Just to test it works


So
n = Word Occurrences
, and
K
and
B
are values that should be constants in order to find my prediction of Distinct Words.

I tried this but it gives me an error:

fitHeaps <- nls(DistinctWords ~ heaps(K,WordOccurrences,B),
data = novels_collection[,2:3],
start = list(K = .1, B = .1), trace = T)


Error =
Error in numericDeriv(form[[3L]], names(ind), env) :
Missing value or an infinity produced when evaluating the model


Any idea in how could I fix this or a method to fit the function and get the values for
K
and
B
?

Answer

If you take log transform on both sides of y = K * n ^ B, you get log(y) = log(K) + B * log(n). This is a linear relationship between log(y) and log(n), hence you can fit a linear regression model to find log(K) and B.

logy <- log(DistinctWords)
logn <- log(WordOccurrences)

fit <- lm(logy ~ logn)

para <- coef(fit)  ## log(K) and B
para[1] <- exp(para[1])    ## K and B
Comments