`nls` fails to estimate parameters of my model

Question

I am trying to estimate the constants of Heaps' law. I have the following dataset, novels_collection:

  Number of novels DistinctWords WordOccurrences
1                1         13575          117795
2                1         34224          947652
3                1         40353         1146953
4                1         55392         1661664
5                1         60656         1968274

Then I define the following function:

# Function for Heaps law
heaps <- function(K, n, B){
  K*n^B
}
heaps(2,117795,.7) #Just to test it works

So n = WordOccurrences, and K and B are the constants I need to estimate in order to predict DistinctWords.

I tried the following, but it gives me an error:

fitHeaps <- nls(DistinctWords ~ heaps(K,WordOccurrences,B), 
    data = novels_collection[,2:3], 
    start = list(K = .1, B = .1), trace = T)

Error message: Error in numericDeriv(form[[3L]], names(ind), env) : Missing value or an infinity produced when evaluating the model

Any ideas on how to fix this, or on another way to fit the function and obtain values for K and B?

Answer 1

If you take the logarithm of both sides of y = K * n ^ B, you get log(y) = log(K) + B * log(n). This is a linear relationship between log(y) and log(n), so you can fit a linear regression model to find log(K) and B.

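# Log-transform both columns of novels_collection (the data frame from the question)
logy <- log(novels_collection$DistinctWords)
logn <- log(novels_collection$WordOccurrences)

fit <- lm(logy ~ logn)

para <- coef(fit)  ## log(K) and B
para[1] <- exp(para[1])    ## K and B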
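
For anyone who wants to run the log-linear approach end to end, here is a minimal self-contained sketch. It assumes the five rows shown in the question are the whole dataset and rebuilds them as a data frame. The names `est` and `comparison` are only illustrative.

```r
# Data copied from the table in the question
novels_collection <- data.frame(
  DistinctWords   = c(13575, 34224, 40353, 55392, 60656),
  WordOccurrences = c(117795, 947652, 1146953, 1661664, 1968274)
)

# Fit log(DistinctWords) = log(K) + B * log(WordOccurrences)
fit <- lm(log(DistinctWords) ~ log(WordOccurrences), data = novels_collection)

# Back-transform the intercept to get K; the slope is B
est <- c(K = exp(unname(coef(fit)[1])), B = unname(coef(fit)[2]))

# Predictions on the original scale, next to the observed values
comparison <- data.frame(
  observed  = novels_collection$DistinctWords,
  predicted = est[["K"]] * novels_collection$WordOccurrences^est[["B"]]
)
print(est)
print(comparison)
```

These estimates might also be worth trying as `start` values for `nls` in place of K = .1, B = .1, although convergence is not guaranteed with so few, highly collinear points.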

Answer 2

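Using minpack.lm we can fit the nonlinear model directly, although I suspect it is more prone to overfitting than a linear model on the log-transformed variables (the approach Zheyuan used). It would be interesting to compare the residuals of the linear and nonlinear models on some held-out data to get empirical results.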

library(minpack.lm)
fitHeaps = nlsLM(DistinctWords ~ heaps(K, WordOccurrences, B),
                     data = novels_collection[,2:3], 
                     start = list(K = .01, B = .01))
coef(fitHeaps)
#        K         B 
# 5.0452566 0.6472176 

plot(novels_collection$WordOccurrences, novels_collection$DistinctWords, pch=19)
lines(novels_collection$WordOccurrences, predict(fitHeaps, newdata = novels_collection[,2:3]), col='red')
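
The held-out comparison mentioned above could be sketched roughly as follows, using leave-one-out prediction errors because there are only five rows. To keep the example in base R, the nonlinear fit here is not `nlsLM`: it is a swapped-in least-squares fit that uses `optimize` over B, with K solved in closed form for each B. The helper names (`fit_loglin`, `fit_direct`, `loo_error`, `rmse`) and the search interval for B are assumptions made for illustration.

```r
# Data copied from the table in the question
novels <- data.frame(
  DistinctWords   = c(13575, 34224, 40353, 55392, 60656),
  WordOccurrences = c(117795, 947652, 1146953, 1661664, 1968274)
)

# Log-linear fit: regress log(y) on log(n), then back-transform the intercept
fit_loglin <- function(d) {
  cf <- coef(lm(log(DistinctWords) ~ log(WordOccurrences), data = d))
  c(K = exp(unname(cf[1])), B = unname(cf[2]))
}

# Least squares on the original scale (stand-in for nlsLM, base R only).
# For a fixed B the best K is sum(y * n^B) / sum(n^(2B)), so only B is searched.
fit_direct <- function(d, interval = c(0, 1)) {
  y <- d$DistinctWords
  n <- d$WordOccurrences
  best_K <- function(B) sum(y * n^B) / sum(n^(2 * B))
  rss    <- function(B) sum((y - best_K(B) * n^B)^2)
  B <- optimize(rss, interval)$minimum
  c(K = best_K(B), B = B)
}

# Leave-one-out: fit without row i, then record the prediction error on row i
loo_error <- function(fit_fun, d) {
  sapply(seq_len(nrow(d)), function(i) {
    p <- fit_fun(d[-i, ])
    d$DistinctWords[i] - p[["K"]] * d$WordOccurrences[i]^p[["B"]]
  })
}

rmse <- function(e) sqrt(mean(e^2))
err_loglin <- loo_error(fit_loglin, novels)
err_direct <- loo_error(fit_direct, novels)
print(c(loglin = rmse(err_loglin), direct = rmse(err_direct)))
```

With five observations any such comparison is only indicative; the same helpers could be applied to a larger collection, or `fit_direct` replaced by a wrapper around `nlsLM`.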

regression

r

curve-fitting

nls

non-linear-regression