"Data is too long" error in the R flexmixNL package

I have searched online for this but cannot pin down what my problem is. Here is my code:

n = 10000
x1 <- runif(n,0,100) 
x2 <- runif(n,0,100) 
y1 <- 10*sin(x1/10) + 10 + rnorm(n, sd = 1)
y2 <- x2 * cos(x2) - 2 * rnorm(n, sd = 2)
x <- c(x1, x2)
y <- c(y1, y2)
start1 = list(a = 10, b = 5)
start2 = list(a = 30, b = 5)
library(flexmix)
library(flexmixNL)

modelNL <- flexmix(y~x, k =2, 
                   model = FLXMRnlm(formula = y ~ a*x/(b+x), 
                                    family = "gaussian", 
                                    start = list(start1, start2))) 

plot(x, y, col = clusters(modelNL))

Before it reaches the plot, it gives me this error:

Error in matrix(1, nrow = sum(groups$groupfirst)) : data is too long

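I looked up similar errors on Google, but I do not understand what in my own code causes this one.

As you can tell, I am new to R, so please explain in the plainest terms you can. Thank you in advance.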

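Ironically (given that the error message says the data is "too long"), I think the proximate cause of the error is the missing data argument. If you supply the data as a data frame, you still get an error, but a different one from the one you are seeing. When you plot the data you get a rather strange set of values, at least from the standpoint of statistical distributions, and it is not clear why you are trying to model them with this formula. Nonetheless, with those starting values and a data frame passed as the data argument, one does see a result: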

> modelNL <- flexmix(y~x, k =2,  data=data.frame(x=x,y=y),
+                    model = FLXMRnlm(formula = y ~ a*x/(b+x), 
+                                     family = "gaussian", 
+                                     start = list(start1, start2)))
> modelNL

Call:
flexmix(formula = y ~ x, data = data.frame(x = x, y = y), k = 2, model = FLXMRnlm(formula = y ~ 
    a * x/(b + x), family = "gaussian", start = list(start1, start2)))

Cluster sizes:
    1     2 
 6664 13336 

convergence after 20 iterations
> summary(modelNL)

Call:
flexmix(formula = y ~ x, data = data.frame(x = x, y = y), k = 2, model = FLXMRnlm(formula = y ~ 
    a * x/(b + x), family = "gaussian", start = list(start1, start2)))

       prior  size post>0 ratio
Comp.1 0.436  6664  20000 0.333
Comp.2 0.564 13336  16306 0.818

'log Lik.' -91417.03 (df=7)
AIC: 182848.1   BIC: 182903.4 

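Most R regression functions look first in the data= argument for names that match those in the formula. Apparently this function fails when it has to go out to the global environment to match the formula tokens.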
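
To make the lookup order concrete, here is a minimal sketch using base R's lm() rather than flexmix. The variable names and values are made up for the illustration; it only shows that columns of data= take precedence over same-named objects in the global environment, and says nothing about flexmixNL internals.

```r
# Illustration only: x, y, and df are hypothetical, not the data from the question.
set.seed(1)
x <- 1:10                                        # global x
y <- 2 * x + rnorm(10)                           # global y, slope near 2
df <- data.frame(x = 10:1, y = 3 * (10:1) + 5)   # same names, different values

fit_global <- lm(y ~ x)             # no data=: x and y are found in the calling environment
fit_data   <- lm(y ~ x, data = df)  # data=: the columns of df are matched first

coef(fit_global)  # estimated from the global vectors
coef(fit_data)    # estimated from df, i.e. y = 5 + 3 * x
```

Passing a data frame explicitly, as in the flexmix() call above, removes any dependence on what happens to be in the workspace.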

I then tried a formula suggested by the plot of the data and got a result that converged:

> modelNL <- flexmix(y~x, k =2,  data=data.frame(x=x,y=y),
+                    model = FLXMRnlm(formula = y ~ a*x*cos(x+b), 
+                                     family = "gaussian", 
+                                     start = list(start1, start2)))
> modelNL

Call:
flexmix(formula = y ~ x, data = data.frame(x = x, y = y), k = 2, model = FLXMRnlm(formula = y ~ 
    a * x * cos(x + b), family = "gaussian", start = list(start1, start2)))

Cluster sizes:
    1     2 
 9395 10605 

convergence after 17 iterations
> summary(modelNL)

Call:
flexmix(formula = y ~ x, data = data.frame(x = x, y = y), k = 2, model = FLXMRnlm(formula = y ~ 
    a * x * cos(x + b), family = "gaussian", start = list(start1, start2)))

       prior  size post>0 ratio
Comp.1 0.521  9395  18009 0.522
Comp.2 0.479 10605  13378 0.793

'log Lik.' -78659.85 (df=7)
AIC: 157333.7   BIC: 157389 

Compared with the first formula, the AIC drops from 182848.1 to 157333.7, which looks like a substantial reduction.
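
As a sanity check, the AIC and BIC figures in the two summaries can be reproduced by hand from the reported log-likelihoods, assuming the usual definitions AIC = -2 logLik + 2 df and BIC = -2 logLik + log(n) df, with n = 20000 observations. The numbers are copied from the output above; the variable names are mine.

```r
# Values copied from the two summary() outputs; nothing here refits the models.
loglik <- c(first = -91417.03, second = -78659.85)
df     <- 7       # degrees of freedom reported for both fits
n      <- 20000   # number of observations, length(c(x1, x2))

aic <- -2 * loglik + 2 * df
bic <- -2 * loglik + log(n) * df

round(aic, 1)   # compare with the AIC lines in the summaries
round(bic, 1)   # compare with the BIC lines in the summaries
unname(aic["first"] - aic["second"])   # size of the AIC reduction
```

Because both fits have the same number of parameters and use the same data, the difference in AIC comes entirely from the difference in log-likelihood.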