nlsList - incorrect usage?

nlsList -incorrect usage?

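I'm trying to run a nonlinear regression on a dataset, and I want to run a separate regression for each group. The data frame looks much like this: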

Date <- as.POSIXct(c("2021-05-25","2021-05-20", "2021-05-21","2021-05-22",
"2021-05-23","2021-05-24" ,"2021-05-25","2021-05-20", "2021-05-21","2021-05-22",
"2021-05-23","2021-05-24" ,"2021-05-25","2021-05-20", "2021-05-21","2021-05-22",
"2021-05-23","2021-05-24" ,"2021-05-25","2021-05-20", "2021-05-21","2021-05-22",
"2021-05-23","2021-05-24" ,"2021-05-25"))
Ts <- rnorm(25, mean=10, sd=0.5)
Exp_flux <- 3.5*exp((Ts-10)/10)
Collar <- as.factor(c("t1","t2","t3","t4","t5","t1","t2","t3","t4","t5","t1","t2","t3","t4",
"t5","t1","t2","t3","t4","t5","t1","t2","t3","t4","t5"))
df <- data.frame(Date,Collar,Ts,Exp_flux)

df
         Date Collar        Ts Exp_flux
1  2021-05-25     t1  9.596453 3.361570
2  2021-05-20     t2  8.870983 3.126334
3  2021-05-21     t3 10.011902 3.504168
4  2021-05-22     t4 10.480873 3.672418
5  2021-05-23     t5 10.264998 3.593989
6  2021-05-24     t1 10.196256 3.569368
7  2021-05-25     t2  9.523135 3.337014
8  2021-05-20     t3 10.315953 3.612349
9  2021-05-21     t4  9.510503 3.332801
10 2021-05-22     t5 10.300981 3.606945
11 2021-05-23     t1 10.788605 3.787187
12 2021-05-24     t2 10.226902 3.580323
13 2021-05-25     t3  9.005530 3.168683
14 2021-05-20     t4 10.752006 3.773351
15 2021-05-21     t5  9.335704 3.275051
16 2021-05-22     t1  9.345418 3.278234
17 2021-05-23     t2 10.034693 3.512164
18 2021-05-24     t3 10.754786 3.774401
19 2021-05-25     t4  9.655313 3.381415
20 2021-05-20     t5 10.670903 3.742872
21 2021-05-21     t1  8.986950 3.162801
22 2021-05-22     t2 10.441217 3.657883
23 2021-05-23     t3 10.446326 3.659753
24 2021-05-24     t4 10.550104 3.697931
25 2021-05-25     t5 10.442247 3.658260

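My goal is to run a separate regression of Exp_flux on Ts for each collar type. I know I could split the main dataset into subsets, one per collar, and run each regression manually, but there are actually more than 20 collar types, and I think there must be a more efficient way to do this. I've tried the nlsList function from the nlme package, but it gives either an empty list or (in an earlier attempt) only the regression for the first collar: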

fit.collars <- nlsList(Exp_Flux ~ SRref*q^((Ts-10)/10)| Collar,
                               data=df,  start=list(SRref=3, q=2), na.action = na.omit )
summary(fit.collars)

Error in class(val) <- c("summary.nlsList", class(val)) : 
  attempt to set an attribute on NULL

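I must be using nlsList incorrectly, but I can't work out why. There are very few tutorials on this function online. Can anyone advise on this, or suggest a relatively simple alternative?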

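There are several problems:

  1. Exp_Flux in the formula is not the same as the column name Exp_flux.
  2. The question uses random numbers without set.seed, so the data are not reproducible. For reproducibility we used the data shown in the Note at the end.
  3. Better starting values may be needed. With the data in the Note at the end, the starting values in the question work as is, but since the question's data are not reproducible, we added better starting values in case the actual data need them.
  4. Add the argument control = nls.control(scaleOffset = 1) to handle zero residuals. Note that scaleOffset was introduced in R 4.1.2 and is not available in earlier versions of R.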
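Problem 1 is easy to miss by eye. A quick check, sketched below, lists the variables the formula uses that are neither data columns nor parameters. The column names are the ones printed in the question; the object names fo, params, cols and missing_vars are illustrative only.

```r
# Formula exactly as written in the question (note the capital F in Exp_Flux)
fo <- Exp_Flux ~ SRref * q^((Ts - 10)/10) | Collar
params <- c("SRref", "q")                       # names used in start =
cols <- c("Date", "Collar", "Ts", "Exp_flux")   # names(df) as printed in the question

# Variables the formula needs that are neither columns nor parameters
missing_vars <- setdiff(all.vars(fo), c(cols, params))
missing_vars
# [1] "Exp_Flux"
```

With the real data frame in hand, names(df) can replace the hard-coded cols vector.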

Code

library(nlme)

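# get starting values by linearizing the model:
# log(Exp_flux) = log(SRref) + log(q) * (Ts-10)/10
fit0 <- lm(log(Exp_flux) ~ I((Ts-10)/10), df)

# exp() of the intercept and slope gives SRref and q
st <- setNames(exp(coef(fit0)), c("SRref", "q"))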
fo2 <-  Exp_flux ~ SRref * q^((Ts-10)/10) | Collar
fit2 <- nlsList(fo2, data=df,  start = st, na.action = na.omit,
 control = nls.control(scaleOffset = .1))
fit2

giving:

Call:
  Model: Exp_flux ~ SRref * q^((Ts - 10)/10) | Collar 
   Data: df 

Coefficients:
   SRref        q
t1   3.5 2.718282
t2   3.5 2.718282
t3   3.5 2.718282
t4   3.5 2.718282
t5   3.5 2.718282

Degrees of freedom: 25 total; 15 residual
Residual standard error: 1.152352e-15
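Since the question also asks for a simple alternative: the same per-collar fits can be done in base R by splitting the data frame by Collar and calling nls() on each piece. A minimal sketch, assuming a noisy version of the example data (seed 42, noise sd 0.01, built like the second answer's example) so that no scaleOffset is needed; fits and coefs are illustrative names. Unlike nlsList, this does not catch failures: if one collar's fit fails, lapply() stops with an error unless the nls() call is wrapped in tryCatch().

```r
# Noisy example data (seed 42, noise sd 0.01), one row per collar/day
set.seed(42)
Ts <- rnorm(25, mean = 10, sd = 0.5)
Exp_flux <- 3.5 * exp((Ts - 10)/10) + rnorm(25, sd = 0.01)
Collar <- factor(rep(c("t1", "t2", "t3", "t4", "t5"), times = 5))
df <- data.frame(Collar, Ts, Exp_flux)

# Starting values from the log-linear fit, as above
fit0 <- lm(log(Exp_flux) ~ I((Ts - 10)/10), data = df)
st <- setNames(as.list(exp(coef(fit0))), c("SRref", "q"))

# One nls() fit per collar: split the data frame by Collar, fit each piece
fits <- lapply(split(df, df$Collar), function(d)
  nls(Exp_flux ~ SRref * q^((Ts - 10)/10), data = d, start = st))

# Stack the coefficients into one row per collar
coefs <- t(sapply(fits, coef))
coefs
```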

Grouped vs. ungrouped

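Note that for these data, grouping by Collar makes no difference. We can already see that from the identical coefficients, but in case that is not so for the actual data, this is how to test it with an analysis of variance: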

# ungrouped
fo3 <- Exp_flux ~ SRref * q^((Ts-10)/10)
fit3 <- nls(fo3, data = df, start = st,
  na.action = na.omit,  control = list(scaleOffset = 1))

# grouped: one SRref and one q per Collar level
fo4 <-  Exp_flux ~ SRref[Collar] * q[Collar]^((Ts-10)/10)
k <- nlevels(df$Collar)
fit4 <- nls(fo4, data = df, 
  start = list(SRref = rep(st[[1]], k), q = rep(st[[2]], k)),
  na.action = na.omit,  control = list(scaleOffset = 1))

anova(fit3, fit4)

giving:

Analysis of Variance Table

Model 1: Exp_flux ~ SRref * q^((Ts - 10)/10)
Model 2: Exp_flux ~ SRref[Collar] * q[Collar]^((Ts - 10)/10)
  Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
1     23 1.9919e-29                         
2     15 1.9919e-29  8      0       0      1
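The F value in this table is the usual extra-sum-of-squares statistic, F = ((RSS1 - RSS2)/(df1 - df2)) / (RSS2/df2), compared against an F distribution with (df1 - df2, df2) degrees of freedom. A minimal sketch that recomputes it by hand, assuming noisy data (seed 42, noise sd 0.01) so the residual sums of squares are not zero; F_manual, p_manual and k are illustrative names.

```r
# Noisy example data (seed 42, noise sd 0.01)
set.seed(42)
Ts <- rnorm(25, mean = 10, sd = 0.5)
Exp_flux <- 3.5 * exp((Ts - 10)/10) + rnorm(25, sd = 0.01)
Collar <- factor(rep(c("t1", "t2", "t3", "t4", "t5"), times = 5))
df <- data.frame(Collar, Ts, Exp_flux)
k <- nlevels(df$Collar)

# Starting values from the log-linear fit
fit0 <- lm(log(Exp_flux) ~ I((Ts - 10)/10), data = df)
st <- setNames(as.list(exp(coef(fit0))), c("SRref", "q"))

# Ungrouped (2 parameters) and grouped (2 * k parameters) fits
fit3 <- nls(Exp_flux ~ SRref * q^((Ts - 10)/10), data = df, start = st)
fit4 <- nls(Exp_flux ~ SRref[Collar] * q[Collar]^((Ts - 10)/10), data = df,
            start = list(SRref = rep(st$SRref, k), q = rep(st$q, k)))

# Extra-sum-of-squares F test: ((RSS1 - RSS2)/(df1 - df2)) / (RSS2/df2)
rss <- c(deviance(fit3), deviance(fit4))
dfr <- c(df.residual(fit3), df.residual(fit4))
F_manual <- ((rss[1] - rss[2]) / (dfr[1] - dfr[2])) / (rss[2] / dfr[2])
p_manual <- pf(F_manual, dfr[1] - dfr[2], dfr[2], lower.tail = FALSE)

# Compare with the table produced by anova()
tab <- anova(fit3, fit4)
c(manual = F_manual, anova = tab[2, "F value"])
```

The residual degrees of freedom are 23 and 15 here, matching the table above.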

Linear model

Note that a simple linear model fits the data very well.

plot(Exp_flux ~ Ts, df, col = df$Collar)
fm0 <- lm(Exp_flux ~ Ts, df)
abline(fm0)
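If a straight line is good enough, per-collar linear fits are also available in nlme: lmList() takes the same `response ~ predictor | group` formula as nlsList(). A minimal sketch, assuming the noise-free data from the Note (seed 123, Date column omitted); fm_list and cf are illustrative names.

```r
library(nlme)

# Noise-free data as in the Note (seed 123), without the Date column
set.seed(123)
Ts <- rnorm(25, mean = 10, sd = 0.5)
Exp_flux <- 3.5 * exp((Ts - 10)/10)
Collar <- factor(rep(c("t1", "t2", "t3", "t4", "t5"), times = 5))
df <- data.frame(Collar, Ts, Exp_flux)

# One straight-line fit of Exp_flux on Ts per collar
fm_list <- lmList(Exp_flux ~ Ts | Collar, data = df)

# Data frame of coefficients: one row per collar, columns (Intercept) and Ts
cf <- coef(fm_list)
cf
```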

Note

We used these data:

set.seed(123)
Date <- as.POSIXct(c("2021-05-25","2021-05-20", "2021-05-21","2021-05-22",
"2021-05-23","2021-05-24" ,"2021-05-25","2021-05-20", "2021-05-21","2021-05-22",
"2021-05-23","2021-05-24" ,"2021-05-25","2021-05-20", "2021-05-21","2021-05-22",
"2021-05-23","2021-05-24" ,"2021-05-25","2021-05-20", "2021-05-21","2021-05-22",
"2021-05-23","2021-05-24" ,"2021-05-25"))
Ts <- rnorm(25, mean=10, sd=0.5)
Exp_flux <- 3.5*exp((Ts-10)/10)
Collar <- as.factor(c("t1","t2","t3","t4","t5","t1","t2","t3","t4","t5","t1","t2","t3","t4",
"t5","t1","t2","t3","t4","t5","t1","t2","t3","t4","t5"))
df <- data.frame(Date,Collar,Ts,Exp_flux)

Update 3

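Previous versions of this answer contained errors from copying the formula from the question, and changes had been made to get it to work. Those errors are now fixed, and the code works with (1) improved starting values, which may or may not be needed, and (2) the added scaleOffset argument. @Roland pointed out the error in the model and that the model has zero residuals.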

Also added sections on comparing grouped vs. ungrouped models and on using a simple linear model.

Let me quote help("nls"):

The default settings of nls generally fail on artificial “zero-residual” data problems.
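The failure the help page describes can be seen on the noise-free data from the Note of the first answer. A minimal sketch, assuming R 4.1.2 or later (where, per item 4 of the first answer, nls.control() has scaleOffset); fit_default and fit_scaled are illustrative names. The default fit may stop with an error, so it is wrapped in try(); with scaleOffset = 1 a constant is added to the denominator of the relative-offset convergence criterion, which avoids the near 0/0 situation when the residuals are essentially zero.

```r
# Noise-free data as in the Note (seed 123): residuals at the optimum are zero
set.seed(123)
Ts <- rnorm(25, mean = 10, sd = 0.5)
Exp_flux <- 3.5 * exp((Ts - 10)/10)
df <- data.frame(Ts, Exp_flux)

# Default control: may fail on zero-residual data, so keep going on error
fit_default <- try(nls(Exp_flux ~ SRref * q^((Ts - 10)/10), data = df,
                       start = list(SRref = 3, q = 2)), silent = TRUE)
inherits(fit_default, "try-error")

# scaleOffset > 0 (R >= 4.1.2 per the answer above) stabilises the convergence test
fit_scaled <- nls(Exp_flux ~ SRref * q^((Ts - 10)/10), data = df,
                  start = list(SRref = 3, q = 2),
                  control = nls.control(scaleOffset = 1))
coef(fit_scaled)
```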

If I add some white noise and fix the typo, I can fit the model successfully.

set.seed(42)

Date <- as.POSIXct(c("2021-05-25","2021-05-20", "2021-05-21","2021-05-22",
                     "2021-05-23","2021-05-24" ,"2021-05-25","2021-05-20", "2021-05-21","2021-05-22",
                     "2021-05-23","2021-05-24" ,"2021-05-25","2021-05-20", "2021-05-21","2021-05-22",
                     "2021-05-23","2021-05-24" ,"2021-05-25","2021-05-20", "2021-05-21","2021-05-22",
                     "2021-05-23","2021-05-24" ,"2021-05-25"))
Ts <- rnorm(25, mean=10, sd=0.5)
Exp_flux <- 3.5*exp((Ts-10)/10) + rnorm(25, sd = 0.01)
Collar <- as.factor(c("t1","t2","t3","t4","t5","t1","t2","t3","t4","t5","t1","t2","t3","t4",
                      "t5","t1","t2","t3","t4","t5","t1","t2","t3","t4","t5"))
df <- data.frame(Date,Collar,Ts,Exp_flux)

library(nlme)
fit.collars <- nlsList(Exp_flux ~ SRref*q^((Ts-10)/10)| Collar,
                       data=df,  start=list(SRref=3, q=2), na.action = na.omit )
summary(fit.collars)
#works

Think carefully about whether you really want a pooled residual standard error.
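The printed nlsList fit reports one residual standard error for all collars. To judge whether pooling is reasonable, the per-collar values can be compared with the usual pooled estimate, sqrt(total RSS / total residual df). A minimal sketch, assuming the noisy seed-42 data above (Date omitted) and starting values from the log-linear fit; rss, dfr, rse_collar and rse_pooled are illustrative names. unclass() is used so that sapply() simply iterates over the list of per-collar nls fits.

```r
library(nlme)

# Noisy data as in the example above (seed 42, noise sd 0.01)
set.seed(42)
Ts <- rnorm(25, mean = 10, sd = 0.5)
Exp_flux <- 3.5 * exp((Ts - 10)/10) + rnorm(25, sd = 0.01)
Collar <- factor(rep(c("t1", "t2", "t3", "t4", "t5"), times = 5))
df <- data.frame(Collar, Ts, Exp_flux)

fit0 <- lm(log(Exp_flux) ~ I((Ts - 10)/10), data = df)
st <- setNames(as.list(exp(coef(fit0))), c("SRref", "q"))
fit.collars <- nlsList(Exp_flux ~ SRref * q^((Ts - 10)/10) | Collar,
                       data = df, start = st)

# Per-collar residual standard error: sqrt(RSS / residual df) of each nls fit
fits <- unclass(fit.collars)
rss <- sapply(fits, deviance)
dfr <- sapply(fits, df.residual)
rse_collar <- sqrt(rss / dfr)

# Pooled estimate: one sigma shared by all collars
rse_pooled <- sqrt(sum(rss) / sum(dfr))
rse_collar
rse_pooled
```

Because the pooled variance is a df-weighted average of the per-collar variances, it always lies between the smallest and largest per-collar value; a large spread among rse_collar is a sign that a single pooled value hides real differences between collars.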