R 多项式回归或组值和组间检验 + 结果解释

Question

我正在尝试模拟野生动物种群的疤痕形成率之间的关系，并且我之前已经计算过年率。

如果您看到下面的图，在我看来，利率在这段时期的中间上升，然后再次下降。我试图用代码

拟合多项式 LM

model1 <- lm(Rate~poly(year, 2, raw = TRUE),data=yearlyratesub) 
summary(model1)                         
model1

我使用以下方法绘图：

g <-ggplot(yearlyratesub, aes(year, Rate)) + geom_point(shape=1) +  geom_smooth(method = lm, formula = y ~ poly(x, 2, raw = TRUE))
g

模型输出为：

Call:
lm(formula = Rate ~ poly(year, 2, raw = TRUE), data = yearlyratesub)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.126332 -0.037683 -0.002602  0.053222  0.083503 

Coefficients:
                             Estimate Std. Error t value Pr(>|t|)  
(Intercept)                -8.796e+03  3.566e+03  -2.467   0.0297 *
poly(year, 2, raw = TRUE)1  8.747e+00  3.545e+00   2.467   0.0297 *
poly(year, 2, raw = TRUE)2 -2.174e-03  8.813e-04  -2.467   0.0297 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.0666 on 12 degrees of freedom
Multiple R-squared:  0.3369,    Adjusted R-squared:  0.2264 
F-statistic: 3.048 on 2 and 12 DF,  p-value: 0.08503

现在我该如何输入？整体模型p值不显着但截距和单斜率显着？

我是否应该尝试另一种拟合而不是 x² 或什至对值进行分组并在组之间进行测试，例如用方差分析？我知道 LM 的拟合度较低，但我想这是因为我的值很小，也许 x² 可能不是...？

对于有关模型和结果解释的输入会很高兴..

Answer 1

分组

由于没有提供数据（下次请提供一个完整的可重现的问题，包括所有输入）我们最后使用了注释中的数据。如果我们使用指示的断点对点进行分组，我们会发现该模型非常重要。

g <- factor(findInterval(yearlyratesub$year, c(2007.5, 2014.5))+1); g
##  [1] 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3
## Levels: 1 2 3

fm <- lm(rate ~ g, yearlyratesub)
summary(fm)

给予

Call:
lm(formula = rate ~ g, data = yearlyratesub)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.064618 -0.018491  0.006091  0.029684  0.046831 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.110854   0.019694   5.629 0.000111 ***
g2           0.127783   0.024687   5.176 0.000231 ***
g3          -0.006714   0.027851  -0.241 0.813574    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.03939 on 12 degrees of freedom
Multiple R-squared:  0.7755,    Adjusted R-squared:  0.738 
F-statistic: 20.72 on 2 and 12 DF,  p-value: 0.0001281

我们可以考虑把外面的两组合并起来。

g2 <- factor(g == 2)
fm2 <- lm(rate ~ g2, yearlyratesub)
summary(fm2)

给予：

Call:
lm(formula = rate ~ g2, data = yearlyratesub)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.064618 -0.016813  0.007096  0.031363  0.046831 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.10750    0.01341   8.015 2.19e-06 ***
g2TRUE       0.13114    0.01963   6.680 1.52e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.03793 on 13 degrees of freedom
Multiple R-squared:  0.7744,    Adjusted R-squared:  0.757 
F-statistic: 44.62 on 1 and 13 DF,  p-value: 1.517e-05

正弦曲线

查看图表，似乎点在左右边缘出现上升，表明我们使用正弦曲线拟合。 a + b * cos(c * year)

fm3 <- nls(rate ~ cbind(a = 1, b = cos(c * year)), 
  yearlyratesub, start = list(c = 0.5), algorithm = "plinear")
summary(fm3)

给予

Formula: rate ~ cbind(a = 1, b = cos(c * year))

Parameters:
        Estimate Std. Error  t value Pr(>|t|)    
c      0.4999618  0.0001449 3449.654  < 2e-16 ***
.lin.a 0.1787200  0.0150659   11.863  5.5e-08 ***
.lin.b 0.0753754  0.0205818    3.662  0.00325 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.05688 on 12 degrees of freedom

Number of iterations to convergence: 2 
Achieved convergence tolerance: 5.241e-08

比较

绘制拟合图并查看它们的残差平方和和 AIC

plot(yearlyratesub)
# fm0 from Note at end, fm and fm2 are grouping models, fm3 is sinusoidal
L <- list(fm0 = fm0, fm = fm, fm2 = fm2, fm3 = fm3)
for(i in seq_along(L)) {
  lines(fitted(L[[i]]) ~ year, yearlyratesub, col = i, lwd = 2)
}
legend("topright", names(L), col = seq_along(L), lwd = 2)

给出以下残差平方和和 AIC（考虑到参数数量）更好的地方。我们看到 fm 基于残差平方和拟合得最紧密，但 fm2 拟合得几乎一样；但是，当使用 AIC 考虑参数数量时，fm2 具有最低的标准，因此最受该标准的青睐。

cbind(RSS = sapply(L, deviance), AIC = sapply(L, AIC))
##            RSS       AIC
## fm0 0.05488031 -33.59161
## fm  0.01861659 -49.80813
## fm2 0.01870674 -51.73567
## fm3 0.04024237 -38.24512

备注

yearlyratesub <- 
structure(list(year = c(2004, 2005, 2006, 2007, 2008, 2009, 2010, 
2011, 2012, 2013, 2014, 2015, 2017, 2018, 2019), rate = c(0.14099813521287, 
0.0949946651016247, 0.0904788394070601, 0.11694517831575, 0.26786193592875, 
0.256346628540479, 0.222029818828298, 0.180116679856725, 0.285467976459104, 
0.174019208113095, 0.28461698734932, 0.0574827955982996, 0.103378448084776, 
0.114593695172686, 0.141105952837639)), row.names = c(NA, -15L
), class = "data.frame")

fm0 <- lm(rate ~ poly(year, 2, raw = TRUE), yearlyratesub)
summary(fm0)

给予

Call:
lm(formula = rate ~ poly(year, 2, raw = TRUE), data = yearlyratesub)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.128335 -0.038289 -0.002715  0.054090  0.084792 

Coefficients:
                             Estimate Std. Error t value Pr(>|t|)  
(Intercept)                -8.930e+03  3.621e+03  -2.466   0.0297 *
poly(year, 2, raw = TRUE)1  8.880e+00  3.600e+00   2.467   0.0297 *
poly(year, 2, raw = TRUE)2 -2.207e-03  8.949e-04  -2.467   0.0297 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.06763 on 12 degrees of freedom
Multiple R-squared:  0.3381,    Adjusted R-squared:  0.2278 
F-statistic: 3.065 on 2 and 12 DF,  p-value: 0.0841

R 多项式回归或组值和组间检验 + 结果解释

R polynomal regression or group values and test between groups + outcome interpreatation

regression

model

r

lm

polynomials

分组

正弦曲线

比较

备注