Gam 是 R 返回 "degrees of freedom" 或 "subscript out of bounds" 错误
Gam is R returning either "degrees of freedom" or "subscript out of bounds" error
我正在尝试 运行 小型数据集上的 gam 模型
library(ggplot2)
library(mgcv)
test <- structure(list(x = c(69, 365, 452, 100, 120, 120, 150, 159, 180
), y = c(17.91, 2.58, 4.82, 10.09, 6.24, 10.33, 2.35, 1.94, 3.91
)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-9L))
这个数据看起来像这样
ggplot(test, aes(x = x, y = y)) + geom_point()
我想尝试用 gam 样条拟合这些数据。我尝试了以下,returns 一个错误
mod <- gam(test$y ~ s(test$x))
Error in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) :
A term has fewer unique covariate combinations than specified maximum
degrees of freedom
基于此 post http://r.789695.n4.nabble.com/Help-with-GAM-mgcv-td3074165.html 看来我可能需要强制减少结数。
我试着告诉模型只使用两个结,如下所示(我想我以后会以某种方式找出最佳数量):
mod <- gam(test$y ~ s(test$x), k = 2)
Error in data[[txt]] : subscript out of bounds
我不确定后面的错误是什么意思,或者我为什么会收到它。
为了以防万一,这里是该错误的回溯
traceback()
7: get.var(object$term[i], knots)
6: ExtractData(object, data, knots)
5: smooth.construct3(object, data, knots)
4: smoothCon(split$smooth.spec[[i]], data, knots, absorb.cons, scale.penalty = scale.penalty,
null.space.penalty = select, sparse.cons = sparse.cons, diagonal.penalty = diagonal.penalty,
apply.by = apply.by, modCon = modCon)
3: gam.setup(formula = list(pf = test$y ~ 1, pfok = 1, smooth.spec = list(
list(term = "test$x", bs.dim = -1, fixed = FALSE, dim = 1L,
p.order = NA, by = "NA", label = "s(test$x)", xt = NULL,
id = NULL, sp = NULL)), fake.formula = test$y ~ 1 + test$x,
response = "test$y", fake.names = "test$x", pred.names = c("test",
"x"), pred.formula = ~test + x), pterms = test$y ~ 1, data = list(
`test$y` = c(17.91, 2.58, 4.82, 10.09, 6.24, 10.33, 2.35,
1.94, 3.91), `test$x` = c(69, 365, 452, 100, 120, 120, 150,
159, 180)), knots = 2, sp = NULL, min.sp = NULL, H = NULL,
absorb.cons = TRUE, sparse.cons = 0, select = FALSE, idLinksBases = TRUE,
scale.penalty = TRUE, paraPen = NULL, drop.intercept = FALSE)
2: do.call(gsname, list(formula = gp, pterms = pterms, data = mf,
knots = knots, sp = sp, min.sp = min.sp, H = H, absorb.cons = TRUE,
sparse.cons = 0, select = select, idLinksBases = control$idLinksBases,
scale.penalty = control$scalePenalty, paraPen = paraPen,
drop.intercept = drop.intercept))
1: gam(test$y ~ s(test$x), k = 2)
和会话信息
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.1 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] mgcv_1.8-25 nlme_3.1-137 ggplot2_3.1.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.19 rstudioapi_0.8 bindr_0.1.1 knitr_1.20 magrittr_1.5 tidyselect_0.2.5 munsell_0.5.0
[8] lattice_0.20-38 colorspace_1.3-2 R6_2.3.0 rlang_0.3.0.1 plyr_1.8.4 dplyr_0.7.7 tools_3.5.1
[15] grid_3.5.1 packrat_0.4.9-3 gtable_0.2.0 withr_2.1.2 yaml_2.2.0 lazyeval_0.2.1 assertthat_0.2.0
[22] tibble_1.4.2 crayon_1.3.4 Matrix_1.2-15 bindrcpp_0.2.2 purrr_0.2.5 glue_1.3.0 labeling_0.3
[29] compiler_3.5.1 pillar_1.3.0 scales_1.0.0 pkgconfig_2.0.2
我想知道是否有人对我如何最好地处理此错误或以其他方式使用此数据获得工作游戏提出建议。感谢您的想法。
如果你使用 k=3 没有问题(至少我使用的方法将数据帧传递给 data
参数>
> mod <- gam(y ~ s(x, k=3), data=test)
> mod
Family: gaussian
Link function: identity
Formula:
y ~ s(x, k = 3)
Estimated degrees of freedom:
1.95 total = 2.95
GCV score: 9.922357
> mod <- gam(y ~ s(x, k=4), data=test)
>
> mod
Family: gaussian
Link function: identity
Formula:
y ~ s(x, k = 4)
Estimated degrees of freedom:
2.67 total = 3.67
GCV score: 7.77551
> mod <- gam(y ~ s(x, k=5), data=test)
> mod
Family: gaussian
Link function: identity
Formula:
y ~ s(x, k = 5)
Estimated degrees of freedom:
2.94 total = 3.94
GCV score: 7.758121
由于 GCV 分数在从 4 增加到 5 时不会增加,所以我展示了 k=4 时你得到的结果:
> mod <- gam(y ~ s(x, k=4), data=test)
> png()
> plot(mod)
> points(y~x, data=test)
> dev.off()
RStudioGD
2
我很确定 s(x,2.67)
从点偏移的原因是没有包括截距项,这将是另一个(未绘制的)"degree of freedom"。如果您想要完整模型的预测(它将通过数据点,而不仅仅是平滑项的贡献,则针对同一序列绘制 predict(mod, newdata=list( x= seq(min(x),max(x), length=100)))
。
我正在尝试 运行 小型数据集上的 gam 模型
library(ggplot2)
library(mgcv)
test <- structure(list(x = c(69, 365, 452, 100, 120, 120, 150, 159, 180
), y = c(17.91, 2.58, 4.82, 10.09, 6.24, 10.33, 2.35, 1.94, 3.91
)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-9L))
这个数据看起来像这样
ggplot(test, aes(x = x, y = y)) + geom_point()
我想尝试用 gam 样条拟合这些数据。我尝试了以下,returns 一个错误
mod <- gam(test$y ~ s(test$x))
Error in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) : A term has fewer unique covariate combinations than specified maximum degrees of freedom
基于此 post http://r.789695.n4.nabble.com/Help-with-GAM-mgcv-td3074165.html 看来我可能需要强制减少结数。
我试着告诉模型只使用两个结,如下所示(我想我以后会以某种方式找出最佳数量):
mod <- gam(test$y ~ s(test$x), k = 2)
Error in data[[txt]] : subscript out of bounds
我不确定后面的错误是什么意思,或者我为什么会收到它。
为了以防万一,这里是该错误的回溯
traceback()
7: get.var(object$term[i], knots) 6: ExtractData(object, data, knots) 5: smooth.construct3(object, data, knots) 4: smoothCon(split$smooth.spec[[i]], data, knots, absorb.cons, scale.penalty = scale.penalty, null.space.penalty = select, sparse.cons = sparse.cons, diagonal.penalty = diagonal.penalty, apply.by = apply.by, modCon = modCon) 3: gam.setup(formula = list(pf = test$y ~ 1, pfok = 1, smooth.spec = list( list(term = "test$x", bs.dim = -1, fixed = FALSE, dim = 1L, p.order = NA, by = "NA", label = "s(test$x)", xt = NULL, id = NULL, sp = NULL)), fake.formula = test$y ~ 1 + test$x, response = "test$y", fake.names = "test$x", pred.names = c("test", "x"), pred.formula = ~test + x), pterms = test$y ~ 1, data = list( `test$y` = c(17.91, 2.58, 4.82, 10.09, 6.24, 10.33, 2.35, 1.94, 3.91), `test$x` = c(69, 365, 452, 100, 120, 120, 150, 159, 180)), knots = 2, sp = NULL, min.sp = NULL, H = NULL, absorb.cons = TRUE, sparse.cons = 0, select = FALSE, idLinksBases = TRUE, scale.penalty = TRUE, paraPen = NULL, drop.intercept = FALSE) 2: do.call(gsname, list(formula = gp, pterms = pterms, data = mf, knots = knots, sp = sp, min.sp = min.sp, H = H, absorb.cons = TRUE, sparse.cons = 0, select = select, idLinksBases = control$idLinksBases, scale.penalty = control$scalePenalty, paraPen = paraPen, drop.intercept = drop.intercept)) 1: gam(test$y ~ s(test$x), k = 2)
和会话信息
sessionInfo()
R version 3.5.1 (2018-07-02) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 18.04.1 LTS Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1 LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1 locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] mgcv_1.8-25 nlme_3.1-137 ggplot2_3.1.0 loaded via a namespace (and not attached): [1] Rcpp_0.12.19 rstudioapi_0.8 bindr_0.1.1 knitr_1.20 magrittr_1.5 tidyselect_0.2.5 munsell_0.5.0 [8] lattice_0.20-38 colorspace_1.3-2 R6_2.3.0 rlang_0.3.0.1 plyr_1.8.4 dplyr_0.7.7 tools_3.5.1 [15] grid_3.5.1 packrat_0.4.9-3 gtable_0.2.0 withr_2.1.2 yaml_2.2.0 lazyeval_0.2.1 assertthat_0.2.0 [22] tibble_1.4.2 crayon_1.3.4 Matrix_1.2-15 bindrcpp_0.2.2 purrr_0.2.5 glue_1.3.0 labeling_0.3 [29] compiler_3.5.1 pillar_1.3.0 scales_1.0.0 pkgconfig_2.0.2
我想知道是否有人对我如何最好地处理此错误或以其他方式使用此数据获得工作游戏提出建议。感谢您的想法。
如果你使用 k=3 没有问题(至少我使用的方法将数据帧传递给 data
参数>
> mod <- gam(y ~ s(x, k=3), data=test)
> mod
Family: gaussian
Link function: identity
Formula:
y ~ s(x, k = 3)
Estimated degrees of freedom:
1.95 total = 2.95
GCV score: 9.922357
> mod <- gam(y ~ s(x, k=4), data=test)
>
> mod
Family: gaussian
Link function: identity
Formula:
y ~ s(x, k = 4)
Estimated degrees of freedom:
2.67 total = 3.67
GCV score: 7.77551
> mod <- gam(y ~ s(x, k=5), data=test)
> mod
Family: gaussian
Link function: identity
Formula:
y ~ s(x, k = 5)
Estimated degrees of freedom:
2.94 total = 3.94
GCV score: 7.758121
由于 GCV 分数在从 4 增加到 5 时不会增加,所以我展示了 k=4 时你得到的结果:
> mod <- gam(y ~ s(x, k=4), data=test)
> png()
> plot(mod)
> points(y~x, data=test)
> dev.off()
RStudioGD
2
我很确定 s(x,2.67)
从点偏移的原因是没有包括截距项,这将是另一个(未绘制的)"degree of freedom"。如果您想要完整模型的预测(它将通过数据点,而不仅仅是平滑项的贡献,则针对同一序列绘制 predict(mod, newdata=list( x= seq(min(x),max(x), length=100)))
。