Selecting the number of breakpoints in segmented regression
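I am trying to estimate multiple breakpoints in X for a response variable Y. When I run the segmented package in R and supply one starting value in psi, I get one estimated breakpoint at x = 14. When I supply two starting values in psi, I get two estimated breakpoints at x = 6.5 and x = 11.4. How can I determine whether two breakpoints or one breakpoint is optimal? Please see the code and output below:
Specifying 1 breakpoint: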
segmented.glm(obj = fit.glm, seg.Z = ~x, psi = 10)
Estimated Break-Point(s):
        Est.   St.Err
psi1.x  14     2.691
Null deviance: 230311 on 1509 degrees of freedom
Residual deviance: 175795 on 1480 degrees of freedom
AIC: 11531
Convergence attained in 0 iter. (rel. change 1.5525e-08)
> slope(fit.seg)
$x
        Est.       St.Err.   t value    CI(95%).l  CI(95%).u
slope1  -0.847880  0.097683  -8.679900  -1.0393    -0.65643
slope2   0.036962  0.574770   0.064308  -1.0896     1.16350
Specifying 2 breakpoints:
fit.seg<-segmented(fit.glm, seg.Z=~x, psi= c(6, 11))
Estimated Break-Point(s):
        Est.    St.Err
psi1.x   6.562  1.771
psi2.x  11.398  1.660
Null deviance: 230311 on 1509 degrees of freedom
Residual deviance: 175594 on 1478 degrees of freedom
AIC: 11533
Convergence attained in 1 iter. (rel. change 0)
> slope(fit.seg)
$x
        Est.      St.Err.  t value   CI(95%).l  CI(95%).u
slope1  -0.56943  0.23681  -2.40460  -1.03360   -0.10530
slope2  -1.25180  0.38974  -3.21190  -2.01570   -0.48794
slope3  -0.17365  0.31700  -0.54781  -0.79495    0.44765
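I also tried seg.control, but I do not know how to interpret the output. (Based on Muggeo, V.M.R. (2008) segmented: an R package to fit regression models with broken-line relationships. R News 8/1, 20-25.)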
> o <- segmented(fit.glm, seg.Z=~x, psi=NA, control=seg.control(display=FALSE, K=2))
Warning message:
max number of iterations (1) attained
> slope(o) # defaults to confidence level of 0.95 (conf.level=0.95)
$x
        Est.      St.Err.  t value   CI(95%).l  CI(95%).u
slope1  -0.56943  0.23681  -2.40460  -1.03360   -0.10530
slope2  -1.25180  0.38974  -3.21190  -2.01570   -0.48794
slope3  -0.17365  0.31700  -0.54781  -0.79495    0.44765
> o <- segmented(fit.glm, seg.Z=~x, psi=NA, control=seg.control(display=FALSE, K=1))
Warning messages:
1: max number of iterations (1) attained
2: max number of iterations (1) attained
> slope(o) # defaults to confidence level of 0.95 (conf.level=0.95)
$x
        Est.       St.Err.   t value    CI(95%).l  CI(95%).u
slope1  -0.847880  0.097683  -8.679900  -1.0393    -0.65643
slope2   0.036966  0.574770   0.064314  -1.0896     1.16350
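Can anyone help me figure out how to determine whether 2 breakpoints or 1 breakpoint is the better estimate?
The function selgmented() (also in the R package segmented) is a wrapper that selects the "best" number of breakpoints via hypothesis testing (e.g. the score test) or via BIC. Currently, selection via hypothesis testing is limited to 0, 1, or 2 selected breakpoints.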
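A minimal sketch of how selgmented() might be called is given here. The asker's data and fit.glm are not shown, so the simulated data frame d, the Gaussian family, and the object names fit0, sel_score and sel_bic are assumptions made purely for illustration. The argument names (seg.Z, type, Kmax) are those of recent releases of the package and may differ in older ones.

```r
# Sketch only: simulated data stand in for the asker's data, which are not shown.
library(segmented)

set.seed(1)
n <- 300
x <- sort(runif(n, 0, 20))
# Toy assumption: the true mean has a single slope change at x = 10
y <- 5 - 0.8 * x + 0.9 * pmax(x - 10, 0) + rnorm(n, sd = 1)
d <- data.frame(x = x, y = y)

# Starting model WITHOUT breakpoints; selgmented() adds them as needed
fit0 <- glm(y ~ x, family = gaussian, data = d)

# Selection by sequential hypothesis tests (score test): 0, 1 or 2 breakpoints
sel_score <- selgmented(fit0, seg.Z = ~x, type = "score")

# Selection by BIC, here comparing models with 0 to 3 breakpoints
sel_bic <- selgmented(fit0, seg.Z = ~x, type = "bic", Kmax = 3)

# The returned object is the selected fit; inspect it as usual
print(sel_score)
print(sel_bic)
```

With the default return.fit = TRUE, the returned object is the selected model, so summary() and slope() can be applied to it when at least one breakpoint is selected.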
Kind regards,
Vito
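As a rough cross-check, the figures already quoted in the question can be compared by hand. The sketch below uses only the reported AIC values and degrees of freedom. It assumes that the null model contains just an intercept (so n is the null degrees of freedom plus one) and that AIC and BIC count the same parameters; the AIC values are rounded in the printed output, so the result is approximate.

```r
# Figures copied from the output quoted in the question
aic_1bp <- 11531      # one-breakpoint fit
aic_2bp <- 11533      # two-breakpoint fit
res_df_1bp <- 1480
res_df_2bp <- 1478
null_df <- 1509

# Assumption: intercept-only null model, so n = null df + 1
n <- null_df + 1

# Extra parameters used by the two-breakpoint fit (one breakpoint, one slope change)
extra_par <- res_df_1bp - res_df_2bp

# Positive differences mean the two-breakpoint fit scores worse
delta_aic <- aic_2bp - aic_1bp

# BIC charges log(n) per parameter where AIC charges 2
delta_bic <- delta_aic + extra_par * (log(n) - 2)

cat("delta AIC:", delta_aic, " delta BIC:", round(delta_bic, 2), "\n")
```

Both differences come out positive, so on these criteria the second breakpoint is not supported by the quoted output. Information criteria are only approximate for breakpoint models, so this is a sanity check rather than a formal test.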