How to set the number of knots in a regression spline
Someone had the same question, but they were using the splines library, whereas I am using tidymodels.
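I want to fit a cubic spline and split the domain of the independent variable into 6 bins (i.e., make 5 cuts within its domain).
I believe this is done with step_bs() (or step_ns() in the case of natural splines).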
I cannot find in the documentation which parameter sets the number of knots. Moreover, it seems that arguments for splines::ns() can be passed through the options parameter, but the Readme is not available.
You may find this answer helpful in understanding the relationship between knots and degrees of freedom. You can set both deg_free and degree (the polynomial degree) in step_bs():
library(recipes)
#> Loading required package: dplyr
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#>
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#>
#> step
data(biomass, package = "modeldata")
biomass_tr <- biomass[biomass$dataset == "Training",]
biomass_te <- biomass[biomass$dataset == "Testing",]
rec <- recipe(HHV ~ carbon + hydrogen + oxygen,
              data = biomass_tr) %>%
  step_bs(carbon, deg_free = 7, degree = 4)
## training data
prep(rec) %>% bake(new_data = biomass_tr)
#> # A tibble: 456 × 10
#> hydrogen oxygen HHV carbon_bs_1 carbon_bs_2 carbon_bs_3 carbon_bs_4
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 5.64 42.9 20.0 0 0 0.489 0.494
#> 2 5.7 41.3 19.2 0 0.000000158 0.502 0.484
#> 3 5.8 46.2 18.3 0 0.000812 0.575 0.421
#> 4 4.97 35.6 18.2 0.000196 0.0256 0.669 0.305
#> 5 5.4 40.7 18.4 0.000000163 0.00476 0.619 0.375
#> 6 5.75 40.2 18.5 0.000102 0.0202 0.663 0.317
#> 7 5.99 38.2 18.7 0 0.00263 0.603 0.393
#> 8 5.7 39.7 18.3 0.0000470 0.0156 0.655 0.329
#> 9 5.5 40.9 18.6 0 0.0000451 0.532 0.460
#> 10 5.9 40 18.9 0 0.00293 0.606 0.390
#> # … with 446 more rows, and 3 more variables: carbon_bs_5 <dbl>,
#> # carbon_bs_6 <dbl>, carbon_bs_7 <dbl>
## testing data
prep(rec) %>% bake(new_data = biomass_te)
#> # A tibble: 80 × 10
#> hydrogen oxygen HHV carbon_bs_1 carbon_bs_2 carbon_bs_3 carbon_bs_4
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 5.67 47.2 18.3 0.00000387 0.00795 0.635 0.357
#> 2 5.5 48.1 17.6 0.00261 0.0730 0.687 0.237
#> 3 5.5 49.1 17.2 0.00431 0.0907 0.685 0.220
#> 4 6.1 37.3 18.9 0.00000294 0.00750 0.633 0.359
#> 5 6.32 42.8 20.5 0 0.0000535 0.534 0.458
#> 6 5.5 41.7 18.5 0.000751 0.0434 0.682 0.274
#> 7 5.23 54.1 15.1 0.0358 0.229 0.610 0.124
#> 8 4.66 33.8 16.2 0.00687 0.111 0.680 0.201
#> 9 4.4 31.1 11.1 0.294 0.396 0.224 0.0160
#> 10 3.77 23.7 10.8 0.339 0.376 0.175 0.0107
#> # … with 70 more rows, and 3 more variables: carbon_bs_5 <dbl>,
#> # carbon_bs_6 <dbl>, carbon_bs_7 <dbl>
Created on 2022-04-18 by the reprex package (v2.0.1)
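To make the link between knots and degrees of freedom concrete, here is a minimal sketch that calls the splines package directly rather than going through a recipe. The predictor x is simulated purely for illustration and is not part of the original example. With no intercept column, splines::bs() places df - degree interior knots, and splines::ns() places df - 1, so a cubic spline with 5 cuts needs df = 8 for bs() and df = 6 for ns(). By the same arithmetic, the deg_free = 7, degree = 4 call above yields 3 interior knots.

```r
library(splines)

# Hypothetical predictor, simulated only for illustration
set.seed(1)
x <- runif(200, min = 0, max = 10)

n_knots <- 5  # 5 cuts -> 6 bins

# Cubic B-spline, no intercept: df = degree + number of interior knots
basis_bs <- bs(x, df = 3 + n_knots, degree = 3)
length(attr(basis_bs, "knots"))  # interior knots chosen by bs()
ncol(basis_bs)                   # number of basis columns

# Natural cubic spline, no intercept: df = number of interior knots + 1
basis_ns <- ns(x, df = n_knots + 1)
length(attr(basis_ns, "knots"))
ncol(basis_ns)

# When only df is given, the knots are placed at quantiles of x.
# For 6 equal-width bins instead, supply the knot locations explicitly.
rng <- range(x)
knots_eq <- seq(rng[1], rng[2], length.out = n_knots + 2)[2:(n_knots + 1)]
basis_eq <- bs(x, knots = knots_eq, degree = 3)
ncol(basis_eq)
```

Assuming recipes forwards deg_free to the df argument unchanged, the recipe equivalents would be step_bs(carbon, deg_free = 8, degree = 3) and step_ns(carbon, deg_free = 6). Explicit knot locations can presumably be supplied through options = list(knots = ...), but check that against the step_bs() documentation for your installed version.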