如何设置回归样条中的节点数

Question

Someone 有同样的问题，但他们使用的是 splines 库，而我使用的是 tidymodels.

我想拟合三次样条并将自变量的域拆分为 6 个 bin（即在其域中进行 5 次切割）。

我相信这是用 step_bs() 完成的（或 step_ns() 在自然样条的情况下）。

我无法找到哪个参数设置结数 documentation. Moreover, it seems that splines::ns() can be passed to the options parameter, but the Readme 不可用。

Answer 1

您可能会发现 this answer helpful in understanding the relationship between knots and degrees of freedom. You can set both deg_free and degree (the polynomial degree) in step_bs():

library(recipes)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#> 
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#> 
#>     step
data(biomass, package = "modeldata")

biomass_tr <- biomass[biomass$dataset == "Training",]
biomass_te <- biomass[biomass$dataset == "Testing",]

rec <- recipe(HHV ~ carbon + hydrogen + oxygen,
              data = biomass_tr) %>%
    step_bs(carbon, deg_free = 7, degree = 4)

## training data
prep(rec) %>% bake(new_data = biomass_tr)
#> # A tibble: 456 × 10
#>    hydrogen oxygen   HHV carbon_bs_1 carbon_bs_2 carbon_bs_3 carbon_bs_4
#>       <dbl>  <dbl> <dbl>       <dbl>       <dbl>       <dbl>       <dbl>
#>  1     5.64   42.9  20.0 0           0                 0.489       0.494
#>  2     5.7    41.3  19.2 0           0.000000158       0.502       0.484
#>  3     5.8    46.2  18.3 0           0.000812          0.575       0.421
#>  4     4.97   35.6  18.2 0.000196    0.0256            0.669       0.305
#>  5     5.4    40.7  18.4 0.000000163 0.00476           0.619       0.375
#>  6     5.75   40.2  18.5 0.000102    0.0202            0.663       0.317
#>  7     5.99   38.2  18.7 0           0.00263           0.603       0.393
#>  8     5.7    39.7  18.3 0.0000470   0.0156            0.655       0.329
#>  9     5.5    40.9  18.6 0           0.0000451         0.532       0.460
#> 10     5.9    40    18.9 0           0.00293           0.606       0.390
#> # … with 446 more rows, and 3 more variables: carbon_bs_5 <dbl>,
#> #   carbon_bs_6 <dbl>, carbon_bs_7 <dbl>

## testing data
prep(rec) %>% bake(new_data = biomass_te)
#> # A tibble: 80 × 10
#>    hydrogen oxygen   HHV carbon_bs_1 carbon_bs_2 carbon_bs_3 carbon_bs_4
#>       <dbl>  <dbl> <dbl>       <dbl>       <dbl>       <dbl>       <dbl>
#>  1     5.67   47.2  18.3  0.00000387   0.00795         0.635      0.357 
#>  2     5.5    48.1  17.6  0.00261      0.0730          0.687      0.237 
#>  3     5.5    49.1  17.2  0.00431      0.0907          0.685      0.220 
#>  4     6.1    37.3  18.9  0.00000294   0.00750         0.633      0.359 
#>  5     6.32   42.8  20.5  0            0.0000535       0.534      0.458 
#>  6     5.5    41.7  18.5  0.000751     0.0434          0.682      0.274 
#>  7     5.23   54.1  15.1  0.0358       0.229           0.610      0.124 
#>  8     4.66   33.8  16.2  0.00687      0.111           0.680      0.201 
#>  9     4.4    31.1  11.1  0.294        0.396           0.224      0.0160
#> 10     3.77   23.7  10.8  0.339        0.376           0.175      0.0107
#> # … with 70 more rows, and 3 more variables: carbon_bs_5 <dbl>,
#> #   carbon_bs_6 <dbl>, carbon_bs_7 <dbl>

^{由 reprex package (v2.0.1)}

于 2022-04-18 创建

如何设置回归样条中的节点数

How to set the number of knots in a regression spline

r

spline

tidymodels