In R tidymodels how can I specify contrasts for specific variables?

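I would like to use a tidymodels recipe to specify sum-to-zero contrasts for two of the predictors in a linear model. Is that possible? Looking at the recipes documentation, it seems that before 1.3 there was some attempt to build variable-specific options, but the strategy has since moved to a global option.

I am trying to convert this base R code to tidymodels: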

Bikeshare <- ISLR2::Bikeshare  # start with original data
contrasts(Bikeshare$hr) <- contr.sum(24)
contrasts(Bikeshare$mnth) <- contr.sum(12)
mod.lm2 <-
  lm(
    bikers ~ mnth + hr + workingday + temp + weathersit,
    data = Bikeshare
  )
summary(mod.lm2)
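
For reference, here is a minimal base R sketch of what sum-to-zero coding looks like. It uses a made-up three-level case purely for illustration; the code above uses 24 and 12 levels.

```r
# Sum-to-zero coding for a hypothetical 3-level factor
cs <- contr.sum(3)
cs
#>   [,1] [,2]
#> 1    1    0
#> 2    0    1
#> 3   -1   -1

# Each column sums to zero; the last level is coded as -1 in every column
colSums(cs)
#> [1] 0 0
```

With this coding the coefficient for the last level is not reported directly; it is the negative sum of the other coefficients for that factor.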

This is how far I got:

library(tidymodels)
Bikeshare <- ISLR2::Bikeshare  # start with original data
contrasts(Bikeshare$hr) <- contr.sum(24)
contrasts(Bikeshare$mnth) <- contr.sum(12)

lm_spec <- linear_reg() %>%
  set_engine("lm")

the_rec <- 
  recipe(
    bikers ~ mnth + hr + workingday + temp + weathersit,
    data = Bikeshare
  ) %>%
  step_dummy(c(mnth, hr), one_hot = TRUE)

the_workflow <- workflow() %>%
  add_recipe(the_rec) %>% 
  add_model(lm_spec)

the_workflow_fit_lm_fit <- 
  fit(the_workflow, data = Bikeshare) %>% 
  extract_fit_parsnip()

summary(the_workflow_fit_lm_fit$fit)

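Does anyone know how to get the same result from a tidymodels workflow?

I don't think I can use contr.sum as a global option. That gives me the betas I want for the two variables, but it also changes the contrasts for the other variables.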

Bikeshare <- ISLR2::Bikeshare  # be sure to work with the original data
old_opt <- options()$contrasts
options(contrasts = c('contr.sum', 'contr.poly'))
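
The side effect described above can be seen in a small base R sketch. The data frame `d` and its columns are made up for illustration. A contrast attached to one factor affects only that factor, while the global option recodes every unordered factor in the model.

```r
# Hypothetical toy data with two factors
d <- data.frame(
  a = factor(c("x", "y", "z", "x")),
  b = factor(c("p", "q", "p", "q"))
)

# Start from R's defaults for this illustration, remembering the old setting
old <- options(contrasts = c("contr.treatment", "contr.poly"))

# Variable-specific: only `a` gets sum-to-zero coding, `b` stays treatment-coded
d_specific <- d
contrasts(d_specific$a) <- contr.sum(3)
specific_cols <- colnames(model.matrix(~ a + b, data = d_specific))

# Global: both `a` and `b` are now sum-to-zero coded
options(contrasts = c("contr.sum", "contr.poly"))
global_cols <- colnames(model.matrix(~ a + b, data = d))

# Restore whatever was set before
options(old)

specific_cols
global_cols
```

In the first case the column for `b` keeps its treatment-style name, which is based on the level. In the second it becomes a numbered sum-to-zero column, and that is the change to the other variables that I want to avoid.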

The documentation for step_dummy() says:

To change the type of contrast being used, change the global contrast option via options.

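So there is no way to change it other than through the global option.

We should have an example, though :-/

Note that for new samples the contrasts are read from the global options again. Make sure they are set the same way at prediction time: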

library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#>   method                   from   
#>   required_pkgs.model_spec parsnip
tidymodels_prefer()

data("penguins")

penguins <- 
  penguins %>% 
  distinct(species)

# R's defaults
old_opt <- options()$contrasts
old_opt
#>         unordered           ordered 
#> "contr.treatment"      "contr.poly"

# default contrast
default <- 
  recipe(~ species, data = penguins) %>% 
  step_dummy(species) %>% 
  prep()

default %>%  bake(new_data = NULL)
#> # A tibble: 3 × 2
#>   species_Chinstrap species_Gentoo
#>               <dbl>          <dbl>
#> 1                 0              0
#> 2                 0              1
#> 3                 1              0

# Now switch to sum-to-zero contrasts:
options(contrasts = c('contr.sum', 'contr.poly'))

with_opt <- 
  recipe(~ species, data = penguins) %>% 
  step_dummy(species) %>% 
  prep()

with_opt %>% bake(new_data = NULL)
#> # A tibble: 3 × 2
#>   species_X1 species_X2
#>        <dbl>      <dbl>
#> 1          1          0
#> 2         -1         -1
#> 3          0          1

# Reset the options; bake() reads the contrasts from the global option again:

options(contrasts = old_opt)
with_opt %>% bake(new_data = penguins)
#> # A tibble: 3 × 2
#>   species_Chinstrap species_Gentoo
#>               <dbl>          <dbl>
#> 1                 0              0
#> 2                 0              1
#> 3                 1              0

Created on 2021-11-16 by the reprex package (v2.0.0)
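
One way to make it harder to forget the option at prediction time is to scope it in a small helper. The following is only a sketch in base R: `with_sum_contrasts()` is a hypothetical name, not a recipes or tidymodels function, and `model.matrix()` on a made-up `toy` data frame stands in for `prep()` and `bake()`.

```r
# Hypothetical helper: evaluate `expr` with sum-to-zero contrasts set globally,
# then restore the previous setting even if `expr` fails
with_sum_contrasts <- function(expr) {
  old <- options(contrasts = c("contr.sum", "contr.poly"))
  on.exit(options(old), add = TRUE)
  force(expr)  # `expr` is evaluated lazily, so it sees the new option
}

# Made-up data standing in for the penguins example
toy <- data.frame(species = factor(c("Adelie", "Gentoo", "Chinstrap")))

before <- getOption("contrasts")
inside <- with_sum_contrasts(colnames(model.matrix(~ species, data = toy)))
after <- getOption("contrasts")

inside
identical(before, after)
```

The same wrapper would presumably go around both the `prep()` call and any later `bake()` or `predict()` call. It still applies sum-to-zero coding to every unordered factor, so it does not give variable-specific contrasts.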
