Elegant way to invert recipes transformation steps (normalize and log)?

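What is the most elegant way to transform the outcome column (mpg in this case) back to its original scale after it has been transformed by a recipe? The solution can be general (if one exists) or work only for the log and normalize steps (as coded below).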

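Links that might be useful:
A general solution is discussed here, but I don't think it has been implemented yet.
A solution for the R function scale is given here, but I'm not sure whether it helps in this case.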

library(recipes)
library(dplyr)  # for tibble(), select() and %>%

data <- tibble(mtcars) %>% 
    select(cyl, mpg)

rec <- recipe(mpg ~ ., data = data) %>%
    step_log(all_numeric()) %>%
    step_normalize(all_numeric()) %>%
    prep()

data_baked <- bake(rec, new_data = data)

# model fitting, predictions, etc...

# how to invert/transform back predictions (estimates) and true outcomes

The way to get back any value you need from a recipe's transformations is to tidy() the recipe and then use dplyr verbs to pull out the values you need.

library(recipes)
library(dplyr)  # for tibble(), select(), filter(), pull() and %>%

data <- tibble(mtcars) %>% 
  select(cyl, mpg)

rec <- recipe(mpg ~ ., data = data) %>%
  step_log(all_numeric()) %>%
  step_normalize(all_numeric(), id = "normalize_num") %>%
  prep()

There are two ways to pick out a recipe step, which you then pass to tidy() as an argument:

## notice that you can identify steps by `number` or `id`
tidy(rec)
#> # A tibble: 2 x 6
#>   number operation type      trained skip  id           
#>    <int> <chr>     <chr>     <lgl>   <lgl> <chr>        
#> 1      1 step      log       TRUE    FALSE log_LYuaY    
#> 2      2 step      normalize TRUE    FALSE normalize_num

## choose by number
tidy(rec, number = 1)
#> # A tibble: 2 x 3
#>   terms  base id       
#>   <chr> <dbl> <chr>    
#> 1 cyl    2.72 log_LYuaY
#> 2 mpg    2.72 log_LYuaY
## choose by id, which we set above (otherwise the step gets a random id, like the log step)
tidy(rec, id = "normalize_num")
#> # A tibble: 4 x 4
#>   terms statistic value id           
#>   <chr> <chr>     <dbl> <chr>        
#> 1 cyl   mean      1.78  normalize_num
#> 2 mpg   mean      2.96  normalize_num
#> 3 cyl   sd        0.309 normalize_num
#> 4 mpg   sd        0.298 normalize_num
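These tidied parameters are all that is needed to undo both steps. step_log uses the natural log by default (the base of 2.72 in the tidied output is e), and step_normalize subtracts the mean and divides by the standard deviation, both estimated on the already-logged training data. Writing y for the original mpg, b for the log base, and μ, σ for the tidied mean and sd, a sketch of the forward transform and its inverse is shown below; note that the inverse undoes the steps in reverse order (normalization first, then the log):

$$
z = \frac{\log_b(y) - \mu}{\sigma}
\quad\Longrightarrow\quad
\log_b(y) = \sigma z + \mu
\quad\Longrightarrow\quad
y = b^{\,\sigma z + \mu}
$$

For mpg, that means b = e, μ ≈ 2.96 and σ ≈ 0.298.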

Once we know which step we want, we can use dplyr verbs to pull out exactly the value we need to transform back, such as the mean of mpg.
## extract out value
tidy(rec, id = "normalize_num") %>%
  filter(terms == "mpg", statistic == "mean") %>%
  pull(value)
#>      mpg 
#> 2.957514

Created on 2021-01-25 by the reprex package (v0.3.0)
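Putting the pieces together, the sketch below pulls the log base, mean and sd for one column out of the prepped recipe with tidy() and then undoes the steps in reverse order. The helper invert_log_normalize() and its arguments are hypothetical (they are not part of recipes); it assumes, as in the recipe above, that the log step is step number 1 and the normalize step has id "normalize_num". The round trip on data_baked$mpg is used as a check.

```r
library(recipes)
library(dplyr)

# Same data and recipe as above
data <- as_tibble(mtcars) %>%
  select(cyl, mpg)

rec <- recipe(mpg ~ ., data = data) %>%
  step_log(all_numeric()) %>%
  step_normalize(all_numeric(), id = "normalize_num") %>%
  prep()

data_baked <- bake(rec, new_data = data)

# Hypothetical helper (not part of recipes): undoes step_normalize and then
# step_log for a single column, using the parameters stored in the prepped recipe.
# Assumes the log step is `log_number` and the normalize step has `normalize_id`.
invert_log_normalize <- function(z, rec, column, log_number = 1,
                                 normalize_id = "normalize_num") {
  norm <- tidy(rec, id = normalize_id) %>%
    filter(terms == column)
  mu <- norm %>% filter(statistic == "mean") %>% pull(value)
  sigma <- norm %>% filter(statistic == "sd") %>% pull(value)
  log_base <- tidy(rec, number = log_number) %>%
    filter(terms == column) %>%
    pull(base)
  # Undo the normalization first, then the log (reverse order of the steps);
  # unname() drops any names carried over from the tidied values
  unname(log_base^(z * sigma + mu))
}

mpg_back <- invert_log_normalize(data_baked$mpg, rec, "mpg")
all.equal(mpg_back, mtcars$mpg)  # should be TRUE: the original mpg is recovered
```

Because the parameters are read from the prepped recipe rather than recomputed, the same call can be applied to model predictions made on the baked scale by passing the vector of predictions as z.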