Elegant way to invert recipes transformation steps (normalize and log)?
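What is the most elegant way to transform back the outcome column (mpg in this case) after it has been transformed by a recipe?
The solution can be generic (if one exists) or work only for the log and normalize steps (as coded below).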
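Links that may be useful:
A generic solution is discussed here, but I think it is not implemented yet.
A solution for the R function scale is provided here, but I am not sure whether it helps in this case.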
library(recipes)
data <- tibble(mtcars) %>%
  select(cyl, mpg)
rec <- recipe(mpg ~ ., data = data) %>%
  step_log(all_numeric()) %>%
  step_normalize(all_numeric()) %>%
  prep()
data_baked <- bake(rec, new_data = data)
# model fitting, predictions, etc...
# how to invert/transform back predictions (estimates) and true outcomes
The way to get back whatever values you need from the recipe transformations is to tidy() the recipe and then use dplyr verbs to pull out the values you need.
library(recipes)
#> Loading required package: dplyr
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#>
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#>
#> step
data <- tibble(mtcars) %>%
  select(cyl, mpg)
rec <- recipe(mpg ~ ., data = data) %>%
  step_log(all_numeric()) %>%
  step_normalize(all_numeric(), id = "normalize_num") %>%
  prep()
There are two ways to identify a recipe step, and you can pass either one to tidy() as an argument:
## notice that you can identify steps by `number` or `id`
tidy(rec)
#> # A tibble: 2 x 6
#>   number operation type      trained skip  id
#>    <int> <chr>     <chr>     <lgl>   <lgl> <chr>
#> 1      1 step      log       TRUE    FALSE log_LYuaY
#> 2      2 step      normalize TRUE    FALSE normalize_num
## choose by number
tidy(rec, number = 1)
#> # A tibble: 2 x 3
#>   terms  base id
#>   <chr> <dbl> <chr>
#> 1 cyl    2.72 log_LYuaY
#> 2 mpg    2.72 log_LYuaY
## choose by id, which we set above (otherwise the step gets a random id like log_LYuaY)
tidy(rec, id = "normalize_num")
#> # A tibble: 4 x 4
#>   terms statistic value id
#>   <chr> <chr>     <dbl> <chr>
#> 1 cyl   mean      1.78  normalize_num
#> 2 mpg   mean      2.96  normalize_num
#> 3 cyl   sd        0.309 normalize_num
#> 4 mpg   sd        0.298 normalize_num
Once we know which step we want, we can use dplyr verbs to get out exactly the value we need to convert back, such as the mean for mpg.
## extract out value
tidy(rec, id = "normalize_num") %>%
  filter(terms == "mpg", statistic == "mean") %>%
  pull(value)
#>      mpg
#> 2.957514
Created on 2021-01-25 by the reprex package (v0.3.0)
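To go from the extracted values to an actual back-transformation, the trained mean, sd, and log base can be combined and the steps undone in reverse order. What follows is only a minimal sketch, not a built-in recipes feature. It assumes the recipe applies step_log and then step_normalize to mpg, it gives the log step an explicit id ("log_num", chosen here for illustration), and get_norm_stat() and invert_mpg() are hypothetical helper names. Because baking computes z = (log(mpg, base) - mean) / sd, the inverse is mpg = base^(z * sd + mean).

```r
library(recipes)  # attaches dplyr as well, as in the reprex above

data <- tibble(mtcars) %>%
  select(cyl, mpg)

# Same recipe as above, but with explicit ids on both steps (an assumption
# of this sketch) so they can be looked up by name.
rec <- recipe(mpg ~ ., data = data) %>%
  step_log(all_numeric(), id = "log_num") %>%
  step_normalize(all_numeric(), id = "normalize_num") %>%
  prep()

data_baked <- bake(rec, new_data = data)

# Hypothetical helper: fetch one trained statistic ("mean" or "sd")
# for one column from the normalize step.
get_norm_stat <- function(rec, column, stat) {
  tidy(rec, id = "normalize_num") %>%
    filter(terms == column, statistic == stat) %>%
    pull(value) %>%
    unname()
}

mpg_mean <- get_norm_stat(rec, "mpg", "mean")
mpg_sd   <- get_norm_stat(rec, "mpg", "sd")

# Base that step_log() used for mpg (exp(1) unless `base` was changed).
mpg_base <- tidy(rec, id = "log_num") %>%
  filter(terms == "mpg") %>%
  pull(base) %>%
  unname()

# Hypothetical helper: undo the steps in reverse order,
# first the normalization, then the log.
invert_mpg <- function(z) mpg_base^(z * mpg_sd + mpg_mean)

mpg_restored <- invert_mpg(data_baked$mpg)

# Compare the restored values with the original column.
all.equal(mpg_restored, data$mpg)
```

The same invert_mpg() could be applied to model predictions made on the baked scale. If the recipe contains other steps, or the same steps in a different order, the inverse has to be adapted accordingly.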