rsample vfold_cv function not accepting .y parameter from purrr::map2

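I am trying to create nested cross-validation with the rsample package, using purrr::map2 to build the resamples several times, once for each value of a v parameter. However, vfold_cv does not accept the v argument passed this way and fails with: Error: `v` must be a single integer.

In the reprex below I simulate the situation with the mtcars data by creating a cross-validation scheme for each cylinder count. Replacing .y with a literal number works, but I need the argument to vary with each cylinder group by taking its value from the n column.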

library(dplyr)   # select(), group_by(), mutate() and %>% used below
library(purrr)
library(parsnip)
library(rsample)
library(tidyr)

data("mtcars")

nested <- mtcars %>% 
    select(cyl, disp:gear) %>% 
    group_by(cyl) %>% 
    nest(data = disp:gear) %>% 
    cbind(n = 2:4)

nested %>% 
    group_by(cyl) %>% 
    mutate(cv = map2(data, n,
                     ~nested_cv(.x,
                                inside = vfold_cv(v = 10, repeats = 3),
                                outside = vfold_cv(v = .y))))

Error: `v` must be a single integer.

The problem is the vfold_cv call inside nested_cv. You can try:

createNested <- function(x, y) {
    nested_cv(x, inside = vfold_cv(v = 10, repeats = 3), outside = vfold_cv(v = y))
}

createNested(nested$data[[1]],3)
Error in vfold_splits(data = data, v = v, strata = strata, breaks = breaks) : 
  object 'y' not found

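So it can't see the y variable inside the function (the same thing happens with your .y). I therefore wrote a function that builds the outside resamples with vfold_cv() first and passes the result explicitly to nested_cv(). It takes a few more lines of code, but that's fine: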

createNested <- function(x, y) {
    # Build the outer folds here, where y is visible, then pass the result in
    outside_cv <- vfold_cv(x, v = y)
    nested_cv(x, inside = vfold_cv(v = 10, repeats = 3), outside = outside_cv)
}

nested <- mtcars %>% 
    select(cyl, disp:gear) %>% 
    nest(data = disp:gear) %>%
    mutate(n = 2:4)

nested %>% mutate(cv = map2(data, n, .f = createNested))

# A tibble: 3 x 4
    cyl data                  n cv              
  <dbl> <list>            <int> <list>          
1     6 <tibble [7 × 8]>      2 <tibble [2 × 3]>
2     4 <tibble [11 × 8]>     3 <tibble [3 × 3]>
3     8 <tibble [14 × 8]>     4 <tibble [4 × 3]>

Note that once the data are nested, group_by() is no longer needed.
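
For intuition about the "object 'y' not found" error above, here is a minimal base R sketch of one mechanism that produces this kind of failure: a function that captures its argument as an unevaluated call and evaluates it in its own frame. The names capture_and_eval, standard_eval, wrapper and wrapper_ok are hypothetical stand-ins invented for illustration. This is not rsample code, and I have not verified that nested_cv works exactly this way.

```r
# Hypothetical stand-in: captures the argument expression without evaluating
# it, then evaluates it in this function's own frame.
capture_and_eval <- function(expr) {
  cl <- match.call()
  eval(cl$expr)  # looked up here and then in the global env, not in the caller
}

# An ordinary function: the argument is a promise evaluated in the caller's env.
standard_eval <- function(expr) expr

wrapper <- function(y) capture_and_eval(y + 1)
wrapper_ok <- function(y) standard_eval(y + 1)

# y exists only inside wrapper(), so the captured call cannot find it.
res <- tryCatch(wrapper(3), error = function(e) conditionMessage(e))
print(res)

# With standard evaluation, y is found in the caller as usual.
print(wrapper_ok(3))  # 4
```

Under that assumption, computing the y-dependent object in your own function first and passing the finished result, as createNested does, sidesteps the lookup problem.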