Downsampling using purrr. Unique identifier

Question

I'd like to use purrr to group by a unique identifier and then downsample a factor variable with the caret package. Here is my code:

out <- train %>% select(stream, HUC12) %>% 
  na.omit() %>% group_by(HUC12) %>% 
  nest %>% mutate(prop = map(data, ~downSample(.x, factor('stream'))))

Any help would be appreciated. Here is some sample data:

train <- data.frame(stream = factor(sample(x= 0:1, size = 100, replace = TRUE, 
                    prob = c(0.25,.75))), HUC12 = rep(c("a","b","c","d")))

Answer 1

Generate the data:

set.seed(100)
train <- data.frame(stream = factor(sample(x= 0:1, size = 100, replace = TRUE, 
                    prob = c(0.25,.75))), HUC12 = rep(c("a","b","c","d")))

Try this: since downSample returns a data.frame, we can use dplyr's do function to downsample each group separately.

library(dplyr)
library(caret)  # provides downSample()
down_train <- train %>% select(stream, HUC12) %>%
  na.omit() %>% group_by(HUC12) %>%
  do(downSample(., .$stream))  # balance the stream levels within each HUC12
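do() still works but is superseded in current dplyr releases. A minimal sketch of the same step with group_modify(), assuming dplyr 1.0 or later and the same simulated train data: inside group_modify(), .x holds one group's rows without the HUC12 column, and group_modify() adds HUC12 back when it combines the per-group results.

```r
library(dplyr)
library(caret)  # provides downSample()

set.seed(100)
train <- data.frame(stream = factor(sample(x = 0:1, size = 100, replace = TRUE,
                                           prob = c(0.25, .75))),
                    HUC12 = rep(c("a", "b", "c", "d")))

down_train <- train %>%
  select(stream, HUC12) %>%
  na.omit() %>%
  group_by(HUC12) %>%
  # .x is the current group's data (stream only); downSample() returns a
  # data frame with stream plus a "Class" column holding the class labels
  group_modify(~ downSample(x = .x, y = .x$stream)) %>%
  ungroup()

down_train %>% count(HUC12, stream)
```

The exact rows drawn depend on the R version's sampling method, but within each HUC12 both stream levels should appear equally often.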

We can check the result:

down_train %>% count(HUC12,stream)

# A tibble: 8 x 3
# Groups:   HUC12 [4]
  HUC12 stream     n
  <fct> <fct>  <int>
1 a     0          1
2 a     1          1
3 b     0          4
4 b     1          4
5 c     0         11
6 c     1         11
7 d     0          8
8 d     1          8

And in the original data:

train %>% count(HUC12,stream)
# A tibble: 8 x 3
  HUC12 stream     n
  <fct> <fct>  <int>
1 a     0          1
2 a     1         24
3 b     0          4
4 b     1         21
5 c     0         11
6 c     1         14
7 d     0          8
8 d     1         17
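Since the question asked for purrr specifically, here is a sketch of the nest() + map() version, assuming tidyr 1.0 or later for nest() and unnest(). In the original attempt, factor('stream') builds a one-element factor from the string "stream" instead of using the column, so here the class factor is taken from each nested data frame as .x$stream.

```r
library(dplyr)
library(tidyr)  # nest(), unnest()
library(purrr)  # map()
library(caret)  # downSample()

set.seed(100)
train <- data.frame(stream = factor(sample(x = 0:1, size = 100, replace = TRUE,
                                           prob = c(0.25, .75))),
                    HUC12 = rep(c("a", "b", "c", "d")))

down_train <- train %>%
  select(stream, HUC12) %>%
  na.omit() %>%
  group_by(HUC12) %>%
  nest() %>%                       # one row per HUC12, rows stored in `data`
  # downsample each nested data frame, using its own stream column as y
  mutate(data = map(data, ~ downSample(x = .x, y = .x$stream))) %>%
  unnest(data) %>%                 # expand the list column back into rows
  ungroup()

down_train %>% count(HUC12, stream)
```

The Class column that downSample() appends is a copy of stream here, so it can be dropped with select(-Class) if it is not needed.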

Tags: r, r-caret, purrr, tidyverse