Downsampling using purrr by unique identifier
I would like to use purrr to group by a unique identifier and then downSample
a factor variable using the caret package. Here is the code:
out <- train %>% select(stream, HUC12) %>%
na.omit() %>% group_by(HUC12) %>%
nest %>% mutate(prop = map(data, ~downSample(.x, factor('stream'))))
Any help would be appreciated. Here is some sample data.
train <- data.frame(stream = factor(sample(x= 0:1, size = 100, replace = TRUE,
prob = c(0.25,.75))), HUC12 = rep(c("a","b","c","d")))
Generate the data:
set.seed(100)
train <- data.frame(stream = factor(sample(x= 0:1, size = 100, replace = TRUE,
prob = c(0.25,.75))), HUC12 = rep(c("a","b","c","d")))
Try this: because downSample returns a data.frame, we can use the do
function from dplyr to do the downsampling within each group.
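library(dplyr)
library(caret)  # provides downSample()
# Within each HUC12 group, downsample so both stream classes have equal counts
down_train <- train %>% select(stream, HUC12) %>%
na.omit() %>% group_by(HUC12) %>% do(downSample(., .$stream))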
We can check:
down_train %>% count(HUC12,stream)
# A tibble: 8 x 3
# Groups: HUC12 [4]
HUC12 stream n
<fct> <fct> <int>
1 a 0 1
2 a 1 1
3 b 0 4
4 b 1 4
5 c 0 11
6 c 1 11
7 d 0 8
8 d 1 8
And in the original data:
train %>% count(HUC12,stream)
# A tibble: 8 x 3
HUC12 stream n
<fct> <fct> <int>
1 a 0 1
2 a 1 24
3 b 0 4
4 b 1 21
5 c 0 11
6 c 1 14
7 d 0 8
8 d 1 17
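For completeness, here is a rough sketch of how the purrr approach from the question might be made to work. In the original attempt, factor('stream') builds a one-element factor from the literal string "stream" rather than referring to the column, so the sketch passes .x$stream instead. It assumes tidyr is available for nest() and unnest(), and the name down_purrr is only illustrative. The counts will depend on your R version's random number generator, so no output is shown.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(caret)

set.seed(100)
train <- data.frame(
  stream = factor(sample(x = 0:1, size = 100, replace = TRUE,
                         prob = c(0.25, 0.75))),
  HUC12 = rep(c("a", "b", "c", "d"), times = 25)
)

down_purrr <- train %>%
  select(stream, HUC12) %>%
  na.omit() %>%
  group_by(HUC12) %>%
  nest() %>%
  ungroup() %>%
  # Downsample each nested data frame, using its own stream column as the class
  mutate(data = map(data, ~ downSample(x = .x, y = .x$stream))) %>%
  unnest(cols = data)

# Each HUC12 should now have the same number of rows for stream 0 and 1
down_purrr %>% count(HUC12, stream)
```

As with the do version, downSample appends a Class column that repeats the outcome; drop it with select(-Class) if it is not needed.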