dplyr: custom function in mutate. Uses full matrix instead of chunks?
Consider this example:
library(dplyr)
library(magrittr)  # for the %<>% assignment pipe
set.seed(123)
# sizes of four groups, each between 1 and 10
grp_s <- round(runif(4, 1, 10))
group <- rep(seq_along(grp_s), grp_s)
dataF <- data.frame(grouping = group,
                    var_a = runif(length(group)),
                    var_b = runif(length(group)),
                    var_c = runif(length(group)))
# sum of var_a over the rows where var_b > .5
compute_it <- function(var_a, var_b){
  sum(var_a[var_b > .5], na.rm = TRUE)
}
# one value per group, repeated on every row of that group
dataF %<>%
  group_by(grouping) %>%
  mutate(fix_it = compute_it(var_a, var_b))
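So far so good. Now, instead of compute_it, which takes column names as arguments, I would like to use a function that takes a chunk of the data (one chunk per value of grouping) as its argument, i.e. to use this function: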
compute_it_2 <- function(Data){
  sum(Data$var_a[Data$var_b > .5], na.rm = TRUE)
}
in place of compute_it above. How can I do this?
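A likely source of the "full matrix" symptom in the title is passing `.` inside mutate(). With the magrittr pipe, `.` is the whole data frame on the left of the pipe, not the current group, so compute_it_2(.) returns the grand total and mutate() recycles that single value onto every row. A minimal sketch reproducing this (the name naive is illustrative only):

```r
library(dplyr)

set.seed(123)
grp_s <- round(runif(4, 1, 10))
group <- rep(seq_along(grp_s), grp_s)
dataF <- data.frame(grouping = group, var_a = runif(length(group)),
                    var_b = runif(length(group)), var_c = runif(length(group)))

compute_it_2 <- function(Data) {
  sum(Data$var_a[Data$var_b > .5], na.rm = TRUE)
}

# `.` is the entire piped data frame, so each group receives the same value:
# the sum over all rows rather than over the group's own rows
naive <- dataF %>%
  group_by(grouping) %>%
  mutate(fix_it = compute_it_2(.))

unique(naive$fix_it)
```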
Using tidyr and purrr as well, we can first use do() or nest():
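library(tidyverse)  # attaches dplyr, tidyr and purrr
dataF %>%
  group_by(grouping) %>%
  # do() calls compute_it_2 once per group; `.` is the current group's data
  do(fix_it = compute_it_2(.)) %>%
  # fix_it is a list-column of length-1 numerics; flatten it
  unnest(fix_it)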
gives:
# A tibble: 4 × 2
grouping fix_it
<int> <dbl>
1 1 2.4065483
2 2 0.9568333
3 3 0.0000000
4 4 1.8274955
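do() has since been marked as superseded in dplyr. Assuming dplyr 0.8.1 or later, where group_modify() is available, the same per-group result can be sketched as below and then joined back onto the original rows with left_join(), which matches what a grouped mutate() would produce (per_group and with_rows are illustrative names):

```r
library(dplyr)

set.seed(123)
grp_s <- round(runif(4, 1, 10))
group <- rep(seq_along(grp_s), grp_s)
dataF <- data.frame(grouping = group, var_a = runif(length(group)),
                    var_b = runif(length(group)), var_c = runif(length(group)))

compute_it_2 <- function(Data) {
  sum(Data$var_a[Data$var_b > .5], na.rm = TRUE)
}

# .x is the current group's rows (without the grouping column);
# the function passed to group_modify() must return a data frame
per_group <- dataF %>%
  group_by(grouping) %>%
  group_modify(~ tibble(fix_it = compute_it_2(.x))) %>%
  ungroup()

# attach each group's value to every row of that group
with_rows <- left_join(dataF, per_group, by = "grouping")
```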
Or, with nest():
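dataF %>%
  group_by(grouping) %>%
  # one row per group; each group's rows are stored in the list-column `data`
  nest() %>%
  # apply compute_it_2 to each nested tibble, returning a double vector
  mutate(fix_it = map_dbl(data, compute_it_2))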
# A tibble: 4 × 3
grouping data fix_it
<int> <list> <dbl>
1 1 <tibble [4 × 3]> 2.4065483
2 2 <tibble [8 × 3]> 0.9568333
3 3 <tibble [5 × 3]> 0.0000000
4 4 <tibble [9 × 3]> 1.8274955
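For comparison, nest() plays roughly the role of base R's split(), which also yields one data frame per value of grouping. A base-R-only sketch of the same computation, with the result broadcast back onto each row (chunks and per_group are illustrative names):

```r
set.seed(123)
grp_s <- round(runif(4, 1, 10))
group <- rep(seq_along(grp_s), grp_s)
dataF <- data.frame(grouping = group, var_a = runif(length(group)),
                    var_b = runif(length(group)), var_c = runif(length(group)))

compute_it_2 <- function(Data) {
  sum(Data$var_a[Data$var_b > .5], na.rm = TRUE)
}

# named list of data frames, one per value of grouping
chunks <- split(dataF, dataF$grouping)
# one double per chunk, named "1", "2", ...
per_group <- vapply(chunks, compute_it_2, numeric(1))
# look up each row's group by name to repeat the value on every row
dataF$fix_it <- unname(per_group[as.character(dataF$grouping)])
```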
If you unnest() the result of the nest() version, you get the original frame back, with fix_it added:
# A tibble: 26 × 5
grouping fix_it var_a var_b var_c
<int> <dbl> <dbl> <dbl> <dbl>
1 1 2.4065483 0.9404673 0.96302423 0.12753165
2 1 2.4065483 0.0455565 0.90229905 0.75330786
3 1 2.4065483 0.5281055 0.69070528 0.89504536
4 1 2.4065483 0.8924190 0.79546742 0.37446278
5 2 0.9568333 0.5514350 0.02461368 0.66511519
6 2 0.9568333 0.4566147 0.47779597 0.09484066
7 2 0.9568333 0.9568333 0.75845954 0.38396964
8 2 0.9568333 0.4533342 0.21640794 0.27438364
9 2 0.9568333 0.6775706 0.31818101 0.81464004
10 2 0.9568333 0.5726334 0.23162579 0.44851634
# ... with 16 more rows
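With newer dplyr (assuming version 1.1.0 or later, where pick() was added), the original mutate() formulation can work directly without nesting: pick(everything()) hands compute_it_2 a data frame of the current group's rows only, and the scalar result is recycled over that group. A sketch:

```r
library(dplyr)  # pick() assumes dplyr >= 1.1.0

set.seed(123)
grp_s <- round(runif(4, 1, 10))
group <- rep(seq_along(grp_s), grp_s)
dataF <- data.frame(grouping = group, var_a = runif(length(group)),
                    var_b = runif(length(group)), var_c = runif(length(group)))

compute_it_2 <- function(Data) {
  sum(Data$var_a[Data$var_b > .5], na.rm = TRUE)
}

result <- dataF %>%
  group_by(grouping) %>%
  # pick(everything()) is a data frame holding only the current group's rows
  mutate(fix_it = compute_it_2(pick(everything()))) %>%
  ungroup()
```

Unlike the nest()/unnest() route, this keeps the original row order and appends fix_it as the last column.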