Map over list of dataframes and apply custom mutate-function (purrr, dplyr)

So I have this list:

list(`0` = structure(list(fn = 0L, fp = 34L, tn = 0L, tp = 34L), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame")), `0.1` = structure(list(
    fn = 1L, fp = 26L, tn = 8L, tp = 33L), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame")), `0.2` = structure(list(
    fn = 3L, fp = 22L, tn = 12L, tp = 31L), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame")), `0.3` = structure(list(
    fn = 5L, fp = 7L, tn = 27L, tp = 29L), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame")), `0.4` = structure(list(
    fn = 5L, fp = 3L, tn = 31L, tp = 29L), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame")), `0.5` = structure(list(
    fn = 7L, fp = 1L, tn = 33L, tp = 27L), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame")), `0.6` = structure(list(
    fn = 8L, fp = 0L, tn = 34L, tp = 26L), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame")), `0.7` = structure(list(
    fn = 8L, fp = 0L, tn = 34L, tp = 26L), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame")), `0.8` = structure(list(
    fn = 8L, fp = 0L, tn = 34L, tp = 26L), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame")), `0.9` = structure(list(
    fn = 30L, fp = 0L, tn = 34L, tp = 4L), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame")), `1` = structure(list(
    fn = 34L, fp = 0L, tn = 34L, tp = 0L), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame")))

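It is the result of applying a quantile regression model for a range of quantiles, so it is basically a list with one element per quantile (11 in the data above, for quantiles 0 through 1 in steps of 0.1). Every element is a data frame that contains the true/false positive/negative counts. Now I want to write a function where I can "dynamically" calculate the various metrics that can be derived from these counts. So, for example, the first element looks like this: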

> cms[[1]]
# A tibble: 1 x 4
     fn    fp    tn    tp
  <int> <int> <int> <int>
1     0    34     0    34

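Since it is a list, I'd really like to do something with purrr::map, lapply or something similar. Then I thought: well, one day I want the true positive rate, another day I might want the specificity. So I figured I'd write a function that takes some columns as input and does a "classic" dplyr::mutate. But once again I got stuck on my lack of knowledge about tidy evaluation. So I did something like this (please don't judge):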

fun = function(...){
  f = rlang::enexpr(...)
  return(f)
}

fpr = fun(tp / tp + fn)

# does not work
map(cms, ~mutate(.x, fpr=fpr)) 

# this (non-tidy-eval) works
map(cms, ~mutate(.x, fpr=tp / tp + fn))

I'd really like to pass the columns dynamically and compute the result using tidy evaluation. So I'd appreciate any help or pointers :)
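
For reference, here is a minimal sketch of one way the stored-expression idea can be made to work: defuse the expression with rlang::expr() and inject it into mutate() with !!. The names cms_small and tpr_expr are made up for this illustration, and the data is just the first two elements of the list. Note that tp / tp + fn parses as (tp / tp) + fn; the sketch assumes tp / (tp + fn) was intended, which is the true positive rate rather than a false positive rate.

```r
library(dplyr)
library(purrr)

# Hypothetical two-element subset of the list, for illustration only
cms_small <- list(
  `0`   = tibble::tibble(fn = 0L, fp = 34L, tn = 0L, tp = 34L),
  `0.1` = tibble::tibble(fn = 1L, fp = 26L, tn = 8L, tp = 33L)
)

# Defuse the expression once; nothing is evaluated at this point
tpr_expr <- rlang::expr(tp / (tp + fn))

# Inject the stored expression with !! so that mutate() evaluates it
# against the columns of each data frame
res <- map(cms_small, ~ mutate(.x, tpr = !!tpr_expr))

res[["0.1"]]
```

Without the !!, mutate() would be handed the stored expression object itself instead of evaluating it against the columns, which is presumably why the first attempt fails.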

I'm not sure I understood you correctly, but you could define the parameter computation like this:

fpr <- \(...) with(list(...), tp / (tp + fn))

Then define a helper function:

add_param <- \(f, ...) tibble::tibble(..., "{substitute(f)}" := f(...))

Finally, call it via pmap():

library(purrr)

cms %>%
  dplyr::bind_rows() %>%
  pmap_dfr(add_param, fpr)

Returns:

# A tibble: 11 x 5
      fn    fp    tn    tp   fpr
   <int> <int> <int> <int> <dbl>
 1     0    34     0    34 1    
 2     1    26     8    33 0.971
 3     3    22    12    31 0.912
 4     5     7    27    29 0.853
 5     5     3    31    29 0.853
 6     7     1    33    27 0.794
 7     8     0    34    26 0.765
 8     8     0    34    26 0.765
 9     8     0    34    26 0.765
10    30     0    34     4 0.118
11    34     0    34     0 0    

(Data used:)

cms <- list(`0` = structure(list(fn = 0L, fp = 34L, tn = 0L, tp = 34L), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame")), `0.1` = structure(list( fn = 1L, fp = 26L, tn = 8L, tp = 33L), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame")), `0.2` = structure(list( fn = 3L, fp = 22L, tn = 12L, tp = 31L), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame")), `0.3` = structure(list( fn = 5L, fp = 7L, tn = 27L, tp = 29L), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame")), `0.4` = structure(list( fn = 5L, fp = 3L, tn = 31L, tp = 29L), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame")), `0.5` = structure(list( fn = 7L, fp = 1L, tn = 33L, tp = 27L), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame")), `0.6` = structure(list( fn = 8L, fp = 0L, tn = 34L, tp = 26L), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame")), `0.7` = structure(list( fn = 8L, fp = 0L, tn = 34L, tp = 26L), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame")), `0.8` = structure(list( fn = 8L, fp = 0L, tn = 34L, tp = 26L), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame")), `0.9` = structure(list( fn = 30L, fp = 0L, tn = 34L, tp = 4L), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame")), `1` = structure(list( fn = 34L, fp = 0L, tn = 34L, tp = 0L), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame")))

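You could also create a more general function and use switch() to control the flow. You can add more measures as needed. In the simple example below, the input can be either a data frame in a list-column or four numeric columns.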

library(tidyverse)

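# Compute one rate from a one-row data frame of confusion-matrix counts
my_fun_1 <- function(dat, measure = c("fp_rate", "fn_rate")) {
  measure <- match.arg(measure)  # validate; uses the first choice if omitted
  switch(
    measure,
    fp_rate = dat[["fp"]] / (dat[["fp"]] + dat[["tn"]]),
    fn_rate = dat[["fn"]] / (dat[["fn"]] + dat[["tp"]])
  )
}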

dat1 <- cms %>%  # cms is the list of count tibbles from the question
  enframe() %>%
  rowwise() %>%
  mutate(
    fnr = my_fun_1(value, "fn_rate"),
    fpr = my_fun_1(value, "fp_rate"),
  ) %>%
  ungroup()

dat1

# # A tibble: 11 x 4
#    name  value               fnr    fpr
#    <chr> <list>            <dbl>  <dbl>
#  1 0     <tibble [1 x 4]> 0      1
#  2 0.1   <tibble [1 x 4]> 0.0294 0.765
#  3 0.2   <tibble [1 x 4]> 0.0882 0.647
# <Omitted>

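# Same idea, but taking the four count columns as separate vectors
my_fun_2 <- function(fn, fp, tn, tp, measure = c("fp_rate", "fn_rate")) {
  measure <- match.arg(measure)  # validate; uses the first choice if omitted
  switch(measure,
    fp_rate = fp / (fp + tn),
    fn_rate = fn / (fn + tp)
  )
}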

dat2 <- cms %>%  # cms is the list of count tibbles from the question
  bind_rows(.id = "quantile") %>%
  mutate(
    fnr = my_fun_2(fn, fp, tn, tp, "fn_rate"),
    fpr = my_fun_2(fn, fp, tn, tp, "fp_rate")
  )

dat2

# # A tibble: 11 x 7
#    quantile    fn    fp    tn    tp    fnr    fpr
#    <chr>    <int> <int> <int> <int>  <dbl>  <dbl>
#  1 0            0    34     0    34 0      1
#  2 0.1          1    26     8    33 0.0294 0.765
#  3 0.2          3    22    12    31 0.0882 0.647
# <Omitted>
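
As a rough sketch of how the switch() approach could be extended with further measures, the block below adds the true positive rate and the specificity. The function name confusion_metric, the measure labels, and the two-row counts tibble are assumptions made for this illustration and are not part of the answer above.

```r
library(dplyr)

# Hypothetical extension of the switch() idea with two more measures
confusion_metric <- function(fn, fp, tn, tp,
                             measure = c("tp_rate", "fp_rate",
                                         "fn_rate", "specificity")) {
  # match.arg() rejects labels that are not listed in the default vector
  measure <- match.arg(measure)
  switch(measure,
    tp_rate     = tp / (tp + fn),
    fp_rate     = fp / (fp + tn),
    fn_rate     = fn / (fn + tp),
    specificity = tn / (tn + fp)
  )
}

# First two rows of the counts, typed in by hand for the example
counts <- tibble::tibble(
  fn = c(0L, 1L), fp = c(34L, 26L), tn = c(0L, 8L), tp = c(34L, 33L)
)

result <- counts %>%
  mutate(
    tpr  = confusion_metric(fn, fp, tn, tp, "tp_rate"),
    spec = confusion_metric(fn, fp, tn, tp, "specificity")
  )

result
```

Because the computation is vectorised over the count columns, no rowwise() step is needed in this form.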

You could also use the following solution.

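  • First we have to define a function that takes a data set and a number of arguments. We explicitly use the data argument for our data set and capture all the other arguments with ...
  • Then we use the enquos function, which returns a list of quosures, to defuse the expressions we captured with ..., and force their evaluation with the big-bang operator !!! (normally used for splicing a list of arguments) in the context of our data set data inside the eval_tidy function
  • Then we iterate over every element of our list and apply our function to each one while evaluating the desired expression
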
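library(rlang)
library(dplyr)
library(purrr)

# Note: the function is named fn, but inside the evaluated expression the
# name fn refers to the fn column because of data masking
fn <- function(data, ...) {
  args <- enquos(...)  # capture the expressions passed through ... as quosures
  
  data %>%
    mutate(out = eval_tidy(!!!args, data = data))
}

cms %>%
  map_dfr(~ .x %>% fn(tp / (tp + fn)))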

# A tibble: 11 x 5
      fn    fp    tn    tp   out
   <int> <int> <int> <int> <dbl>
 1     0    34     0    34 1    
 2     1    26     8    33 0.971
 3     3    22    12    31 0.912
 4     5     7    27    29 0.853
 5     5     3    31    29 0.853
 6     7     1    33    27 0.794
 7     8     0    34    26 0.765
 8     8     0    34    26 0.765
 9     8     0    34    26 0.765
10    30     0    34     4 0.118
11    34     0    34     0 0
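
A possibly simpler variant of the same idea, sketched under the assumption that only one expression is passed at a time, uses the embrace operator {{ }} instead of enquos() with eval_tidy(). The helper name add_metric, its name argument, and the two-element cms_small list are made up for this illustration.

```r
library(dplyr)
library(purrr)

# Hypothetical helper: add one computed column under a caller-chosen name
add_metric <- function(data, expr, name = "out") {
  # {{ expr }} forwards the caller's expression to mutate(), where it is
  # evaluated against the columns of data
  mutate(data, "{name}" := {{ expr }})
}

# Hypothetical two-element subset of the list, for illustration only
cms_small <- list(
  `0`   = tibble::tibble(fn = 0L, fp = 34L, tn = 0L, tp = 34L),
  `0.1` = tibble::tibble(fn = 1L, fp = 26L, tn = 8L, tp = 33L)
)

res <- cms_small %>%
  map(~ add_metric(.x, tp / (tp + fn), name = "tpr")) %>%
  bind_rows(.id = "quantile")  # keep the list names as a quantile column

res
```

Passing a different expression, such as tn / (tn + fp) with name = "specificity", should reuse the same helper without further changes.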