Returning data frame as main result but also informative list as side effect

Question

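I'm writing a function whose main output should be a data frame (so that it can be piped into other functions), but I'd also like to give users access to an informative list or vector of the samples that were omitted from the final result. Is there a best practice for this, or are there examples of functions/packages that do it well?

Currently I'm exploring returning the information as an attribute and issuing a warning that tells users they can access the list with attr(resulting-df, "omitted").

Any suggestions would be greatly appreciated, thanks!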

library(dplyr)

iris <- iris %>%
  mutate(index = 1:nrow(.))

return_filtered <- function(df) {

  res <- filter(df, Sepal.Length > 6)
  # Compare against the input's own index rather than the global iris
  omitted <- setdiff(df$index, res$index)

  attr(res, "omitted") <- omitted
  return(res)

}

iris2 <- return_filtered(iris)
attributes(iris2)
#> $names
#> [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"     
#> [6] "index"       
#> 
#> $class
#> [1] "data.frame"
#> 
#> $row.names
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#> [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
#> [51] 51 52 53 54 55 56 57 58 59 60 61
#> 
#> $omitted
#>  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19
#> [20]  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38
#> [39]  39  40  41  42  43  44  45  46  47  48  49  50  54  56  58  60  61  62  63
#> [58]  65  67  68  70  71  79  80  81  82  83  84  85  86  89  90  91  93  94  95
#> [77]  96  97  99 100 102 107 114 115 120 122 139 143 150

Created on 2022-04-02 by the reprex package (v2.0.1)

Answer 1

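This question may be somewhat opinion-based, but I don't think it's off-topic, because there are certainly tidier and more formal ways to achieve what you want than your current approach.

Saving the extra information as an attribute is reasonable, but if you're going to do that, it is more idiomatic and more extensible to create an S3 class. That lets you hide the attribute from default printing, keep the attribute protected, and define a getter function for it, so users don't have to sift through multiple attributes to find the right one.

First, we'll adjust your function so that it works with any data frame and accepts any predicate, so that it can be used the same way as dplyr::filter. We also have the function add to the class attribute of the returned object, so that it returns a new S3 object that inherits from data.frame: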

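return_filtered <- function(df, predicate) {
  predicate    <- rlang::enquo(predicate)    # capture the predicate unevaluated
  df$`..id..`  <- seq_len(nrow(df))          # temporary row id; seq_len() is safe for zero rows
  res          <- dplyr::filter(df, !!predicate)
  filtered     <- setdiff(seq_len(nrow(df)), res$`..id..`)  # indices of dropped rows
  res$`..id..` <- NULL                       # remove the temporary id
  
  attr(res, "filtered") <- filtered
  class(res)            <- c("filtered", class(df))  # prepend the new S3 class
  
  return(res)
}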
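As a quick check of this design, the sketch below confirms that an object built this way still inherits from data.frame while carrying the omitted indices. It assumes dplyr (1.0 or later) is installed and uses the embrace operator as a shorthand for the enquo()/!! pair; return_filtered_embrace is a hypothetical name used only for this illustration, not part of the answer's code.

```r
library(dplyr)

# Hypothetical variant of return_filtered() using the embrace operator {{ }}
return_filtered_embrace <- function(df, predicate) {
  df$`..id..` <- seq_len(nrow(df))            # temporary row id
  res <- filter(df, {{ predicate }})          # forward the predicate unevaluated
  omitted <- setdiff(seq_len(nrow(df)), res$`..id..`)
  res$`..id..` <- NULL                        # drop the temporary id
  attr(res, "filtered") <- omitted            # indices of the rows that were dropped
  class(res) <- c("filtered", class(res))     # prepend the new S3 class
  res
}

out <- return_filtered_embrace(iris, Sepal.Length > 7)
inherits(out, "data.frame")      # still a data frame, so it can be piped onward
length(attr(out, "filtered"))    # number of omitted rows
```

Because "filtered" is only prepended to the existing class vector, functions written for plain data frames should continue to accept the result.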

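We'll define a print method so that the attribute isn't shown when the object is printed:

print.filtered <- function(x, ...) {
  # Print a copy without the "filtered" class so the underlying method is used
  y <- x
  class(y) <- setdiff(class(y), "filtered")
  print(y, ...)
  invisible(x)  # print methods conventionally return their input invisibly
}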
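Since the question mentions warning users that extra information is available, one possible alternative is to let the print method itself say so. The following is only a sketch under that assumption: it uses NextMethod() to fall through to the data frame's own print method and then appends a one-line footer. The footer wording and the small stand-in object x are made up for illustration.

```r
# Hypothetical variant of print.filtered that adds a footer instead of a warning
print.filtered <- function(x, ...) {
  n_omitted <- length(attr(x, "filtered"))
  NextMethod()   # dispatch to the next class in line, e.g. print.data.frame
  cat("# ", n_omitted, " row(s) omitted; see get_filtered()\n", sep = "")
  invisible(x)
}

# Base-R stand-in for an object produced by return_filtered()
x <- structure(
  data.frame(a = 4:5),
  filtered = 1:3,
  class = c("filtered", "data.frame")
)

print(x)
```

This keeps the notice attached to the object's display, so it appears whenever the result is printed without emitting a warning on every call.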

To get the filtered-out data from the attribute, we can create a new generic function that only works on our new class:

get_filtered <- function(x) UseMethod("get_filtered")

get_filtered.default <- function(x) {
  stop("'get_filtered' only works on filtered objects")
}

get_filtered.filtered <- function(x) {
  attr(x, "filtered")
}

So now, when we call return_filtered, it appears to work just like dplyr::filter, returning what looks like an ordinary data frame:

df <- return_filtered(iris, Sepal.Length > 7)

df
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#> 1           7.1         3.0          5.9         2.1 virginica
#> 2           7.6         3.0          6.6         2.1 virginica
#> 3           7.3         2.9          6.3         1.8 virginica
#> 4           7.2         3.6          6.1         2.5 virginica
#> 5           7.7         3.8          6.7         2.2 virginica
#> 6           7.7         2.6          6.9         2.3 virginica
#> 7           7.7         2.8          6.7         2.0 virginica
#> 8           7.2         3.2          6.0         1.8 virginica
#> 9           7.2         3.0          5.8         1.6 virginica
#> 10          7.4         2.8          6.1         1.9 virginica
#> 11          7.9         3.8          6.4         2.0 virginica
#> 12          7.7         3.0          6.1         2.3 virginica

But we can use the get_filtered function to retrieve the indices of the filtered-out rows from it:

get_filtered(df)
#>   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
#>  [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
#>  [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
#>  [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
#>  [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
#>  [91]  91  92  93  94  95  96  97  98  99 100 101 102 104 105 107 109 111 112
#> [109] 113 114 115 116 117 120 121 122 124 125 127 128 129 133 134 135 137 138
#> [127] 139 140 141 142 143 144 145 146 147 148 149 150
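As a usage sketch, these indices can be used to pull the omitted rows back out of the original data frame. So that the snippet runs on its own, it builds a stand-in for the filtered object with base R rather than calling return_filtered; the getter is repeated as defined above, and omitted_rows is just an illustrative name.

```r
# Base-R stand-in for return_filtered(iris, Sepal.Length > 7)
keep <- iris$Sepal.Length > 7
df <- structure(
  iris[keep, ],
  filtered = which(!keep),                 # indices of the rows that were dropped
  class = c("filtered", "data.frame")
)

# Getter, as defined in the answer
get_filtered <- function(x) UseMethod("get_filtered")
get_filtered.filtered <- function(x) attr(x, "filtered")

# Recover the omitted rows from the original data
omitted_rows <- iris[get_filtered(df), ]

# Kept and omitted rows together account for every row of the input
nrow(df) + nrow(omitted_rows) == nrow(iris)
```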

Calling get_filtered on a non-filtered data frame gives an informative error:

get_filtered(iris)
#> Error in get_filtered.default(iris): 'get_filtered' only works on filtered objects

Created on 2022-04-02 by the reprex package (v2.0.1)

Tags: r, function, package-development, tidyverse