How to pass "everything possible" to by in a function?

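I am trying to use data.table in a user-facing function of a package I am working on. I want this function to behave as much like data.table as possible. This means, for example, that my function also has a by argument, which is passed on to the underlying data.table call inside the function. The user should be free to pass to "my" by anything that data.table itself accepts.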

Quoting from ?data.table, this includes:

  1. A single unquoted column name: e.g., DT[, .(sa=sum(a)), by=x]
  2. a list() of expressions of column names: e.g., DT[, .(sa=sum(a)), by=.(x=x>0, y)]
  3. a single character string containing comma separated column names (where spaces are significant since column names may contain spaces even at the start or end): e.g., DT[, sum(a), by="x,y,z"]
  4. a character vector of column names: e.g., DT[, sum(a), by=c("x", "y")]
  5. or of the form startcol:endcol: e.g., DT[, sum(a), by=x:z]

Here is a minimal (partially) working example to clarify my intent:

library(data.table)
#> Warning: package 'data.table' was built under R version 3.6.2
sample_dt <- data.table(a = 1:5, b = 5:1)

count_by <- function(dt, by = NULL) {
    by <- substitute(by)
    dt[, .N, by = eval(by, dt, parent.frame())]
}

count_by(sample_dt)               
#>    N
#> 1: 5
count_by(sample_dt, by = a)       # refers to 1 from the list above
#>    by N
#> 1:  1 1
#> 2:  2 1
#> 3:  3 1
#> 4:  4 1
#> 5:  5 1
count_by(sample_dt, by = list(a)) # refers to 2 from the list above
#>    a N
#> 1: 1 1
#> 2: 2 1
#> 3: 3 1
#> 4: 4 1
#> 5: 5 1
count_by(sample_dt, by = "a")     # refers to 3 from the list above
#>    a N
#> 1: 1 1
#> 2: 2 1
#> 3: 3 1
#> 4: 4 1
#> 5: 5 1
count_by(sample_dt, by = c("a"))  # refers to 4 from the list above
#> Error in `[.data.table`(dt, , .N, by = eval(by, dt, parent.frame())): 'by' appears to evaluate to column names but isn't c() or key(). Use by=list(...) if you can. Otherwise, by=evalc("a") should work. This is for efficiency so data.table can detect which columns are needed.
count_by(sample_dt, by = a:b)     # refers to 5 from the list above
#>    a b N
#> 1: 1 5 1
#> 2: 2 4 1
#> 3: 3 3 1
#> 4: 4 2 1
#> 5: 5 1 1

Created on 2020-02-18 by the reprex package (v0.3.0)

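Apart from case 4, everything works as expected with a simple substitute and eval in the appropriate context. So my question is:

How can I create a function that uses data.table internally and fully mimics the original by user interface?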


Session info

devtools::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.6.1 (2019-07-05)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  German_Germany.1252         
#>  ctype    German_Germany.1252         
#>  tz       Europe/Berlin               
#>  date     2020-02-18                  
#> 
#> - Packages -------------------------------------------------------------------
#>  package     * version date       lib source        
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.6.2)
#>  backports     1.1.5   2019-10-02 [1] CRAN (R 3.6.1)
#>  callr         3.4.1   2020-01-24 [1] CRAN (R 3.6.2)
#>  cli           2.0.1   2020-01-08 [1] CRAN (R 3.6.2)
#>  crayon        1.3.4   2017-09-16 [1] CRAN (R 3.6.2)
#>  data.table  * 1.12.8  2019-12-09 [1] CRAN (R 3.6.2)
#>  desc          1.2.0   2018-05-01 [1] CRAN (R 3.6.2)
#>  devtools      2.2.1   2019-09-24 [1] CRAN (R 3.6.2)
#>  digest        0.6.23  2019-11-23 [1] CRAN (R 3.6.2)
#>  ellipsis      0.3.0   2019-09-20 [1] CRAN (R 3.6.2)
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 3.6.2)
#>  fansi         0.4.1   2020-01-08 [1] CRAN (R 3.6.2)
#>  fs            1.3.1   2019-05-06 [1] CRAN (R 3.6.2)
#>  glue          1.3.1   2019-03-12 [1] CRAN (R 3.6.2)
#>  highr         0.8     2019-03-20 [1] CRAN (R 3.6.2)
#>  htmltools     0.4.0   2019-10-04 [1] CRAN (R 3.6.2)
#>  knitr         1.27    2020-01-16 [1] CRAN (R 3.6.2)
#>  magrittr      1.5     2014-11-22 [1] CRAN (R 3.6.2)
#>  memoise       1.1.0   2017-04-21 [1] CRAN (R 3.6.2)
#>  pkgbuild      1.0.6   2019-10-09 [1] CRAN (R 3.6.2)
#>  pkgload       1.0.2   2018-10-29 [1] CRAN (R 3.6.2)
#>  prettyunits   1.1.1   2020-01-24 [1] CRAN (R 3.6.2)
#>  processx      3.4.1   2019-07-18 [1] CRAN (R 3.6.2)
#>  ps            1.3.0   2018-12-21 [1] CRAN (R 3.6.2)
#>  R6            2.4.1   2019-11-12 [1] CRAN (R 3.6.2)
#>  Rcpp          1.0.3   2019-11-08 [1] CRAN (R 3.6.2)
#>  remotes       2.1.0   2019-06-24 [1] CRAN (R 3.6.2)
#>  rlang         0.4.4   2020-01-28 [1] CRAN (R 3.6.2)
#>  rmarkdown     2.1     2020-01-20 [1] CRAN (R 3.6.2)
#>  rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.6.2)
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.6.2)
#>  stringi       1.4.4   2020-01-09 [1] CRAN (R 3.6.2)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 3.6.2)
#>  testthat      2.3.1   2019-12-01 [1] CRAN (R 3.6.2)
#>  usethis       1.5.1   2019-07-04 [1] CRAN (R 3.6.2)
#>  withr         2.1.2   2018-03-15 [1] CRAN (R 3.6.2)
#>  xfun          0.12    2020-01-13 [1] CRAN (R 3.6.2)
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 3.6.2)
#> 
#> [1] C:/Program Files/R/library

Is there a particular reason for using eval inside the data.table call? I think this would be better:

count_by <- function(dt, by = NULL) {
  eval(substitute(dt[, .N, by = by]))
}

It passes all the test cases (of course), even the first one, where your function fails by naming the grouping column by instead of a.
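
To make that claim easy to check, here is a small sketch that runs the substitute-based wrapper against the five by forms from the question, plus the no-by call. The result variable names (res_none, res_symbol, and so on) are only illustrative, and the checks look at column names only, not at the full output.

```r
library(data.table)

sample_dt <- data.table(a = 1:5, b = 5:1)

# substitute() splices the caller's unevaluated `dt` and `by` expressions into
# the call, so `[.data.table` sees them exactly as the user typed them.
count_by <- function(dt, by = NULL) {
  eval(substitute(dt[, .N, by = by]))
}

res_none   <- count_by(sample_dt)                 # no grouping
res_symbol <- count_by(sample_dt, by = a)         # form 1: unquoted column name
res_list   <- count_by(sample_dt, by = list(a))   # form 2: list() of expressions
res_string <- count_by(sample_dt, by = "a")       # form 3: single character string
res_vector <- count_by(sample_dt, by = c("a"))    # form 4: character vector
res_range  <- count_by(sample_dt, by = a:b)       # form 5: startcol:endcol

# Column-name checks only; in particular the grouping column should be `a`, not `by`.
stopifnot(
  identical(names(res_none),   "N"),
  identical(names(res_symbol), c("a", "N")),
  identical(names(res_list),   c("a", "N")),
  identical(names(res_string), c("a", "N")),
  identical(names(res_vector), c("a", "N")),
  identical(names(res_range),  c("a", "b", "N"))
)
```

One thing to keep in mind with this design: substitute() also replaces dt with whatever expression the caller wrote, and eval() then resolves it from inside count_by. That is fine when the table lives in the global environment, as here, but it is an assumption worth testing if the function is called from inside other functions.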