dbplyr, dplyr, and functions with no SQL equivalents [eg `slice()`]

Question

library(tidyverse)
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, mtcars)
mtcars2 <- tbl(con, "mtcars")

I can create this mock SQL database as shown above, and I can run standard dplyr functions against this "database", which is really cool:

mtcars2 %>% 
  group_by(cyl) %>% 
  summarise(mpg = mean(mpg, na.rm = TRUE)) %>% 
  arrange(desc(mpg))
#> # Source:     lazy query [?? x 2]
#> # Database:   sqlite 3.29.0 [:memory:]
#> # Ordered by: desc(mpg)
#>     cyl   mpg
#>   <dbl> <dbl>
#> 1     4  26.7
#> 2     6  19.7
#> 3     8  15.1

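It seems I can't use dplyr functions that have no direct SQL equivalent (e.g. dplyr::slice()). In the case of slice(), I can combine filter() and row_number() to get the same result that slice() alone would give. But what happens when there isn't such an easy workaround?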
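
For reference, the filter()/row_number() workaround might look like the sketch below. It repeats the in-memory `mtcars2` setup from the question; the names `first_five` and `first_five_head` are just illustrative. For the specific case of "the first n rows", `head()` is also translated by dbplyr (into a `LIMIT` clause). Without `arrange()` or `window_order()`, which rows count as "first" is up to the database.

```r
library(dplyr)
library(dbplyr)

# Same in-memory setup as in the question
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, mtcars)
mtcars2 <- tbl(con, "mtcars")

# Equivalent of slice(1:5): number the rows with a window function, then filter.
# dbplyr puts the window function in a subquery, because SQL does not
# allow window functions directly in a WHERE clause.
first_five <- mtcars2 %>%
  filter(row_number() <= 5) %>%
  collect()

# For a leading run of rows, head() is translated to LIMIT
first_five_head <- mtcars2 %>%
  head(5) %>%
  collect()

DBI::dbDisconnect(con)
```

Appending `show_query()` instead of `collect()` to either pipeline prints the SQL that dbplyr generates.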

mtcars2 %>% slice(1:5)
#>Error in UseMethod("slice_") : 
#>  no applicable method for 'slice_' applied to an object of class 
#>  "c('tbl_SQLiteConnection', 'tbl_dbi', 'tbl_sql', 'tbl_lazy', 'tbl')"

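When a dplyr function has no direct SQL equivalent, can I force dbplyr to use it anyway, or are my only options to get creative with dplyr verbs that do have SQL equivalents, or to write SQL directly (which is not my preferred solution)?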
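
Two general escape hatches sit between those options, sketched below under the same in-memory setup. Neither makes dbplyr translate `slice()` itself. The first does as much work as possible in the database and then `collect()`s the result into R, where every dplyr verb works; this only makes sense when the collected result is small enough for memory. The second writes just the awkward step as SQL via `sql()` and keeps piping dplyr verbs on top of it. The filter condition, the SQL string, and the object names are illustrative assumptions.

```r
library(dplyr)
library(dbplyr)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, mtcars)
mtcars2 <- tbl(con, "mtcars")

# Option 1: filter in the database, then collect() into a local tibble,
# where slice() and any other dplyr verb are available.
local_slice <- mtcars2 %>%
  filter(cyl == 4) %>%
  collect() %>%
  slice(1:3)

# Option 2: write only the hard step in SQL, then continue with dplyr verbs;
# dbplyr wraps the SQL in a subquery for the select() below.
sql_then_dplyr <- tbl(con, sql("SELECT * FROM mtcars LIMIT 5")) %>%
  select(mpg, cyl) %>%
  collect()

DBI::dbDisconnect(con)
```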

Answer 1

I read the question as: how can I make slice() work for SQL databases? That is not quite the same as "forcing their use", but it still applies to your case.

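The example below shows how to implement a "poor man's" variant of slice() that works on databases. We still have to do the legwork and implement it with verbs that work on databases, but afterwards we can call it just as we would on a data frame.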

Read more about S3 classes at http://adv-r.had.co.nz/OO-essentials.html#s3.
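
The mechanism this answer relies on is S3 dispatch: `UseMethod()` walks the object's class vector from left to right, calls the first `generic.class` function it finds, and otherwise falls back to `generic.default`. The toy generic `describe()` below is hypothetical and unrelated to dplyr; the object `fake_tbl` only copies the class vector of a dbplyr table, to show why a method named `slice.tbl_sql` gets picked for `mtcars2`.

```r
# A hypothetical generic with a default method and a tbl_sql method
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "default method"
describe.tbl_sql <- function(x, ...) "tbl_sql method"

# An empty object whose class vector mimics a dbplyr table
fake_tbl <- structure(
  list(),
  class = c("tbl_SQLiteConnection", "tbl_dbi", "tbl_sql", "tbl_lazy", "tbl")
)

describe(fake_tbl)      # "tbl_sql method": first class with a method wins
describe(data.frame())  # "default method": no describe.data.frame exists
```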

library(tidyverse)
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, mtcars)
mtcars2 <- tbl(con, "mtcars")

# mtcars2 has a class attribute
class(mtcars2)
#> [1] "tbl_SQLiteConnection" "tbl_dbi"              "tbl_sql"             
#> [4] "tbl_lazy"             "tbl"

# slice() is an S3 method
slice
#> function(.data, ..., .preserve = FALSE) {
#>   UseMethod("slice")
#> }
#> <bytecode: 0x560a03460548>
#> <environment: namespace:dplyr>

# we can implement a "poor man's" variant of slice()
# for the particular class. (It doesn't work quite the same
# in all cases.)
#' @export
slice.tbl_sql <- function(.data, ...) {
  rows <- c(...)

  .data %>%
    mutate(...row_id = row_number()) %>%
    filter(...row_id %in% !!rows) %>%
    select(-...row_id)
}

mtcars2 %>%
  slice(1:5)
#> # Source:   lazy query [?? x 11]
#> # Database: sqlite 3.29.0 [:memory:]
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  21       6   160   110  3.9   2.62  16.5     0     1     4     4
#> 2  21       6   160   110  3.9   2.88  17.0     0     1     4     4
#> 3  22.8     4   108    93  3.85  2.32  18.6     1     1     4     1
#> 4  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1
#> 5  18.7     8   360   175  3.15  3.44  17.0     0     0     3     2

Created on 2019-12-07 by the reprex package (v0.3.0)
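
One reason the poor man's slice() "doesn't work quite the same in all cases" is row order: SQL tables have no inherent order, so `row_number()` without an ordering may number rows differently across runs or backends. The sketch below pins the order down with dbplyr's `window_order()`, which makes the window function `ROW_NUMBER() OVER (ORDER BY mpg)`. Ordering by `mpg` and the name `ordered_rows` are arbitrary choices for illustration.

```r
library(dplyr)
library(dbplyr)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, mtcars)
mtcars2 <- tbl(con, "mtcars")

# Rows 1 to 3 after ordering by mpg, numbered deterministically
ordered_rows <- mtcars2 %>%
  window_order(mpg) %>%
  mutate(row_id = row_number()) %>%
  filter(row_id %in% c(1, 2, 3)) %>%
  select(-row_id) %>%
  collect()

DBI::dbDisconnect(con)
```

The collected rows are not guaranteed to come back sorted; add `arrange(mpg)` before `collect()` if the output order matters.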
