'no applicable method' for applying dbplyr's sql_render to a data.frame

I am testing the "Rendering SQL code" example from dbplyr in RStudio:

library(dplyr)
library(nycflights13)
ranked <- flights %>%
  group_by(year, month, day) %>%
  select(dep_delay) %>% 
  mutate(rank = rank(desc(dep_delay)))

dbplyr::sql_render(ranked)

But when I run it, it returns the following error message:

Error in UseMethod("sql_render") : no applicable method for 'sql_render' applied to an object of class "c('grouped_df', 'tbl_df', 'tbl', 'data.frame')"

Can someone explain why?

Check the package documentation. Following it, you can render the code with SQL syntax.

Perhaps the following code block will help you:

library(dplyr)
library(SqlRender)
library(nycflights13)

ranked <- flights %>%
  group_by(year, month, day) %>%
  select(dep_delay) %>% 
  mutate(rank = rank(desc(dep_delay))) %>%
  ungroup()

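# render() substitutes the @parameters into the SQL text, so x must be a
# table name (a character string), not a data frame. "flights" is assumed
# here to be a table that already exists in your database.
sql <- "SELECT * FROM @x WHERE month = @a;"
render(sql, x = "flights", a = 2)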

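When you operate on a "normal" data.frame, the result is another data frame, so sql_render is inappropriate (and will be confused). Running just your initial code shows that SQL is not involved at all: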

library(dplyr)
library(nycflights13)
ranked <- flights %>%
  group_by(year, month, day) %>%
  select(dep_delay) %>% 
  mutate(rank = rank(desc(dep_delay)))
ranked
# # A tibble: 336,776 x 5
# # Groups:   year, month, day [365]
#     year month   day dep_delay  rank
#    <int> <int> <int>     <dbl> <dbl>
#  1  2013     1     1         2  313 
#  2  2013     1     1         4  276 
#  3  2013     1     1         2  313 
#  4  2013     1     1        -1  440 
#  5  2013     1     1        -6  742 
#  6  2013     1     1        -4  633 
#  7  2013     1     1        -5  691 
#  8  2013     1     1        -3  570 
#  9  2013     1     1        -3  570 
# 10  2013     1     1        -2  502.
# # ... with 336,766 more rows

But dbplyr cannot do anything with it:

library(dbplyr)
sql_render(ranked)
# Error in UseMethod("sql_render") : 
#   no applicable method for 'sql_render' applied to an object of class "c('grouped_df', 'tbl_df', 'tbl', 'data.frame')"
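
The error comes from S3 dispatch: sql_render is a generic, and it has methods for dbplyr's lazy tables but none for local data frames. A minimal sketch of the difference, using the built-in mtcars data and dbplyr's lazy_frame() as stand-ins (neither appears in the question; they are only here to keep the example self-contained):

```r
library(dplyr)
library(dbplyr)

# A local, grouped data frame: the same kind of object as `ranked`.
local_df <- mtcars %>% group_by(cyl)
class(local_df)                 # "grouped_df" "tbl_df" "tbl" "data.frame"
inherits(local_df, "tbl_lazy")  # FALSE, so sql_render() has no method for it

# A lazy table: no rows are computed, only a description of the query.
lazy_df <- lazy_frame(cyl = 4, mpg = 21)
inherits(lazy_df, "tbl_lazy")   # TRUE, so sql_render() can dispatch on it
```

Checking inherits(x, "tbl_lazy") is a quick way to tell whether an object still lives on the database side or has already become a local data frame.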

However, if we have the same flights data in a database, then we can do what you expect with a few small changes.

# pgcon <- DBI::dbConnect(odbc::odbc(), ...)     # to my local postgres instance
copy_to(pgcon, flights, name = "flights_table")  # go get some coffee

flights_db <- tbl(pgcon, "flights_table")
ranked_db <- flights_db %>%
  group_by(year, month, day) %>%
  select(dep_delay) %>% 
  mutate(rank = rank(desc(dep_delay)))
# Adding missing grouping variables: `year`, `month`, `day`

We can see some initial data, showing the first 10 rows that the query will eventually return:

ranked_db
# # Source:   lazy query [?? x 5]
# # Database: postgres [postgres@localhost:/]
# # Groups:   year, month, day
#     year month   day dep_delay    rank
#    <int> <int> <int>     <dbl> <int64>
#  1  2013     1     1        NA       1
#  2  2013     1     1        NA       1
#  3  2013     1     1        NA       1
#  4  2013     1     1        NA       1
#  5  2013     1     1       853       5
#  6  2013     1     1       379       6
#  7  2013     1     1       290       7
#  8  2013     1     1       285       8
#  9  2013     1     1       260       9
# 10  2013     1     1       255      10
# # ... with more rows

We can see what the real SQL query looks like:

sql_render(ranked_db)
# <SQL> SELECT "year", "month", "day", "dep_delay", RANK() OVER (PARTITION BY "year", "month", "day" ORDER BY "dep_delay" DESC) AS "rank"
# FROM "flights_table"
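
If you only want to look at the generated SQL and have no database at hand, a simulated connection may be enough. This is a sketch under assumptions: lazy_frame() and simulate_postgres() are dbplyr helpers not used in the answer above, and the three-row table is made up purely for illustration. The exact SQL text depends on your dbplyr version, so no output is shown.

```r
library(dplyr)
library(dbplyr)

# A tiny stand-in for the flights table; nothing is sent to a real database.
flights_lazy <- lazy_frame(
  year = 2013L, month = 1L, day = 1L, dep_delay = c(2, 4, -1),
  con = simulate_postgres()   # translate as if talking to PostgreSQL
)

ranked_lazy <- flights_lazy %>%
  group_by(year, month, day) %>%
  select(dep_delay) %>%
  mutate(rank = rank(desc(dep_delay)))

# Works because ranked_lazy is a lazy table, not a data frame.
query <- sql_render(ranked_lazy)
query
```

Swapping simulate_postgres() for another simulate_*() helper shows how the same pipeline would be translated for a different backend.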

Realize that, because of how dbplyr operates, we do not know how many rows will be returned until we collect it:

nrow(ranked_db)
# [1] NA

res <- collect(ranked_db)
nrow(res)
# [1] 336776

res
# # A tibble: 336,776 x 5                 # <--- no longer 'Source:   lazy query [?? x 5]'
# # Groups:   year, month, day [365]
#     year month   day dep_delay    rank
#    <int> <int> <int>     <dbl> <int64>
#  1  2013     1     1        NA       1
#  2  2013     1     1        NA       1
#  3  2013     1     1        NA       1
#  4  2013     1     1        NA       1
#  5  2013     1     1       853       5
#  6  2013     1     1       379       6
#  7  2013     1     1       290       7
#  8  2013     1     1       285       8
#  9  2013     1     1       260       9
# 10  2013     1     1       255      10
# # ... with 336,766 more rows
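
The same lazy-versus-collected behaviour can be reproduced without a Postgres server. A minimal sketch, assuming the RSQLite package is installed so that dbplyr's memdb_frame() can create a temporary in-memory SQLite table (the column x and its values are hypothetical):

```r
library(dplyr)
library(dbplyr)

# Copy three rows into a temporary in-memory SQLite database.
lazy_tbl <- memdb_frame(x = c(10, 20, 30))

# Still a lazy query: the row count is unknown until the query is run.
nrow(lazy_tbl)
# [1] NA

# collect() runs the query and pulls the result into a local tibble.
local_tbl <- collect(lazy_tbl)
nrow(local_tbl)
# [1] 3
```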