How to avoid making multiple connections to postgres database when accessing using R

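I am using the following code, but it creates multiple connections when it calls the map function, and those connections are never closed. As a result, my RDS database fills up with connections. How can I change this code so it doesn't open so many connections?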

connect.to.database <- function(dbname, schema = "public", host, port, user, pass) {
  con <- dbConnect(RPostgres::Postgres(),
                   dbname = dbname,
                   user = user,
                   password = pass,
                   host = host,
                   port = port)

  # this puts the schema in the search path, which means that instead of
  # having to use <schema name>.<table name> you can just write <table name>
  res <- dbSendQuery(con, paste0("SET search_path TO ",
                                 dbQuoteIdentifier(con, schema),
                                 ", public"))

  # check for errors
  dbFetch(res)
  dbClearResult(res)

  con
}

schemas <- dbGetQuery(connect.to.database(dbname, "public", host, port, user, password),
                      paste0("SELECT schema_name FROM information_schema.schemata"))

schema_names <- schemas %>% pull()

schemas_tables <- map(.x = schema_names,
                      ~ dbGetQuery(connect.to.database(dbname, "public", host, port, user, password),
                                   paste0("SELECT table_name FROM information_schema.tables WHERE table_schema = ", "'", .x, "'")) %>%
                        mutate(schema_name = .x)) %>%
  bind_rows()

Create a single connection object once and reuse it inside map. (I removed the unnecessary paste0 from your first query.)

library(DBI)
library(dplyr)
library(purrr)

# open ONE connection and reuse it for every query below
conn <- connect.to.database(dbname, "public", host, port, user, password)
schema <- dbGetQuery(conn, "SELECT schema_name FROM information_schema.schemata")

schemas_tables <- map(
  .x = schema$schema_name,
  ~ dbGetQuery(conn, paste0("SELECT table_name FROM information_schema.tables WHERE table_schema = ", "'", .x, "'")) %>%
    mutate(schema_name = .x)
) %>%
  bind_rows()
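If you want the "close it when you're done" step to happen even when a query fails partway through map, one option is to wrap the connection in a function and register dbDisconnect() with on.exit(). The sketch below is only an illustration: with_connection() and sqlite_connect() are hypothetical helpers (not part of DBI), and the demo runs against an in-memory SQLite database on the assumption that RSQLite is installed, so that it can run without a Postgres server. With the question's setup, you would pass a function that calls connect.to.database() instead.

```r
library(DBI)

# Hypothetical helper (not part of DBI): open ONE connection, pass it to f(),
# and close it when with_connection() exits, even if f() raises an error.
with_connection <- function(connect, f) {
  con <- connect()
  on.exit(dbDisconnect(con), add = TRUE)
  f(con)
}

# Demo on an in-memory SQLite database (assumption: RSQLite is installed).
# For the question's setup, `connect` would be something like
#   function() connect.to.database(dbname, "public", host, port, user, password)
sqlite_connect <- function() dbConnect(RSQLite::SQLite(), ":memory:")

used_con <- NULL
result <- with_connection(sqlite_connect, function(con) {
  used_con <<- con  # keep a reference only so we can check it afterwards
  dbWriteTable(con, "tables_demo",
               data.frame(table_schema = c("a", "a", "b"),
                          table_name   = c("t1", "t2", "t3")))
  # every query inside f() reuses the same connection
  dbGetQuery(con, "SELECT table_schema, table_name FROM tables_demo")
})

print(result)
print(dbIsValid(used_con))  # FALSE: closed by on.exit()

# The connection is also closed when f() fails
failed_con <- NULL
try(with_connection(sqlite_connect, function(con) {
  failed_con <<- con
  stop("query failed")
}), silent = TRUE)
print(dbIsValid(failed_con))  # FALSE
```

Because on.exit() runs however the function exits, the number of open connections stays at one at most, no matter how many queries f() runs.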

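You might want to consider parameterized queries instead of building query strings by hand. Malicious SQL injection is one security concern (e.g., XKCD's Exploits of a Mom, aka "Little Bobby Tables"), but pasted strings also break on malformed input or Unicode-vs-ANSI mistakes, even when a single data analyst is the only one running the query. Both DBI (with odbc) and RODBC support parameterized queries, either natively or through add-ons. RPostgres, which you are using, supports them too, but its placeholders are numbered ($1, $2, ...) rather than ?.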
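To make the "malformed strings" point concrete, the sketch below runs the same lookup three ways against a schema name containing a single quote. It assumes RSQLite is installed and uses a made-up table, tables_demo, as a stand-in for information_schema.tables. Note that SQLite uses ? as the placeholder; with RPostgres you would write $1 instead.

```r
library(DBI)

# Stand-in data (hypothetical table, in-memory SQLite; assumes RSQLite is installed)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "tables_demo",
             data.frame(table_schema = c("public", "o'brien"),
                        table_name   = c("t1", "t2")))

schema <- "o'brien"  # a value that contains a single quote

# 1. String pasting: produces ... WHERE table_schema = 'o'brien', a syntax error
pasted <- tryCatch(
  dbGetQuery(con, paste0("SELECT table_name FROM tables_demo WHERE table_schema = '",
                         schema, "'")),
  error = function(e) "error"
)

# 2. Parameter binding: the value is sent separately, so no quoting is needed
#    (SQLite placeholder is ?; RPostgres would use $1)
bound <- dbGetQuery(con, "SELECT table_name FROM tables_demo WHERE table_schema = ?",
                    params = list(schema))

# 3. If you must build strings, let DBI escape the literal for you
quoted <- dbGetQuery(con, paste0("SELECT table_name FROM tables_demo WHERE table_schema = ",
                                 dbQuoteString(con, schema)))

dbDisconnect(con)

print(pasted)  # "error"
print(bound)
print(quoted)
```

Binding (option 2) is generally the preferable choice; dbQuoteString() is a fallback for cases where a placeholder cannot be used.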

That would change it to:

schemas_tables <- map(
  .x = schema$schema_name,
  # RPostgres uses $1, $2, ... as placeholders (odbc would use ?)
  ~ dbGetQuery(conn, "SELECT table_name FROM information_schema.tables WHERE table_schema = $1",
               params = list(.x)) %>%
    mutate(schema_name = .x)
) %>%
  bind_rows()

But frankly, I think using IN might be easier than =. Again, use parameter binding.

# selecting table_schema as well keeps track of which schema each table belongs to
schemas_tables <- dbGetQuery(conn, "SELECT table_schema, table_name FROM information_schema.tables WHERE table_schema IN ($1)",
                             params = list(schema$schema_name))

(No need for map.)

Or, I believe you can do it in a single query instead of two:

dbGetQuery(conn, "
    select table_schema, table_name
    from information_schema.tables
    where table_schema in (
      select schema_name from information_schema.schemata
    )")

Remember to close the connection when you're done:

dbDisconnect(conn)