How to avoid making multiple connections to a Postgres database when accessing it from R
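I am using the following code, but it creates a new connection every time the map function calls dbGetQuery, and those connections are never closed. As a result, my RDS database fills up with connections. Is there a way to change this code so it doesn't open so many connections?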
connect.to.database <- function (dbname, schema = "public", host, port, user, pass) {
  con <- dbConnect(RPostgres::Postgres(),
                   dbname = dbname,
                   user = user,
                   password = pass,
                   host = host,
                   port = port)
  # this puts the schema in the search path, which means that instead of
  # having to use <schema name>.<table name> you can just write <table name>
  res <- dbSendQuery(con, paste0("SET search_path TO ",
                                 dbQuoteIdentifier(con, schema),
                                 ", public"))
  # check for errors
  dbFetch(res)
  dbClearResult(res)
  con
}
schemas <- dbGetQuery(connect.to.database(dbname, "public", host, port, user, password),
                      paste0("SELECT schema_name FROM information_schema.schemata"))
schema_names <- schemas %>% pull()
schemas_tables <- map(.x = schema_names,
                      ~dbGetQuery(connect.to.database(dbname, "public", host, port, user, password),
                                  paste0("SELECT table_name FROM information_schema.tables WHERE table_schema = ", "'", .x, "'")) %>%
                        mutate(schema_name = .x)) %>%
  bind_rows()
Create a single global connection object and use it inside map. (I removed the unnecessary paste0 from your first query.)
conn <- connect.to.database(dbname, "public", host, port, user, password)
schema <- dbGetQuery(conn, "SELECT schema_name FROM information_schema.schemata")
schemas_tables <- map(
  .x = schema$schema_name,
  ~ dbGetQuery(conn, paste0("SELECT table_name FROM information_schema.tables WHERE table_schema = ","'",.x,"'")) %>%
    mutate(schema_name = .x)
) %>%
  bind_rows()
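You may want to consider parameterizing your queries instead of building query strings by hand. Malicious SQL injection is one concern (e.g., XKCD's Exploits of a Mom aka "Little Bobby Tables"), but parameterization also protects against malformed strings and Unicode-vs-ANSI mistakes, even when a single data analyst is the only one running the query. Both DBI (with odbc, or with RPostgres as used here) and RODBC support parameterized queries, either natively or via add-ons.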
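To make the malformed-string problem concrete, here is a minimal sketch that needs only the DBI package and no database. It builds the question's WHERE clause with paste0 for a hypothetical schema name containing an apostrophe, then shows DBI's dbQuoteString() escaping it. DBI::ANSI() is a stand-in connection object that only supports quoting. Quoting is a fallback technique, not parameter binding; binding, as this answer recommends, is still preferable.

```r
library(DBI)

# Hypothetical schema name containing a single quote
schema_name <- "o'brien"

# Naive string building, as in the question: the embedded quote ends the literal early
naive_sql <- paste0("SELECT table_name FROM information_schema.tables WHERE table_schema = ",
                    "'", schema_name, "'")
print(naive_sql)

# dbQuoteString() doubles the embedded quote and wraps the value in single quotes;
# ANSI() is a placeholder connection used here only for quoting
quoted <- dbQuoteString(ANSI(), schema_name)
safe_sql <- paste0("SELECT table_name FROM information_schema.tables WHERE table_schema = ",
                   quoted)
print(safe_sql)
```

The naive version ends in 'o'brien', which Postgres would reject as a syntax error; the quoted version ends in 'o''brien', a valid string literal.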
That would change it to:
# RPostgres uses positional placeholders ($1, $2, ...); odbc and RSQLite use ?
schemas_tables <- map(
  .x = schema$schema_name,
  ~ dbGetQuery(conn, "SELECT table_name FROM information_schema.tables WHERE table_schema = $1",
               params = list(.x)) %>%
    mutate(schema_name = .x)
) %>%
  bind_rows()
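The map version still sends one separately parsed query per schema. A possible refinement, sketched here under stated assumptions, is to prepare the statement once with dbSendQuery() and re-bind it for each value with dbBind(), which are standard DBI functions. So that it runs without a Postgres server, the sketch uses an in-memory SQLite database (assumes RSQLite is installed) and a hypothetical table tables_info standing in for information_schema.tables. SQLite uses ? placeholders; with RPostgres the same query would use $1.

```r
library(DBI)
library(dplyr)
library(purrr)

# Stand-in for the Postgres connection: an in-memory SQLite database
conn <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table mimicking information_schema.tables
dbWriteTable(conn, "tables_info", data.frame(
  table_schema = c("public", "public", "sales"),
  table_name   = c("users", "orders", "invoices")
))

# Prepare the statement once on the single shared connection ...
res <- dbSendQuery(conn, "SELECT table_name FROM tables_info WHERE table_schema = ?")

# ... then bind each schema name in turn and fetch its rows
schemas_tables <- map(c("public", "sales"), function(s) {
  dbBind(res, list(s))
  dbFetch(res) %>% mutate(schema_name = s)
}) %>%
  bind_rows()

# Release the prepared statement and the connection
dbClearResult(res)
dbDisconnect(conn)
print(schemas_tables)
```

Every query here goes over the one connection, so the number of open sessions stays at one no matter how many schemas there are.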
But frankly, I think using IN might be easier than =. Again, use parameter binding.
schemas_tables <- dbGetQuery(conn, "SELECT table_schema AS schema_name, table_name FROM information_schema.tables WHERE table_schema IN ($1)",
                             params = list(schema$schema_name))
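(No map needed: when a vector is bound to a single placeholder, DBI executes the query once per element and combines the results.)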
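Here is a small sketch of the behaviour this relies on: DBI's specification says that binding a vector of length greater than one to a query returns the row-bound results of running it once per element. It again uses an in-memory SQLite database (assumes RSQLite is installed) and the hypothetical stand-in table tables_info; with RPostgres, write $1 instead of ?.

```r
library(DBI)

# Stand-in database; tables_info is a hypothetical copy of information_schema.tables
conn <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(conn, "tables_info", data.frame(
  table_schema = c("public", "public", "sales", "audit"),
  table_name   = c("users", "orders", "invoices", "log")
))

# A length-2 vector bound to one placeholder: the query runs for "public"
# and for "sales", and the two result sets are combined into one data frame
schemas_tables <- dbGetQuery(
  conn,
  "SELECT table_schema AS schema_name, table_name FROM tables_info WHERE table_schema IN (?)",
  params = list(c("public", "sales"))
)

dbDisconnect(conn)
print(schemas_tables)
```

Note that this is not a true multi-value IN list: the placeholder still holds one value per execution, and DBI does the repetition for you.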
Or, I believe you can do it in a single query instead of two.
dbGetQuery(conn, "
  select table_schema as schema_name, table_name
  from information_schema.tables
  where table_schema in (
    select schema_name from information_schema.schemata
  )")
Remember to close the connection when you're done.
dbDisconnect(conn)
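If a query errors partway through a script, a dbDisconnect() call at the end is never reached and the connection leaks anyway. A minimal sketch of one way to guard against that is a hypothetical helper, with_connection(), built on base R's on.exit(), which runs the disconnect whenever the helper exits, whether normally or with an error. The demo uses an in-memory SQLite database (assumes RSQLite is installed) so it runs without a Postgres server; with the question's setup you would pass a function that calls connect.to.database() instead.

```r
library(DBI)

# Hypothetical helper: open a connection, run `f` with it, and always
# disconnect afterwards, even if `f` throws an error
with_connection <- function(connect, f) {
  con <- connect()
  on.exit(dbDisconnect(con), add = TRUE)
  f(con)
}

# With the question's code this might look like:
# with_connection(
#   function() connect.to.database(dbname, "public", host, port, user, password),
#   function(con) dbGetQuery(con, "SELECT schema_name FROM information_schema.schemata")
# )

# Self-contained check against an in-memory SQLite database
captured <- NULL
out <- with_connection(
  function() dbConnect(RSQLite::SQLite(), ":memory:"),
  function(con) {
    captured <<- con  # keep a handle so we can inspect it afterwards
    dbGetQuery(con, "SELECT 1 AS x")
  }
)
print(out)
print(dbIsValid(captured))  # FALSE: the connection was closed on exit
```

Because the connection's lifetime is tied to a single function call, there is no global conn object left open by accident.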