将 SQL 查询从 JDBC 连接推送到服务器,该连接从该服务器内的多个数据库读取
Push a SQL query to a server from JDBC connection which reads from multiple databases within that server
我正在将查询向下推送到服务器以将数据读入 Databricks,如下所示:
val jdbcUsername = dbutils.secrets.get(scope = "", key = "")
val jdbcPassword = dbutils.secrets.get(scope = "", key = "")
Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver")
val jdbcHostname = ""
val jdbcPort = ...
val jdbcDatabase = ""
// Create the JDBC URL without passing in the user and password parameters.
val jdbcUrl = s"jdbc:sqlserver://${jdbcHostname}:${jdbcPort};database=${jdbcDatabase}"
// Create a Properties() object to hold the parameters.
import java.util.Properties
val connectionProperties = new Properties()
connectionProperties.put("user", s"${jdbcUsername}")
connectionProperties.put("password", s"${jdbcPassword}")
val driverClass = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
connectionProperties.setProperty("Driver", driverClass)
// define a query to be passed to database to display the tables available for a given DB
val query_results = "(SELECT * FROM INFORMATION_SCHEMA.TABLES) as tables"
// push the query down to the server to retrieve the list of available tables
val table_names = spark.read.jdbc(jdbcUrl, query_results, connectionProperties)
table_names.createOrReplaceTempView("table_names")
运行 display(table_names) 将为给定的已定义数据库提供表格列表。这不是问题,但是当尝试从同一服务器中的多个数据库读取和连接表时,我还没有找到有效的解决方案。
例如:
// define a query to be passed to database to display a result across many tables
val report1_results = "(SELECT a.Field1, b.Field2 FROM database_1 as a left join database_2 as b on a.Field4 == b.Field8) as report1"
// push the query down to the server to retrieve the query result
val report1_results = spark.read.jdbc(jdbcUrl, report1_results, connectionProperties)
report1_results .createOrReplaceTempView("report1_results")
任何关于重组此代码的建议(Python 中的等价物也会非常有帮助)。
SQL 服务器使用 3 部分命名,如 database.schema.table
。本例来自SQL Server information_schema
docs:
SELECT TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, COLUMN_DEFAULT
FROM AdventureWorks2012.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = N'Product';
要跨数据库查询,您需要指定查询中的所有 3 个部分被推送到 SQL 服务器。
SELECT a.Field1, b.Field2
FROM database_1.schema_1.table_1 as a
LEFT JOIN database_2.schema_2.table_2 as b
on a.Field4 == b.Field8
我正在将查询向下推送到服务器以将数据读入 Databricks,如下所示:
val jdbcUsername = dbutils.secrets.get(scope = "", key = "")
val jdbcPassword = dbutils.secrets.get(scope = "", key = "")
Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver")
val jdbcHostname = ""
val jdbcPort = ...
val jdbcDatabase = ""
// Create the JDBC URL without passing in the user and password parameters.
val jdbcUrl = s"jdbc:sqlserver://${jdbcHostname}:${jdbcPort};database=${jdbcDatabase}"
// Create a Properties() object to hold the parameters.
import java.util.Properties
val connectionProperties = new Properties()
connectionProperties.put("user", s"${jdbcUsername}")
connectionProperties.put("password", s"${jdbcPassword}")
val driverClass = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
connectionProperties.setProperty("Driver", driverClass)
// define a query to be passed to database to display the tables available for a given DB
val query_results = "(SELECT * FROM INFORMATION_SCHEMA.TABLES) as tables"
// push the query down to the server to retrieve the list of available tables
val table_names = spark.read.jdbc(jdbcUrl, query_results, connectionProperties)
table_names.createOrReplaceTempView("table_names")
运行 display(table_names) 将为给定的已定义数据库提供表格列表。这不是问题,但是当尝试从同一服务器中的多个数据库读取和连接表时,我还没有找到有效的解决方案。
例如:
// define a query to be passed to database to display a result across many tables
val report1_results = "(SELECT a.Field1, b.Field2 FROM database_1 as a left join database_2 as b on a.Field4 == b.Field8) as report1"
// push the query down to the server to retrieve the query result
val report1_results = spark.read.jdbc(jdbcUrl, report1_results, connectionProperties)
report1_results .createOrReplaceTempView("report1_results")
任何关于重组此代码的建议(Python 中的等价物也会非常有帮助)。
SQL 服务器使用 3 部分命名,如 database.schema.table
。本例来自SQL Server information_schema
docs:
SELECT TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, COLUMN_DEFAULT
FROM AdventureWorks2012.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = N'Product';
要跨数据库查询,您需要指定查询中的所有 3 个部分被推送到 SQL 服务器。
SELECT a.Field1, b.Field2
FROM database_1.schema_1.table_1 as a
LEFT JOIN database_2.schema_2.table_2 as b
on a.Field4 == b.Field8