Push a SQL query down to a server over a JDBC connection when the query reads from multiple databases on that server

I am pushing a query down to the server to read data into Databricks, as shown below:

val jdbcUsername = dbutils.secrets.get(scope = "", key = "")
val jdbcPassword = dbutils.secrets.get(scope = "", key = "")
Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver")


val jdbcHostname = "" 
val jdbcPort = ...
val jdbcDatabase = ""

// Create the JDBC URL without passing in the user and password parameters.
val jdbcUrl = s"jdbc:sqlserver://${jdbcHostname}:${jdbcPort};database=${jdbcDatabase}"

// Create a Properties() object to hold the parameters.
import java.util.Properties
val connectionProperties = new Properties()

connectionProperties.put("user", s"${jdbcUsername}")
connectionProperties.put("password", s"${jdbcPassword}")

val driverClass = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
connectionProperties.setProperty("Driver", driverClass)


// define a query to be passed to database to display the tables available for a given DB
val query_results = "(SELECT * FROM INFORMATION_SCHEMA.TABLES) as tables"

// push the query down to the server to retrieve the list of available tables
val table_names = spark.read.jdbc(jdbcUrl, query_results, connectionProperties)
table_names.createOrReplaceTempView("table_names")

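Running display(table_names) lists the tables for the one database defined in the JDBC URL. That part works, but I have not found a working way to read and join tables from multiple databases on the same server.

For example: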

// define a query to be passed to the database to return a result across many tables
val report1_query = "(SELECT a.Field1, b.Field2 FROM database_1 as a left join database_2 as b on a.Field4 = b.Field8) as report1"

// push the query down to the server to retrieve the query result
val report1_results = spark.read.jdbc(jdbcUrl, report1_query, connectionProperties)
report1_results.createOrReplaceTempView("report1_results")

Any suggestions for restructuring this code would be appreciated (a Python equivalent would also be very helpful).

SQL Server uses three-part naming, in the form database.schema.table. This example is from the SQL Server INFORMATION_SCHEMA docs:

SELECT TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, COLUMN_DEFAULT
FROM AdventureWorks2012.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = N'Product';

To query across databases, you need to specify all three parts of each table name in the query that is pushed down to SQL Server.

SELECT a.Field1, b.Field2 
FROM      database_1.schema_1.table_1 as a 
LEFT JOIN database_2.schema_2.table_2 as b 
       on a.Field4 = b.Field8
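
Since a Python equivalent was requested, here is a rough PySpark sketch of the same idea. The helper names (qualified_name, build_pushdown_query, read_report) are made up for illustration, and schema_1, table_1, schema_2 and table_2 are placeholders, as in the query above. It assumes the login has read access to both databases and that the Microsoft JDBC driver is available on the cluster.

```python
def qualified_name(database, schema, table):
    """Return a SQL Server three-part name with each part bracket-quoted."""
    parts = (database, schema, table)
    # Inside [brackets], a literal "]" is escaped by doubling it.
    return ".".join("[" + p.replace("]", "]]") + "]" for p in parts)


def build_pushdown_query(left, right, alias="report1"):
    """Wrap the cross-database join as an aliased subquery.

    spark.read.jdbc expects a table name or a parenthesised subquery
    with an alias, so the whole SELECT is wrapped accordingly.
    """
    return (
        f"(SELECT a.Field1, b.Field2 "
        f"FROM {left} AS a "
        f"LEFT JOIN {right} AS b ON a.Field4 = b.Field8) AS {alias}"
    )


def read_report(spark, jdbc_url, user, password):
    """Push the join down to SQL Server and return the result as a DataFrame."""
    query = build_pushdown_query(
        qualified_name("database_1", "schema_1", "table_1"),  # placeholder names
        qualified_name("database_2", "schema_2", "table_2"),  # placeholder names
    )
    properties = {
        "user": user,
        "password": password,
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }
    return spark.read.jdbc(url=jdbc_url, table=query, properties=properties)
```

In a Databricks notebook, where spark and dbutils are already defined, something like report1 = read_report(spark, jdbc_url, jdbc_username, jdbc_password) followed by report1.createOrReplaceTempView("report1_results") should mirror the Scala version. The database named in the JDBC URL only sets the default database for the connection; the three-part names are what let the query reach the other one.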