
Create Polybase tables from DataBricks

I am new to data warehousing, and I have a new requirement: create an EXTERNAL TABLE in the DWH from the Data Lake (Gen1/Gen2) through Databricks. I used the link to put together the code below.

// Set up the Blob storage account access key in the notebook session conf.
spark.conf.set(
  "fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net",
  "<your-storage-account-access-key>")

// Get some data from a SQL DW table.
val df: DataFrame = spark.read
  .format("com.databricks.spark.sqldw")
  .option("url", "jdbc:sqlserver://<the-rest-of-the-connection-string>")
  .option("tempDir", "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>")
  .option("forwardSparkAzureStorageCredentials", "true")
  .option("dbTable", "my_table_in_dw")
  .load()

The code I wrote:

%scala

Class.forName("com.databricks.spark.sqldw.DefaultSource")

import org.apache.spark.sql.functions._ 
import org.apache.spark.sql.{DataFrame, SQLContext}


spark.conf.set("fs.azure.account.key.xxxxxxxxx.blob.core.windows.net", "xxxxxxxxxxxxxxx")

// Load data from a SQL DW table
val df: DataFrame = spark.read
  .format("com.databricks.spark.sqldw")
  .option("url", "jdbc:sqlserver://xxxxxxxxxxx.database.windows.net:1433;database=xxxxxxxx")
  .option("tempDir", "wasbs://xxxxxxxxx@xxxxxxxxx.blob.core.windows.net")
  .option("forwardSparkAzureStorageCredentials", "true")
  .option("dbTable", "dbo.EXT_TEST")
  .load()

This is the error: com.databricks.spark.sqldw.SqlDWConnectorException: Exception encountered in SQL DW connector code. Where did I go wrong? Any help would be appreciated.

Make sure you pass "tempDir" in the following format:

tempDir = "wasbs://" + blobContainer + "@" + blobStorage +"/tempDirs"

Reference: Load data into Azure SQL Data Warehouse

You can also refer to the suggestions outlined in the GitHub issue, which addresses a similar problem.

Hope this helps.