Write dataframe to SQL dedicated database using Synapse Analytics
I want to load a dataframe from my Azure Data Lake Storage Gen2 and write it to the dedicated SQL database I created in Synapse.
Here is what I did:
df = spark.read.format("delta").load(BronzePath)
df.write.format("com.databricks.spark.sqldw").option("url", jdbcUrl).save()
I get the following error:
java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.sqldw.
Whereas running:
df.write.mode("overwrite").saveAsTable("MyTable")
creates the table in the Spark default database (blue cross). That is not what I need. I want my table in the dedicated database (blue arrow):
Post more code, including the jdbc url, if it differs from this guide. I don't see code setting the storage key in the conf, and you also seem to be saving in a different way.
# Otherwise, set up the Blob storage account access key in the notebook session conf.
spark.conf.set(
    "fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net",
    "<your-storage-account-access-key>")

# Get some data from an Azure Synapse table.
df = spark.read \
    .format("com.databricks.spark.sqldw") \
    .option("url", "jdbc:sqlserver://<the-rest-of-the-connection-string>") \
    .option("tempDir", "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>") \
    .option("forwardSparkAzureStorageCredentials", "true") \
    .option("dbTable", "<your-table-name>") \
    .load()

# Load data from an Azure Synapse query.
df = spark.read \
    .format("com.databricks.spark.sqldw") \
    .option("url", "jdbc:sqlserver://<the-rest-of-the-connection-string>") \
    .option("tempDir", "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>") \
    .option("forwardSparkAzureStorageCredentials", "true") \
    .option("query", "select x, count(*) as cnt from table group by x") \
    .load()

# Apply some transformations to the data, then use the
# Data Source API to write the data back to another table in Azure Synapse.
df.write \
    .format("com.databricks.spark.sqldw") \
    .option("url", "jdbc:sqlserver://<the-rest-of-the-connection-string>") \
    .option("forwardSparkAzureStorageCredentials", "true") \
    .option("dbTable", "<your-table-name>") \
    .option("tempDir", "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>") \
    .save()
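Adapted to your case, a minimal sketch might look like the following (assuming a Databricks runtime, where the com.databricks.spark.sqldw source is bundled; the target table name and tempDir path are placeholders, not values from your setup):

# Minimal sketch (placeholder names): reuse the guide's write pattern for the
# bronze delta dataframe, adding the tempDir and credential forwarding that
# the original call was missing.
df = spark.read.format("delta").load(BronzePath)

df.write \
    .format("com.databricks.spark.sqldw") \
    .option("url", jdbcUrl) \
    .option("forwardSparkAzureStorageCredentials", "true") \
    .option("dbTable", "dbo.MyTable") \
    .option("tempDir", "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/tempdir") \
    .mode("overwrite") \
    .save()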
Also read
- Supported save modes for batch writes and
- Write semantics
in the FAQ.
creates the table in the Spark default database (blue cross). That is not what I need. I want to have my table in the dedicated database (blue arrow):

As mentioned here, "Spark will create a default local Hive metastore (using Derby) for you." So when you don't give it a path/jdbc url (df.write.mode("overwrite").saveAsTable("MyTable")), it saves to the local Hive metastore.
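To land the table in the dedicated database instead, one alternative is Spark's built-in generic jdbc data source. A minimal sketch, assuming the SQL Server JDBC driver is on the classpath, with placeholder server, database, table, and credential names:

# Minimal sketch (placeholder names): write through the generic jdbc source so
# the table is created in the dedicated database, not the local Hive metastore.
df.write \
    .format("jdbc") \
    .option("url", "jdbc:sqlserver://<your-server>.sql.azuresynapse.net:1433;database=<your-dedicated-db>") \
    .option("dbtable", "dbo.MyTable") \
    .option("user", "<sql-user>") \
    .option("password", "<sql-password>") \
    .mode("overwrite") \
    .save()

Note the generic jdbc writer inserts rows over JDBC; for large dataframes the tempDir-staged connector shown above is typically much faster.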