Unable to read from Azure Synapse using Databricks: master key error

I am trying to read a dataframe from an Azure Synapse DWH pool, following the tutorial at https://docs.databricks.com/data/data-sources/azure/synapse-analytics.html

I have set the storage account access key "fs.azure.account.key..blob.core.windows.net" and have also specified the ADLS temp directory in abfss format.

The read operation is:

x = spark.read.format('com.databricks.spark.sqldw').option('url', sqlDwUrl).option('tempDir', tempdir) \
    .option('forwardSparkAzureStorageCredentials', 'true').option('query', "SELECT TOP(1) * FROM " + targetSchema + '.' + targetTable).load()

The above executes without errors.
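For readability, the same read can also be driven from an options dict. This is a minimal sketch: build_sqldw_read_options and quote_ident are hypothetical helpers, not part of the connector; the option names are the ones already used in the question. Unlike the plain string concatenation, it bracket-quotes the schema and table names.

```python
def quote_ident(name: str) -> str:
    """Bracket-quote a T-SQL identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def build_sqldw_read_options(sql_dw_url: str, temp_dir: str,
                             schema: str, table: str, top: int = 1) -> dict:
    """Hypothetical helper: options for a TOP (n) read from schema.table."""
    return {
        "url": sql_dw_url,
        "tempDir": temp_dir,
        "forwardSparkAzureStorageCredentials": "true",
        "query": f"SELECT TOP ({int(top)}) * FROM {quote_ident(schema)}.{quote_ident(table)}",
    }


# In a Databricks notebook (not executed here):
# opts = build_sqldw_read_options(sqlDwUrl, tempdir, targetSchema, targetTable)
# x = spark.read.format("com.databricks.spark.sqldw").options(**opts).load()
```

DataFrameReader.options(**opts) applies every entry at once, so the read call itself stays short.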

Then I try to display the dataframe using

display(x)

and I run into the following error:

SqlDWSideException: Azure Synapse Analytics failed to execute the JDBC query produced by the connector.
Underlying SQLException(s):
  - com.microsoft.sqlserver.jdbc.SQLServerException: Please create a master key in the database or open the master key in the session before performing this operation. [ErrorCode = 15581] [SQLState = S0006]

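From the documentation, I understand that a database master key is required, and one has been duly created. So I am not sure why this error is thrown.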

Surprisingly, writing to Synapse with the same format (df.write.format("com.databricks.spark.sqldw").....) works like a charm.

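I did some research, and based on it I believe the database master key (which was created by the DBA) is valid for both read and write operations. Is there any way a database master key could restrict read operations but not write operations? If not, why could the above issue be occurring?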

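After creating the SQL pool from the Azure portal, you must first create the master key. You can do this by connecting with SSMS and running a T-SQL command. If you then try to read a table in this pool, you will not see any error in Databricks.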

Go through this documentation: Required Azure Synapse permissions for PolyBase

As a prerequisite for the first command, the connector expects that a database master key already exists for the specified Azure Synapse instance. If not, you can create a key using the CREATE MASTER KEY command.
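Because the connector only expects the key to exist, a guarded CREATE MASTER KEY avoids failing when one is already there. A minimal sketch that renders such a batch from Python; create_master_key_if_missing and tsql_string_literal are hypothetical helpers, and the batch assumes your pool accepts IF NOT EXISTS against sys.symmetric_keys, where the database master key is listed under the name ##MS_DatabaseMasterKey##.

```python
def tsql_string_literal(value: str) -> str:
    """Quote a value as a T-SQL string literal (embedded ' becomes '')."""
    return "'" + value.replace("'", "''") + "'"


def create_master_key_if_missing(password: str) -> str:
    """Hypothetical helper: CREATE MASTER KEY only when none exists yet."""
    return (
        "IF NOT EXISTS (SELECT * FROM sys.symmetric_keys "
        "WHERE name = '##MS_DatabaseMasterKey##')\n"
        "    CREATE MASTER KEY ENCRYPTION BY PASSWORD = "
        + tsql_string_literal(password)
        + ";"
    )


print(create_master_key_if_missing("Example-Only-Pa55word!"))
```

Run the printed batch in SSMS (or any client) connected to the SQL pool database, not to master.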

Next, your question:

Is there any way by which a database master key would restrict read operations, but not write? If not, then why could the above issue be occurring?

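Notice that when writing to SQL, you had already configured the temp directory in the storage account. The Azure Synapse connector automatically discovers the account access key set in the session and forwards it to the connected Azure Synapse instance by creating a temporary Azure database scoped credential: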

Creates a database credential. A database credential is not mapped to a server login or database user. The credential is used by the database to access to the external location anytime the database is performing an operation that requires access.
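A database scoped credential can only be created once the database has a master key to protect it, which is consistent with error 15581 surfacing when the connector forwards the storage key. The sketch below renders the general shape of such a statement for a storage account key; the credential name is hypothetical, and the SQL the connector actually generates may differ.

```python
def tsql_string_literal(value: str) -> str:
    """Quote a value as a T-SQL string literal (embedded ' becomes '')."""
    return "'" + value.replace("'", "''") + "'"


def tsql_ident(name: str) -> str:
    """Bracket-quote a T-SQL identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def create_scoped_credential_sql(name: str, storage_key: str) -> str:
    # With a storage account key, IDENTITY is not used for authentication,
    # so any placeholder string works; SECRET carries the key itself.
    # This statement fails with error 15581 if no master key exists.
    return (
        f"CREATE DATABASE SCOPED CREDENTIAL {tsql_ident(name)} "
        f"WITH IDENTITY = 'user', SECRET = {tsql_string_literal(storage_key)};"
    )


print(create_scoped_credential_sql("tmp_cred", "<storage-account-key>"))
```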

From the documentation, Open the Database Master Key of the current database:

If the database master key was encrypted with the service master key, it will be automatically opened when it is needed for decryption or encryption. In this case, it is not necessary to use the OPEN MASTER KEY statement.

When a database is first attached or restored to a new instance of SQL Server, a copy of the database master key (encrypted by the service master key) is not yet stored in the server. You must use the OPEN MASTER KEY statement to decrypt the database master key (DMK). Once the DMK has been decrypted, you have the option of enabling automatic decryption in the future by using the ALTER MASTER KEY REGENERATE statement to provision the server with a copy of the DMK, encrypted with the service master key (SMK).

However, the documentation also notes:

For SQL Database and Azure Synapse Analytics, the password protection is not considered to be a safety mechanism to prevent a data loss scenario in situations where the database may be moved from one server to another, as the Service Master Key protection on the Master Key is managed by Microsoft Azure platform. Therefore, the Master Key password is optional in SQL Database and Azure Synapse Analytics.

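As you can see above, I tried to reproduce this: after you first create a SQL pool from the Synapse portal, you can write to a table directly from Databricks, but when you try to read the same table, you get the exception.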

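When writing, Spark writes the data as parquet files to the shared Blob storage, and Synapse then loads them into the given table using the COPY statement. When reading from a Synapse dedicated SQL pool table, Synapse writes the data from the dedicated SQL pool to the shared Blob storage as snappy-compressed parquet files, which Spark then reads and displays to you.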

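We only set the Blob storage account key and secret in the session configuration. With forwardSparkAzureStorageCredentials = true, the Synapse connector forwards the storage access key to the Azure Synapse dedicated pool by creating an Azure database scoped credential.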
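The session configuration is a Spark conf entry whose name embeds the storage account and endpoint. A sketch with hypothetical helper names and placeholder account and container names; note that wasbs:// paths go through the blob.core.windows.net endpoint while abfss:// paths go through dfs.core.windows.net, so the key name should match the scheme of your tempDir.

```python
def account_key_conf(account: str, endpoint: str = "blob.core.windows.net") -> str:
    """Hypothetical helper: Spark conf name holding a storage account key."""
    return f"fs.azure.account.key.{account}.{endpoint}"


def abfss_uri(container: str, account: str, path: str = "") -> str:
    """Hypothetical helper: ADLS Gen2 URI, which uses the dfs endpoint."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


# In a Databricks notebook (not executed here); names are placeholders:
# spark.conf.set(account_key_conf("mystorageacct", "dfs.core.windows.net"), storage_key)
# tempdir = abfss_uri("tempcontainer", "mystorageacct", "synapse-tmp")
```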

Note: You can .load() into a dataframe without an exception, but when you try to use display(dataframe) the exception is thrown.
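This matches Spark's lazy evaluation: .load() only defines the read, and the data is fetched when an action such as display() runs, which is presumably when the Synapse unload described above takes place. The sketch is an analogy only: a Python generator stands in for the lazy DataFrame, and MasterKeyError is a hypothetical stand-in for the SqlDWSideException.

```python
class MasterKeyError(Exception):
    """Hypothetical stand-in for the SqlDWSideException in the question."""


def lazy_read(master_key_open: bool):
    # Generator body: nothing here runs until the first row is requested.
    if not master_key_open:
        raise MasterKeyError("Please create a master key in the database ...")
    yield {"id": 1}


rows = lazy_read(master_key_open=False)  # like .load(): no error yet
try:
    first = next(rows)                   # like display(x): rows are pulled now
except MasterKeyError as exc:
    print("failed on action:", exc)
```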

Now, to check whether a MASTER KEY exists, connect to your SQL pool database and try the following:

Example: Azure Synapse Analytics

OPEN MASTER KEY DECRYPTION BY PASSWORD = 'Your-DB-PASS';  
GO  
CLOSE MASTER KEY;  
GO
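GO in the snippet above is a batch separator recognized by SSMS and sqlcmd, not a T-SQL statement, so if you run these commands from a notebook through a database driver, each batch has to be sent separately. A minimal sketch; split_batches is a hypothetical helper that ignores the GO <count> form and GO inside string literals or comments.

```python
import re
from typing import List

# A line consisting only of GO (any case), optionally padded with spaces/tabs.
_GO_LINE = re.compile(r"^[ \t]*GO[ \t]*$", re.IGNORECASE | re.MULTILINE)


def split_batches(script: str) -> List[str]:
    """Split a T-SQL script on GO lines, dropping empty batches."""
    return [batch.strip() for batch in _GO_LINE.split(script) if batch.strip()]


script = """OPEN MASTER KEY DECRYPTION BY PASSWORD = 'Your-DB-PASS';
GO
CLOSE MASTER KEY;
GO
"""
for batch in split_batches(script):
    print(batch)
```

With pyodbc, for example, each batch could be passed to cursor.execute() on the same connection.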

If you get this error:

Please create a master key in the database or open the master key in the session before performing this operation.

Just create the master key, or regenerate it with ALTER MASTER KEY:

CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'ljlLKJjs@l23je';

ALTER MASTER KEY REGENERATE WITH ENCRYPTION BY PASSWORD = 'ljlLKJjs@l23je';
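Rather than reusing the sample password above, you could generate one. A minimal sketch using Python's secrets module; generate_master_key_password is a hypothetical helper that guarantees at least one upper-case letter, lower-case letter, digit and symbol, and avoids single quotes so the result can be pasted into a T-SQL literal. The exact complexity rules your server enforces may differ.

```python
import secrets
import string

SYMBOLS = "!#$%*+-=?@^_"  # deliberately excludes ' so the literal needs no escaping


def generate_master_key_password(length: int = 24) -> str:
    """Hypothetical helper: random password with all four character classes."""
    if length < 4:
        raise ValueError("length must be at least 4")
    classes = [string.ascii_uppercase, string.ascii_lowercase, string.digits, SYMBOLS]
    chars = [secrets.choice(cls) for cls in classes]  # one from each class
    alphabet = "".join(classes)
    chars += [secrets.choice(alphabet) for _ in range(length - len(classes))]
    secrets.SystemRandom().shuffle(chars)  # hide the fixed class positions
    return "".join(chars)


print(generate_master_key_password())
```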