Writing to ADLS from Synapse Notebook with account key

I am trying to write a file from an Azure Synapse Notebook to ADLS Gen2, authenticating with the account key.

When I use Python with the DataLakeServiceClient, I can authenticate with the key and write files without any problem. If I try to authenticate with the same key in Spark, I get java.nio.file.AccessDeniedException: Operation failed: "This request is not authorized to perform this operation using this permission.", 403, PUT,.

Using PySpark with account key authorization [NOT WORKING]:

# Set the account key for the ABFS driver in the Spark session
# (the account name here must match the account in the abfss:// URL below)
myaccountname = ""
account_key = ""
spark.conf.set(f"fs.azure.account.key.{myaccountname}.dfs.core.windows.net", account_key)

dest_container = "container_name"
dest_storage_name = "storage_name"
destination_storage = f"abfss://{dest_container}@{dest_storage_name}.dfs.core.windows.net"

df.write.mode("append").parquet(destination_storage + "/raw/myfile.parquet")
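
For reference, the ABFS driver also accepts an explicit auth type; a minimal sketch of that variant (it should be equivalent here, since SharedKey is the default when an account key is configured):

# Assumption: pinning the auth type to SharedKey explicitly for this account
spark.conf.set(f"fs.azure.account.auth.type.{myaccountname}.dfs.core.windows.net", "SharedKey")
spark.conf.set(f"fs.azure.account.key.{myaccountname}.dfs.core.windows.net", account_key)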

But I CAN write the file with Python using the DataLakeServiceClient and account key authorization [WORKING]:

from azure.storage.filedatalake import DataLakeServiceClient

# DAP ADLS configurations (placeholders)
storage_name = ""
account_key = ""
container_name = ""
directory_name = ""
file_name = ""
file_content = b""

service_client = DataLakeServiceClient(account_url=f"https://{storage_name}.dfs.core.windows.net", credential=account_key)
file_system_client = service_client.get_file_system_client(container_name)

# Create the directory and file, then append and flush the content
dir_client = file_system_client.get_directory_client(directory_name)
dir_client.create_directory()
file_client = dir_client.get_file_client(file_name)
file_client.create_file()
file_client.append_data(file_content, offset=0, length=len(file_content))
file_client.flush_data(len(file_content))

What am I doing wrong? I was under the impression that setting the key for the URL with spark.conf.set would be enough?

--UPDATE

Depending on your setup, can you double-check that you, or the user running this, have the required ADLS Gen2 access and permissions (Contributor role on the subscription, Storage Blob Data Owner at the storage account level, or the Storage Blob Data Contributor role granted to the service principal in the scope of the Data Lake Storage Gen2 storage account)?

Make sure you have a valid account key copied from the Azure portal.

Just in case....

To enable other users to use the storage account after you create your workspace, you will have to perform the tasks below:

  • Assign other users to the Contributor role on the workspace
  • Assign other users to a Workspace, SQL, or Spark admin role using Synapse Studio
  • Assign yourself and other users to the Storage Blob Data Contributor role on the storage account

Also, if you are using MSI for the Synapse workspace, make sure that you as a user have the same level of permissions in the notebook.


Go through the official MS docs on Azure Synapse connecting to an Azure storage account.

In case you have set up an account key and secret for the storage account, you can set forwardSparkAzureStorageCredentials to true, in which case Azure Synapse connector automatically discovers the account access key set in the notebook session configuration or the global Hadoop configuration and forwards the storage account access key to the connected Azure Synapse instance by creating a temporary Azure database scoped credential.

Just add this option to your df.write:

.option("forwardSparkAzureStorageCredentials", "true")

I finally solved it using a LinkedService. In the LinkedService I used an AccountKey (retrieved from a Key Vault).

For some reason, authenticating with the account key directly in code did not work in the Synapse Notebook, even though the user had all the required permissions.

UPDATE: According to Microsoft third-level tech support, authentication with an account key from Synapse is NOT possible (!!!). You HAVE to use their LinkedServices.

In case anyone else needs to authenticate:

# Authenticate via the Synapse linked service (the token provider issues SAS tokens)
linkedServiceName_var = "my_linked_service_name"
spark.conf.set("fs.azure.account.auth.type", "SAS")
spark.conf.set("fs.azure.sas.token.provider.type", "com.microsoft.azure.synapse.tokenlibrary.LinkedServiceBasedSASProvider")
spark.conf.set("spark.storage.synapse.linkedServiceName", linkedServiceName_var)

raw_container_name = "my_container"
raw_storageaccount_name = "my_storage_account"
CONNECTION_STR = f"abfs://{raw_container_name}@{raw_storageaccount_name}.dfs.core.windows.net"

filepath = "path/to/myfile.parquet"  # path within the container (placeholder)
my_df = spark.read.parquet(CONNECTION_STR + "/" + filepath)
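
With the same session configuration in place, writing should work the same way, e.g. (a sketch using the path from the question):

# Sketch: writing back through the linked-service SAS configuration
df.write.mode("append").parquet(CONNECTION_STR + "/raw/myfile.parquet")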