Azure Databricks job fails to access ADLS storage after renewing service principal

I have a Databricks job that connects to ADLS Gen2 storage and had been processing files successfully.

Recently, after renewing the service principal secret and updating the secret in Key Vault, the job started failing.

Using databricks-cli (databricks secrets list-scopes --profile mycluster), I identified the secret scope being used, and I also verified that the corresponding secrets had been updated correctly.
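For reference, listing the individual secrets inside the scope also helps confirm the update; with the legacy databricks-cli syntax I was using (scope name is a placeholder for mine):

databricks secrets list --scope name-of-the-scope-used-in-databricks-workspace --profile mycluster

This prints each key along with its last-updated timestamp, which makes it easy to check that the secrets were in fact refreshed recently.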

In a notebook, I followed this link and was able to access ADLS.

Below is the code I used to test the Key Vault values and access ADLS:

scopename="name-of-the-scope-used-in-databricks-workspace"

# Pull the service principal credentials from the Key Vault-backed secret scope
appId=dbutils.secrets.get(scope=scopename, key="name-of-the-key-from-keyvault-referring-appid")
directoryId=dbutils.secrets.get(scope=scopename, key="name-of-key-from-keyvault-referring-TenantId")
secretValue=dbutils.secrets.get(scope=scopename, key="name-of-key-from-keyvault-referring-Secretkey")
storageAccount="ADLS-Gen2-StorageAccountName"

# Configure OAuth (client credentials) for direct access to the storage account
spark.conf.set(f"fs.azure.account.auth.type.{storageAccount}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storageAccount}.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storageAccount}.dfs.core.windows.net", appId)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storageAccount}.dfs.core.windows.net", secretValue)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storageAccount}.dfs.core.windows.net", f"https://login.microsoftonline.com/{directoryId}/oauth2/token")

# List a folder to confirm access works with the renewed secret
dbutils.fs.ls("abfss://<container-name>@<storage-accnt-name>.dfs.core.windows.net/<folder>")

When run on an attached cluster, the code above successfully lists the folders/files in the ADLS Gen2 storage.

This is the code that created the mount point; it had used the old secret values:

scope_name="name-of-the-scope-from-workspace"
directoryId=dbutils.secrets.get(scope=scope_name, key="name-of-key-from-keyvault-which-stores-tenantid-value")
configs = {"fs.azure.account.auth.type": "OAuth",
          "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
          "fs.azure.account.oauth2.client.id": dbutils.secrets.get(scope=scope_name, key="name-of-key-from-key-vault-referring-to-clientid"),
          "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope=scope_name, key="name-of-key-from-key-vault-referring-to-secretvalue-generated-in-sp-secrets"),
          "fs.azure.account.oauth2.client.endpoint": f"https://login.microsoftonline.com/{directoryId}/oauth2/token"}

storage_acct_name="storageaccountname"
container_name="name-of-container"

mount_point = "/mnt/appadls/content"
if not any(mount.mountPoint == mount_point for mount in dbutils.fs.mounts()):
  print(f"Mounting {mount_point} to DBFS filesystem")
  dbutils.fs.mount(
    source = f"abfss://{container_name}@{storage_acct_name}.dfs.core.windows.net/",
    mount_point = mount_point,
    extra_configs = configs)
else:
  print("Mount point {mount_point} has already been mounted.")

In my case, the Key Vault was updated with the client ID, the tenant/directory ID, and the SP secret.

After renewing the service principal, I see the following exception when accessing /mnt/path:

...
response '{"error":"invalid_client","error_description":"AADSTS7000215: Invalid client secret is provided.

The only thing I can think of is that the mount point was created with the old secret by the code above. Do I need to unmount and recreate the mount point after renewing the service principal?

So I finally tried unmounting and remounting the ADLS Gen2 storage, and now I can access it.
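For anyone hitting the same thing, this is roughly what I ran; it reuses mount_point, container_name, storage_acct_name and configs from the mount code above, so treat it as a sketch rather than a polished script:

# Drop the mount that was created with the old secret
if any(mount.mountPoint == mount_point for mount in dbutils.fs.mounts()):
  dbutils.fs.unmount(mount_point)

# Recreate it so the mount picks up the current secret values
dbutils.fs.mount(
  source = f"abfss://{container_name}@{storage_acct_name}.dfs.core.windows.net/",
  mount_point = mount_point,
  extra_configs = configs)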

I did not expect the configuration to be persisted somewhere; I assumed that just updating the service principal secret would be enough.
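Side note: on newer Databricks Runtime versions there is also dbutils.fs.updateMount, which is meant to update an existing mount in place. I have not tested it for this scenario, so treat the following as an untested sketch that assumes the same variables as above:

# Untested alternative to unmount/remount: update the existing mount in place
dbutils.fs.updateMount(
  source = f"abfss://{container_name}@{storage_acct_name}.dfs.core.windows.net/",
  mount_point = mount_point,
  extra_configs = configs)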