Azure Databricks job fails to access ADLS storage after renewing service principal
I have a Databricks job that connects to ADLS Gen2 storage and has been processing files successfully.
Recently, after renewing the service principal secret and updating the secret in Key Vault, the job started failing.
Using the databricks-cli command databricks secrets list-scopes --profile mycluster, I was able to identify the secrets being used, and I also verified that the corresponding secret had been updated correctly.
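For completeness, the scope and its keys can also be checked from a notebook; a minimal sketch, assuming the same scope name used further below (secret values are redacted when displayed):

# List the secret scopes visible to the workspace and the keys inside the scope
# (the secret values themselves are redacted when printed in a notebook)
scopename = "name-of-the-scope-used-in-databricks-workspace"
print(dbutils.secrets.listScopes())
print(dbutils.secrets.list(scopename))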
In a notebook, I followed the link and was able to access ADLS.
Below is what I used to test the Key Vault values and access ADLS.
scopename="name-of-the-scope-used-in-databricks-workspace"
appId=dbutils.secrets.get(scope=scopename,key="name-of-the-key-from-keyvault-referring-appid")
directoryId=dbutils.secrets.get(scope=scopename,key="name-of-key-from-keyvault-referring-TenantId")
secretValue=dbutils.secrets.get(scope=scopename,key="name-of-key-from-keyvaut-referring-Secretkey")
storageAccount="ADLS-Gen2-StorageAccountName"
spark.conf.set(f"fs.azure.account.auth.type.{storageAccount}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storageAccount}.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storageAccount}.dfs.core.windows.net", appid)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storageAccount}.dfs.core.windows.net", secretValue)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storageAccount}.dfs.core.windows.net", f"https://login.microsoftonline.com/{directoryid}/oauth2/token")
dbutils.fs.ls("abfss://<container-name>@<storage-accnt-name>.dfs.core.windows.net/<folder>")
With an attached cluster, the code above successfully lists the folders/files in the ADLS Gen2 storage.
Below is the code that was used to create the mount point; it was run with the old secret values.
scope_name="name-of-the-scope-from-workspace"
directoryId=dbutils.secrets.get(scope=scope_name, key="name-of-key-from-keyvault-which-stores-tenantid-value")
configs = {"fs.azure.account.auth.type": "OAuth",
"fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
"fs.azure.account.oauth2.client.id": dbutils.secrets.get(scope=scope_name, key="name-of-key-from-key-vault-referring-to-clientid"),
"fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope=scope_name, key="name-of-key-from-key-vault-referring-to-secretvalue-generated-in-sp-secrets"),
"fs.azure.account.oauth2.client.endpoint": f"https://login.microsoftonline.com/{directoryId}/oauth2/token"}
storage_acct_name="storageaccountname"
container_name="name-of-container"
mount_point = "/mnt/appadls/content"
if not any(mount.mountPoint == mount_point for mount in dbutils.fs.mounts()):
print(f"Mounting {mount_point} to DBFS filesystem")
dbutils.fs.mount(
source = f"abfss://{container_name}@{storage_acct_name}.dfs.core.windows.net/",
mount_point = mount_point,
extra_configs = configs)
else:
print("Mount point {mount_point} has already been mounted.")
In my case, the Key Vault was updated with the client id, tenant/directory id, and SP secret.
After renewing the service principal, I see the following exception when accessing /mnt/path.
...
response '{"error":"invalid_client","error_description":"AADSTS7000215: Invalid client secret is provided.
The only thing I can think of is that the mount point was created with the old secret via the code above. Do I need to unmount and recreate the mount point after renewing the service principal?
So I finally tried unmounting and remounting the ADLS Gen2 storage, and now I can access it.
I did not expect the configuration to be persisted that way; I assumed that just updating the service principal secret would be enough.
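For reference, a minimal sketch of the unmount/remount, assuming the same configs, storage_acct_name, container_name and mount_point variables from the mount code above:

# Drop the mount that was created with the old (now invalid) client secret
if any(mount.mountPoint == mount_point for mount in dbutils.fs.mounts()):
    dbutils.fs.unmount(mount_point)

# Recreate the mount so the new secret is read from the Key Vault-backed scope
dbutils.fs.mount(
    source=f"abfss://{container_name}@{storage_acct_name}.dfs.core.windows.net/",
    mount_point=mount_point,
    extra_configs=configs)

# Let already-running clusters pick up the refreshed mount
dbutils.fs.refreshMounts()

# Verify access through the mount point
dbutils.fs.ls(mount_point)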