
Error Mounting Azure Data Lake in Apache Spark using Databricks

I am trying to mount our Azure Data Lake in Apache Spark using the following Python code:
def check(mntPoint):
  # Count how many existing mounts match mntPoint (0 or 1)
  a = []
  for test in dbutils.fs.mounts():
    a.append(test.mountPoint)
  result = a.count(mntPoint)
  return result

mount = "/mnt/lake"

if check(mount) == 1:
  resultMsg = "<div>%s is already mounted. </div>" % mount
else:
  dbutils.fs.mount(
    source = "wasbs://root@adlsprexxxxxdlsdev.blob.core.windows.net",
    mount_point = mount,
    extra_configs = {"fs.azure.account.key.adlspretxxxxdlsdev.blob.core.windows.net": ""})
  resultMsg = "<div>%s was mounted. </div>" % mount

displayHTML(resultMsg)

But I keep getting the following error:

shaded.databricks.org.apache.hadoop.fs.azure.AzureException: java.lang.IllegalArgumentException: Storage Key is not a valid base64 encoded string.

The full error is as follows:

ExecutionError                            Traceback (most recent call last)
<command-3313750897057283> in <module>
      4   resultMsg = "<div>%s is already mounted. </div>" % mount
      5 else:
----> 6   dbutils.fs.mount(
      7   source = "wasbs://root@adlsprexxxxxxxkadlsdev.blob.core.windows.net",
      8   mount_point = mount,

/local_disk0/tmp/1619799109257-0/dbutils.py in f_with_exception_handling(*args, **kwargs)
    322                     exc.__context__ = None
    323                     exc.__cause__ = None
--> 324                     raise exc
    325             return f_with_exception_handling
    326 

Can someone tell me how to resolve this issue?

You need to provide the storage key; right now you have an empty string. Typically, people store the storage key in Azure Key Vault (and mount it as a secret scope) or use a Databricks-backed secret scope, and then retrieve the storage key via dbutils.secrets.get (as shown in the documentation):

dbutils.fs.mount(
  source = "wasbs://root@adlsprexxxxxdlsdev.blob.core.windows.net",
  mount_point = mount,
  extra_configs = {"fs.azure.account.key.adlspretxxxxdlsdev.blob.core.windows.net":
      dbutils.secrets.get(scope_name, secret_name)})
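The exception itself comes from the Azure driver rejecting the empty string as an account key: storage account keys are base64-encoded strings, and anything that fails to decode produces "Storage Key is not a valid base64 encoded string." As a quick sanity check before mounting, you can verify the shape of the key locally (a sketch; `looks_like_valid_storage_key` is a hypothetical helper, not part of dbutils):

```python
import base64
import binascii

def looks_like_valid_storage_key(key: str) -> bool:
    """Rough check that a string could be a base64-encoded storage key."""
    if not key:
        # An empty key is what the Azure driver rejects in the question above.
        return False
    try:
        # validate=True makes non-alphabet characters raise instead of
        # being silently discarded; bad padding also raises.
        base64.b64decode(key, validate=True)
        return True
    except binascii.Error:
        return False
```

For example, `looks_like_valid_storage_key("")` returns `False`, while a value pulled from `dbutils.secrets.get` for a real storage account should pass.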