Error Mounting ADLS on DBFS for Databricks (Error: NullPointerException)
I am trying to mount Azure Data Lake Gen 2 in Databricks and am getting the following error.
java.lang.NullPointerException: authEndpoint
The code I am using is shown below:
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.auth.provider.type": "org.apache.hadoop.fs.azurebfs.ClientCredsTokenProvider",
    "fs.azure.account.auth2.client.id": "<client-id>",
    "fs.azure.account.auth2.client.secret": dbutils.secrets.get(scope = "scope1", key = "kvsecretfordbricks"),
    "dfs.adls.oauth2.refresh.url": "https://login.microsoftonline.com/<tenant-id>/oauth2/token"}

dbutils.fs.mount(
    source = "abfss://starter1@newresourcegroupadcadls.dfs.core.windows.net/",
    mount_point = "/mnt/demo",
    extra_configs = configs)
The full error is as follows:
---------------------------------------------------------------------------
ExecutionError Traceback (most recent call last)
 in
      9 source = "abfss://starter1@newresourcegroupadcadls.dfs.core.windows.net/",
     10 mount_point = "/mnt/demo",
---> 11 extra_configs = configs)

/local_disk0/tmp/1612619970782-0/dbutils.py in f_with_exception_handling(*args, **kwargs)
    312 exc.context = None
    313 exc.cause = None
--> 314 raise exc
    315 return f_with_exception_handling
    316

ExecutionError: An error occurred while calling o271.mount. : java.lang.NullPointerException: authEndpoint
    at shaded.databricks.v20180920_b33d810.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
    at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.oauth2.AzureADAuthenticator.getTokenUsingClientCreds(AzureADAuthenticator.java:84)
    at com.databricks.backend.daemon.dbutils.DBUtilsCore.verifyAzureOAuth(DBUtilsCore.scala:477)
    at com.databricks.backend.daemon.dbutils.DBUtilsCore.verifyAzureFileSystem(DBUtilsCore.scala:488)
    at com.databricks.backend.daemon.dbutils.DBUtilsCore.mount(DBUtilsCore.scala:446)
    at sun.reflect.GeneratedMethodAccessor292.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    at py4j.Gateway.invoke(Gateway.java:295)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:251)
    at java.lang.Thread.run(Thread.java:748)
Any help would be appreciated.
When I run
dbutils.fs.unmount("/mnt")
there are no mount points beginning with "/mnt".
--
Update
Additional error message after updating dfs.adls.oauth2.refresh.url to fs.azure.account.oauth2.client.endpoint:
ExecutionError Traceback (most recent call last)
 in
      9 source = "abfss://starter1@newresourcegroupadcadls.dfs.core.windows.net/",
     10 mount_point = "/mnt/demo",
---> 11 extra_configs = configs)

/local_disk0/tmp/1612858508533-0/dbutils.py in f_with_exception_handling(*args, **kwargs)
    312 exc.context = None
    313 exc.cause = None
--> 314 raise exc
    315 return f_with_exception_handling
    316

ExecutionError: An error occurred while calling o275.mount. : java.lang.NullPointerException: clientId
    at shaded.databricks.v20180920_b33d810.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
    at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.oauth2.AzureADAuthenticator.getTokenUsingClientCreds(AzureADAuthenticator.java:85)
    at com.databricks.backend.daemon.dbutils.DBUtilsCore.verifyAzureOAuth(DBUtilsCore.scala:477)
    at com.databricks.backend.daemon.dbutils.DBUtilsCore.verifyAzureFileSystem(DBUtilsCore.scala:488)
    at com.databricks.backend.daemon.dbutils.DBUtilsCore.mount(DBUtilsCore.scala:446)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    at py4j.Gateway.invoke(Gateway.java:295)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:251)
    at java.lang.Thread.run(Thread.java:748)
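The clientId NullPointerException points at the config keys themselves: the question's configs use "fs.azure.account.auth2.client.id" and "fs.azure.account.auth2.client.secret", while the ABFS OAuth client-credentials settings are spelled "fs.azure.account.oauth2.client.id", "fs.azure.account.oauth2.client.secret", and "fs.azure.account.oauth.provider.type". As a minimal sketch (the helper below is hypothetical, not part of dbutils), you can validate the dict before calling dbutils.fs.mount so a misspelled key fails fast with a readable message instead of an NPE inside the JVM:

```python
# Required key names for ABFS OAuth client-credentials auth.
REQUIRED_ABFS_OAUTH_KEYS = {
    "fs.azure.account.auth.type",
    "fs.azure.account.oauth.provider.type",
    "fs.azure.account.oauth2.client.id",
    "fs.azure.account.oauth2.client.secret",
    "fs.azure.account.oauth2.client.endpoint",
}

def missing_abfs_oauth_keys(configs):
    """Return the required ABFS OAuth keys absent from configs, sorted."""
    return sorted(REQUIRED_ABFS_OAUTH_KEYS - set(configs))

# The broken configs from the question: "auth2" instead of "oauth2",
# plus the legacy Gen1 key "dfs.adls.oauth2.refresh.url".
broken = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.auth.provider.type": "org.apache.hadoop.fs.azurebfs.ClientCredsTokenProvider",
    "fs.azure.account.auth2.client.id": "<client-id>",
    "fs.azure.account.auth2.client.secret": "<secret>",
    "dfs.adls.oauth2.refresh.url": "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}
print(missing_abfs_oauth_keys(broken))
```

Running this against the question's dict reports every mis-spelled or missing key, which is exactly the set the two NullPointerExceptions complained about.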
If you want to mount an Azure Data Lake Storage Gen2 account to DBFS, please update dfs.adls.oauth2.refresh.url to fs.azure.account.oauth2.client.endpoint. For more details, please refer to the official document and here.
For example:
- Create an Azure Data Lake Storage Gen2 account.
az login
az storage account create \
--name <account-name> \
--resource-group <group name> \
--location westus \
--sku Standard_RAGRS \
--kind StorageV2 \
--enable-hierarchical-namespace true
- Create a service principal and assign the Storage Blob Data Contributor role to it at the scope of the Data Lake Storage Gen2 storage account.
az login
az ad sp create-for-rbac -n "MyApp" --role "Storage Blob Data Contributor" \
--scopes /subscriptions/<subscription>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>
- Create a Spark cluster in Azure Databricks.
- Mount Azure Data Lake Storage Gen2 in Azure Databricks (Python):
configs = {"fs.azure.account.auth.type": "OAuth",
"fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
"fs.azure.account.oauth2.client.id": "<application-id>",
"fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-name>",key="<service-credential-key-name>"),
"fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<directory-id>/oauth2/token"}
# Optionally, you can add <directory-name> to the source URI of your mount point.
dbutils.fs.mount(
source = "abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/",
mount_point = "/mnt/demo",
extra_configs = configs)
- Check the mount:
dbutils.fs.ls("/mnt/demo")
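One detail worth calling out: fs.azure.account.oauth2.client.endpoint is just the tenant-specific Azure AD token URL, so it only varies by the directory (tenant) ID. As a small sketch (this helper is illustrative, not part of dbutils), you can build it instead of copy-pasting the URL:

```python
def oauth2_client_endpoint(directory_id):
    """Build the Azure AD token endpoint used as the value of
    fs.azure.account.oauth2.client.endpoint for a given tenant."""
    return f"https://login.microsoftonline.com/{directory_id}/oauth2/token"

print(oauth2_client_endpoint("<directory-id>"))
```

This keeps the tenant ID in one place if you mount several containers from notebooks in the same workspace.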