Unable to mount Azure ADLS Gen 2 from the Community Edition of Databricks: com.databricks.rpc.UnknownRemoteException: Remote exception occurred
I'm trying to mount ADLS Gen 2 from my Databricks Community Edition workspace, but when I run the following code:
test = spark.read.csv("/mnt/lake/RAW/csds.csv", inferSchema=True, header=True)
I get the error:
com.databricks.rpc.UnknownRemoteException: Remote exception occurred:
I'm mounting ADLS Gen 2 with the following code:
def check(mntPoint):
    a = []
    for test in dbutils.fs.mounts():
        a.append(test.mountPoint)
    result = a.count(mntPoint)
    return result

mount = "/mnt/lake"

if check(mount) == 1:
    resultMsg = "<div>%s is already mounted. </div>" % mount
else:
    dbutils.fs.mount(
        source = "wasbs://root@adlspretbiukadlsdev.blob.core.windows.net",
        mount_point = mount,
        extra_configs = {"fs.azure.account.key.adlspretbiukadlsdev.blob.core.windows.net":""})
    resultMsg = "<div>%s was mounted. </div>" % mount

displayHTML(resultMsg)
ServicePrincipalID = 'xxxxxxxxxxx'
ServicePrincipalKey = 'xxxxxxxxxxxxxx'
DirectoryID = 'xxxxxxxxxxxxxxx'
Lake = 'adlsgen2'

# Combine DirectoryID into full string
Directory = "https://login.microsoftonline.com/{}/oauth2/token".format(DirectoryID)

# Create configurations for our connection
configs = {"fs.azure.account.auth.type": "OAuth",
           "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
           "fs.azure.account.oauth2.client.id": ServicePrincipalID,
           "fs.azure.account.oauth2.client.secret": ServicePrincipalKey,
           "fs.azure.account.oauth2.client.endpoint": Directory}

mount = "/mnt/lake"

if check(mount) == 1:
    resultMsg = "<div>%s is already mounted. </div>" % mount
else:
    dbutils.fs.mount(
        source = f"abfss://root@{Lake}.dfs.core.windows.net/",
        mount_point = mount,
        extra_configs = configs)
    resultMsg = "<div>%s was mounted. </div>" % mount
I then try to read a dataframe from ADLS Gen 2 with:
dataPath = "/mnt/lake/RAW/DummyEventData/CommerceTools/"
test = spark.read.csv("/mnt/lake/RAW/csds.csv", inferSchema=True, header=True)
which fails with the same error:
com.databricks.rpc.UnknownRemoteException: Remote exception occurred:
Any ideas?
Judging by the stack trace, the most likely cause of this error is that your service principal has not been assigned the Storage Blob Data Contributor (or Storage Blob Data Reader) role on the storage account (as described in the documentation). Confusingly, this role is different from the usual Contributor role.
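One quick way to confirm this is to skip the mount entirely and read over abfss:// with the same service-principal credentials supplied as per-account session configs. This is only a minimal sketch: it reuses ServicePrincipalID, ServicePrincipalKey, and Directory from your code and assumes the container is root on the adlsgen2 storage account. If this read also fails with an authorization error, the missing role assignment is almost certainly the problem:

    # Sketch: per-account OAuth configs for the adlsgen2 storage account,
    # reusing the service-principal variables defined above.
    spark.conf.set("fs.azure.account.auth.type.adlsgen2.dfs.core.windows.net", "OAuth")
    spark.conf.set("fs.azure.account.oauth.provider.type.adlsgen2.dfs.core.windows.net",
                   "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
    spark.conf.set("fs.azure.account.oauth2.client.id.adlsgen2.dfs.core.windows.net", ServicePrincipalID)
    spark.conf.set("fs.azure.account.oauth2.client.secret.adlsgen2.dfs.core.windows.net", ServicePrincipalKey)
    spark.conf.set("fs.azure.account.oauth2.client.endpoint.adlsgen2.dfs.core.windows.net", Directory)

    # Read directly from the lake, bypassing /mnt/lake; the path mirrors the one in the question.
    test = spark.read.csv("abfss://root@adlsgen2.dfs.core.windows.net/RAW/csds.csv",
                          inferSchema=True, header=True)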
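Once the role has been granted, the existing mount may also need to be refreshed so it picks up the OAuth configs. A minimal sketch, reusing Lake and configs from your code (the mount point and path are the ones shown above):

    # Drop the existing /mnt/lake mount, if any, then remount with the OAuth configs.
    if any(m.mountPoint == "/mnt/lake" for m in dbutils.fs.mounts()):
        dbutils.fs.unmount("/mnt/lake")

    dbutils.fs.mount(
        source = f"abfss://root@{Lake}.dfs.core.windows.net/",
        mount_point = "/mnt/lake",
        extra_configs = configs)

    # Retry the read that previously failed.
    test = spark.read.csv("/mnt/lake/RAW/csds.csv", inferSchema=True, header=True)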