Azure Databricks - not able to read .csv files using Spark jobs from Data Lake Storage Gen2 service

Question

I have a Databricks cluster that runs fine. Using the following code, I can also mount my "datalake storage gen2" account. I am mounting everything on /mnt/data1:

// OAuth (service principal) settings for the ABFS driver.
// appID, password, tenantID, fileSystemName and storageAccountName are defined elsewhere.
val configs = Map(
  "fs.azure.account.auth.type" -> "OAuth",
  "fs.azure.account.oauth.provider.type" -> "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id" -> appID,
  "fs.azure.account.oauth2.client.secret" -> password,
  "fs.azure.account.oauth2.client.endpoint" -> ("https://login.microsoftonline.com/" + tenantID + "/oauth2/token"),
  "fs.azure.createRemoteFileSystemDuringInitialization" -> "true")

// Mount the root of the container (file system) at /mnt/data1.
dbutils.fs.mount(
  source = "abfss://" + fileSystemName + "@" + storageAccountName + ".dfs.core.windows.net/",
  mountPoint = "/mnt/data1",
  extraConfigs = configs)

So far everything is fine and works. But when I try to access a file from the mounted location using the following command

val df = spark.read.csv("/mnt/data1/creodemocontainer/movies.csv")

I get the following error

java.io.FileNotFoundException: dbfs:/mnt/data1/creodemocontainer2/movies.csv
    at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.$anonfun$getFileStatus(DatabricksFileSystemV2.scala:775)

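Yet I can connect to and load these files in Power BI without any problem. I have not found any clue in the past 2 days, so any help would be much appreciated.

Thanks in advance.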

Answer 1

Sharing the answer based on the original poster's comment:

I'm not supposed to add container name while reading.

val df = spark.read.csv("/mnt/data1/creodemocontainer/movies.csv")

I removed the container name, since the mount point already refers to the container. Everything works fine now:

val df = spark.read.csv("/mnt/data1/movies.csv")
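To see why the extra path segment fails, here is a minimal sketch of how a path under a mount point corresponds to a path in the mounted source. It is not a Databricks API: `MountPathSketch`, `Mount`, and `resolve` are hypothetical names used only for illustration, the storage account name `examplestorageaccount` is a placeholder, and it assumes that `fileSystemName` in the question was `creodemocontainer`.

```scala
// Hypothetical helper (not part of Databricks): models how a DBFS path under a
// mount point corresponds to a URI in the mounted abfss source.
object MountPathSketch {
  // mountPoint and source play the same roles as the arguments to dbutils.fs.mount.
  final case class Mount(mountPoint: String, source: String)

  // Translate a DBFS path under the mount point into the underlying abfss URI.
  // Returns None when the path is not under the mount point.
  def resolve(mount: Mount, dbfsPath: String): Option[String] = {
    val root = mount.mountPoint.stripSuffix("/")
    if (dbfsPath == root) Some(mount.source)
    else if (dbfsPath.startsWith(root + "/"))
      Some(mount.source.stripSuffix("/") + dbfsPath.drop(root.length))
    else None
  }

  def main(args: Array[String]): Unit = {
    // Assumed values: container "creodemocontainer", placeholder account name.
    val mount = Mount(
      "/mnt/data1",
      "abfss://creodemocontainer@examplestorageaccount.dfs.core.windows.net/")

    // The container is already part of the source, so this resolves to the
    // file at the container root.
    println(resolve(mount, "/mnt/data1/movies.csv"))

    // Repeating the container name makes the path point at a folder called
    // "creodemocontainer" inside the container, which is what was not found.
    println(resolve(mount, "/mnt/data1/creodemocontainer/movies.csv"))
  }
}
```

In a notebook, running `dbutils.fs.ls("/mnt/data1")` lists what actually sits at the mount root, which is a quick way to confirm the correct relative path before calling `spark.read.csv`.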

azure

apache-spark

azure-databricks

azure-data-lake-gen2