DataFrame.show() throwing error in Databricks
I'm trying to pull data from an Azure SQL Data Warehouse using Azure Databricks.
The connection itself is fine, since I can see rows being returned in the DataFrame, but as soon as I try to save or display the records it throws an error. Here is what I tried:
df = spark.read \
.format("com.databricks.spark.sqldw") \
.option("url", sqlDwNew) \
.option("tempDir", temDir_location) \
.option("forwardSparkAzureStorageCredentials", "true") \
.option("query", "select * from AccessPermission") \
.load()
df.count()
Output
(1) Spark Jobs
df:pyspark.sql.dataframe.DataFrame
AccessPermissionId:integer
AccessPermission:string
Out[16]: 4
Error
df.show()
Output
com.databricks.spark.sqldw.SqlDWSideException: SQL DW failed to execute the JDBC query produced by the connector.
To understand the exact cause, I would ask you to check the complete stack trace and try to find the root cause of the issue.
In my repro I hit exactly the same error message, and was able to resolve it by reading the stack trace, which revealed a storage account configuration problem:
com.databricks.spark.sqldw.SqlDWSideException: SQL DW failed to execute the JDBC query produced by the connector.
.
.
.
.
.
Caused by: java.lang.IllegalArgumentException: requirement failed: No access key found in the session conf or the global Hadoop conf for Azure Storage account name: chepra
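The "No access key found" message appears because the connector derives the storage account name from the tempDir URL and then looks for a matching key in the session or Hadoop configuration. A minimal sketch of that lookup (the helper name is hypothetical, not the connector's actual code):

```python
from urllib.parse import urlparse

def required_conf_key(temp_dir: str) -> str:
    """Given a wasbs:// tempDir URL, return the configuration key the
    connector expects the storage account access key to be stored under."""
    # netloc of wasbs://container@account.blob.core.windows.net/dir
    # is "container@account.blob.core.windows.net"
    host = urlparse(temp_dir).netloc.split("@")[-1]
    account = host.split(".")[0]
    return f"fs.azure.account.key.{account}.blob.core.windows.net"

print(required_conf_key(
    "wasbs://container@chepra.blob.core.windows.net/tempdir"))
# fs.azure.account.key.chepra.blob.core.windows.net
```

This is why the error names the account ("chepra"): the connector found the account in tempDir but no access key registered under the corresponding configuration key.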
Step 1: Set the Blob storage account access key in the notebook session configuration.
spark.conf.set(
"fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net",
"<your-storage-account-access-key>")
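Rather than hard-coding the access key in the notebook, you can keep it in a Databricks secret scope and read it with `dbutils.secrets.get`; the scope and key names below are placeholders (this fragment only runs inside a Databricks notebook):

```python
# Assumes a secret scope "<your-scope>" holding the storage account key
spark.conf.set(
    "fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net",
    dbutils.secrets.get(scope="<your-scope>", key="<your-key-name>"))
```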
Step 2: Load the data from your Azure Synapse query.
df = spark.read \
.format("com.databricks.spark.sqldw") \
.option("url", "jdbc:sqlserver://<the-rest-of-the-connection-string>") \
.option("tempDir", "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>") \
.option("forwardSparkAzureStorageCredentials", "true") \
.option("query", "select * from table") \
.load()
Step 3: Show or display the DataFrame.
df.show()
display(df)
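Since the question also mentions saving records, note that the same connector can write a DataFrame back to a Synapse table once the storage key is configured; a sketch using the same placeholder names (the target table name here is made up):

```python
# Write the DataFrame to a Synapse table via the same tempDir staging area
df.write \
  .format("com.databricks.spark.sqldw") \
  .option("url", "jdbc:sqlserver://<the-rest-of-the-connection-string>") \
  .option("tempDir", "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>") \
  .option("forwardSparkAzureStorageCredentials", "true") \
  .option("dbTable", "AccessPermissionCopy") \
  .save()
```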