How to save html report generated by spark-df-profiling on azure blob?

Question

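I am using the spark-df-profiling package to generate a profiling report in Azure Databricks. However, the to_file function of ProfileReport generates an html file, and I am unable to write that file to Azure Blob.

Already tried:

A wasb path with the container and storage account name
Creating an empty html file, uploading it to blob, and using its url for writing
Generating a sas token for the empty file created above and passing that url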

import spark_df_profiling

profile = spark_df_profiling.ProfileReport(df)
profile.to_file("<one of the paths tried above>")

I want to save the output at the provided path.

Answer 1

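After reviewing the source code of julioasotodv/spark-df-profiling at version v1.1.13, I solved it with the code below. First, please refer to the official Azure Databricks documentation, Data Sources > Azure Blob Storage and Databricks File System for dbutils, to learn how to write data to a specified data source such as Azure Storage.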
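
For context on why to_file rejects the paths tried in the question, here is a minimal sketch. It assumes that to_file writes its output with ordinary local file I/O, which is an assumption based on the v1.1.13 source rather than documented behavior. Under that assumption, a wasbs:// URL is treated as a relative local path, so the open fails before anything reaches Azure. The helper name can_open_locally and the container and account names are hypothetical.

```python
def can_open_locally(path: str) -> bool:
    """Return True if `path` can be opened for writing as a local file."""
    try:
        # Plain Python file I/O knows nothing about wasbs://; it sees a
        # relative path whose parent directories do not exist.
        with open(path, "w", encoding="utf8") as fh:
            fh.write("<html></html>")
        return True
    except OSError:
        return False


# Hypothetical container and account names, for illustration only.
url = "wasbs://mycontainer@myaccount.blob.core.windows.net/test.html"
print(can_open_locally(url))
```

This is why the approach below renders the HTML to a string and hands it to dbutils, which does understand wasbs:// URLs.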

Here is my sample code; it works with Azure Databricks and Azure Storage.

# Register the storage account key so Spark and dbutils can reach the container
storage_account_name = '<your storage account name>'
storage_account_access_key = '<your storage account key>'
spark.conf.set(
  "fs.azure.account.key." + storage_account_name + ".blob.core.windows.net",
  storage_account_access_key)

# My sample pandas dataframe for testing
import pandas as pd
d = {'col1': [1, 2], 'col2': [3, 4]}
pd_df = pd.DataFrame(data=d)

import spark_df_profiling
from spark_df_profiling.templates import template
df = spark.createDataFrame(pd_df)
profile = spark_df_profiling.ProfileReport(df)

# Render the full HTML page and write it straight to the blob
container_name = '<your container name>'
dbutils.fs.put(
  "wasbs://" + container_name + "@" + storage_account_name + ".blob.core.windows.net/test.html",
  template('wrapper').render(content=profile.html))

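I can see that it works from the result True and the output reporting that 29806 bytes were written to Azure Blob, and I then checked the file in Azure Storage Explorer.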
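
The two strings the code above depends on, the Spark config key and the wasbs:// URL, are easy to get wrong when typed by hand. As a small sketch, they can be built with helpers like the following. The helper names account_key_conf and wasbs_url and the sample container and account names are hypothetical; they are not part of spark-df-profiling or Databricks.

```python
def account_key_conf(storage_account: str) -> str:
    """Spark config key under which the storage account access key is set."""
    return "fs.azure.account.key." + storage_account + ".blob.core.windows.net"


def wasbs_url(container: str, storage_account: str, blob_path: str) -> str:
    """Build a wasbs:// URL for a blob inside the given container."""
    return "wasbs://{}@{}.blob.core.windows.net/{}".format(
        container, storage_account, blob_path.lstrip("/"))


# Hypothetical names, for illustration only.
print(account_key_conf("myaccount"))
print(wasbs_url("mycontainer", "myaccount", "/reports/test.html"))
```

In a notebook, the result of wasbs_url(...) can be passed as the first argument to dbutils.fs.put. Passing True as the third argument overwrites an existing blob of the same name, which helps when regenerating a report.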

Hope it helps.

python

profiling

pyspark

azure-blob-storage