Generate dbt documentation from Google Cloud Composer

I have a dbt project running on Cloud Composer, and all of my models and snapshots run successfully.

However, once all the processing is done, I am unable to generate the documentation.

The integration between dbt and Cloud Composer is done through airflow-dbt, and I have set up a task for DbtDocsGenerateOperator. The DAG itself runs fine: in the logs I can see the catalog.json file being written to the target folder in the corresponding Cloud Storage bucket, but afterwards the file does not exist.
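For context, the task setup looks roughly like the sketch below (DAG settings and paths are simplified placeholders; the dbt project lives in the dags folder, which Composer mounts at /home/airflow/gcs/dags on the workers):

```python
from datetime import datetime

from airflow import DAG
from airflow_dbt.operators.dbt_operator import DbtDocsGenerateOperator

with DAG(
    dag_id="dbt_docs",
    start_date=datetime(2021, 11, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    # Runs `dbt docs generate`; dbt writes catalog.json to <dir>/target,
    # i.e. /home/airflow/gcs/dags/target/catalog.json in this layout.
    generate_docs = DbtDocsGenerateOperator(
        task_id="dbt_docs_generate",
        dir="/home/airflow/gcs/dags",           # dbt project directory (placeholder)
        profiles_dir="/home/airflow/gcs/dags",  # location of profiles.yml (placeholder)
    )
```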

After some digging through the GCP logs, I noticed that a process called gcs-syncd is, apparently, deleting the file.

I'm wondering whether anyone has had success with this integration and been able to generate dbt documentation from Cloud Composer?

{
    insertId: "**********"
    labels: {2}
    logName: "************/logs/gcs-syncd"
    receiveTimestamp: "****-**-****:**:33.621914158*"
    resource: {2}
    severity: "INFO"
    textPayload: "Removing file:///home/airflow/gcs/dags/target/catalog.json"
    timestamp: "****-**-****:**:28.220171689Z"
}

Followed by this error message:

{
    insertId: "rdvl8sfx903ai0y8"
    labels: {
        compute.googleapis.com/resource_name: "***************"
        k8s-pod/config_id: "************************"
        k8s-pod/pod-template-hash: "*************"
        k8s-pod/run: "airflow-worker"
    }
    logName: "************/logs/stderr"
    receiveTimestamp: "****-**-****:**:28.921706522Z"
    resource: {
        labels: {6}
        type: "k8s_container"
    }
    severity: "ERROR"
    textPayload: "Removing file:///home/airflow/gcs/dags/target/catalog.json"
    timestamp: "****-**-****:**:28.220171689Z"
}

The Airflow logs don't show any errors at all; the process succeeds with the following messages:

[2021-11-14 21:08:10,601] {dbt_hook.py:130} INFO - 21:08:10 |
[2021-11-14 21:08:10,679] {dbt_hook.py:130} INFO - 21:08:10 | Done.
[2021-11-14 21:08:10,682] {dbt_hook.py:130} INFO - 21:08:10 | Building catalog
[2021-11-14 21:08:43,054] {dbt_hook.py:130} INFO - 21:08:43 | Catalog written to /home/airflow/gcs/dags/target/catalog.json
[2021-11-14 21:08:43,578] {dbt_hook.py:132} INFO - Command exited with return code 0
[2021-11-14 21:08:47,822] {taskinstance.py:1213} INFO - Marking task as SUCCESS.

The problem here is that you are writing the catalog file to a location on the worker node that is mapped to the dags folder in GCS, which is managed by Airflow and Cloud Composer. According to the documentation:

When you modify DAGs or plugins in the Cloud Storage bucket, Cloud Composer synchronizes the data across all the nodes in the cluster.

Cloud Composer synchronizes the dags/ and plugins/ folders uni-directionally by copying locally. Unidirectional synching means that local changes in these folders are overwritten.

The data/ and logs/ folders synchronize bi-directionally by using Cloud Storage FUSE.

If you change the location of this file to /home/airflow/gcs/data/target/catalog.json, you should be fine, since the data/ folder is synchronized bi-directionally.
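One way to do that, assuming the target path is controlled from dbt_project.yml (the project name below is hypothetical), is to point target-path at the data/ folder. Note that dbt normally resolves target-path relative to the project directory, so if your dbt version does not accept an absolute path here, moving the whole dbt project under /home/airflow/gcs/data/ achieves the same effect:

```yaml
# dbt_project.yml (excerpt) -- hypothetical sketch, not a full project file.
name: my_project
version: "1.0.0"
config-version: 2

# Write dbt artifacts (catalog.json, manifest.json, ...) into the
# bi-directionally synced data/ folder instead of the uni-directionally
# synced dags/ folder, so gcs-syncd no longer deletes them.
target-path: "/home/airflow/gcs/data/target"
```

Either way, the generated catalog.json then lands under data/, which Cloud Storage FUSE syncs back to the bucket instead of overwriting it.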