Automatically add new column to incremental (or other type)

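I need some advice on a new dbt use case I'm trying to solve. I'm fairly new to dbt and not sure what the most efficient "dbt way" of doing this is. We are using Snowflake as our DWH.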

Problem

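We have many incremental models managed with dbt. Recently we needed to add a new column to all of these models. What is the most efficient dbt approach? Should we override the incremental macro script? (I found this for Snowflake.) I assume the last resort is to add the new column to each model manually.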

You can either run all incremental models with --full-refresh or perform this schema migration outside of dbt.

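If you can, I recommend --full-refresh. Because --full-refresh rebuilds the table from scratch, it takes care of both the schema change and the historical values of the new column.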

The current version of dbt, v0.21.0, introduced a new incremental setting, on_schema_change. You can set it to append_new_columns.

Quoting some relevant parts of the documentation:

New on_schema_change config in dbt version v0.21.0

Incremental models can now be configured to include an optional on_schema_change parameter to enable additional control when incremental model columns change. These options enable dbt to continue running incremental models in the presence of schema changes, resulting in fewer --full-refresh scenarios and saving query costs.

append_new_columns: Append new columns to the existing table. Note that this setting does not remove columns from the existing table that are not present in the new data.

Note: None of the on_schema_change behaviors backfill values in old records for newly added columns. If you need to populate those values, we recommend running manual updates, or triggering a --full-refresh.
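As a minimal sketch of the on_schema_change setting described above, here is an incremental model that opts into append_new_columns. The file name, the columns (id, updated_at, existing_column, new_column_name), the unique_key and the ref('my_source_model') source are hypothetical placeholders; materialized, unique_key, on_schema_change, is_incremental() and {{ this }} are actual dbt features, and on_schema_change requires dbt v0.21.0 or later.

```sql
-- models/my_incremental_table.sql (hypothetical model; names are placeholders)
{{
    config(
        materialized='incremental',
        unique_key='id',
        on_schema_change='append_new_columns'
    )
}}

select
    id,
    updated_at,
    existing_column,
    -- newly added column: with append_new_columns, dbt adds it to the
    -- existing table on the next incremental run instead of failing
    new_column_name
from {{ ref('my_source_model') }}

{% if is_incremental() %}
  -- on incremental runs, only process rows newer than those already loaded
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

Rather than editing every model file, the same setting can be applied to all models in a project from dbt_project.yml as the model config +on_schema_change: append_new_columns under the project's key. As the quoted note says, rows loaded before the column was added are not backfilled; in Snowflake they simply hold NULL in the new column.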

If --full-refresh is not an option, or you are on an older dbt version, you will have to perform the schema migration manually.

The steps are:

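1. Add the new column by altering the table: alter table my_incremental_table add column new_column_name data_type
2. Run an update query to populate the new column for existing rows
3. Edit the dbt model for my_incremental_table, adding new_column_name to the end of the select query's column list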
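A minimal sketch of steps 1 and 2 in Snowflake SQL, run outside dbt. It assumes the new column is a varchar and that its historical values can be looked up from a hypothetical source_table joined on a hypothetical id key; substitute your own type and backfill logic.

```sql
-- Step 1: add the column to the existing table (Snowflake syntax).
-- varchar is an assumed type; use whatever new_column_name actually needs.
alter table my_incremental_table add column new_column_name varchar;

-- Step 2: backfill existing rows. Assumes a hypothetical source_table that
-- holds the historical values, matched on a hypothetical id key.
update my_incremental_table as tgt
set new_column_name = src.new_column_name
from source_table as src
where tgt.id = src.id;

-- Step 3 happens in the dbt project, not here: append new_column_name to the
-- end of the select list in the model that builds my_incremental_table.
```

After step 3, the next dbt run inserts new rows with the column populated by the model, while the rows updated above keep their backfilled values.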

This works because dbt is stateless, but since it is a manual operation, I don't recommend it if you can avoid it.

Also note that if you use the on_schema_change approach, you still need to backfill the new column manually.
