Why am I getting a "column <x> of relation <y> does not exist" error when running a model in dbt but not when running in a SQL client?

Problem

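We have an incremental model that has been running in our nightly production job for months (SQL below). Last week, after upgrading our production environment from v0.19.0 to v0.21.0, the model started throwing this error: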

Database Error in model my_model (path/to/my_model.sql)
  column "alias6" of relation "my_model" does not exist
  compiled SQL at target/run/path/to/my_model.sql

{{-
  config(
    materialized = 'incremental',
    dist = 'alias3',
    sort = 'dates_pk',
    unique_key = '__surrogate_key',
    )
-}}

with calculate_metrics as (

    select        
        field1 as dates_pk,
        field2 as alias2,
        {{ my_macro('field3') }} as alias3,
        field4 as alias4,
        field5,
        field6 as alias6,
        field7 as alias7,
        field8 as alias8,
        field9 as alias9,
        (field8::float / field6)::decimal(18, 6) as alias10,
        (field9::float / field7)::decimal(18, 6) as alias11,
        {{ dbt_utils.surrogate_key([
            'field1', 'field2', 'alias4', 'field5']) }} as __surrogate_key
    from {{ ref('upstream_model') }}
    {% if is_incremental() -%}
    where dates_pk >= coalesce((select max(dates_pk) from {{ this }}), '2000-01-01')
    {%- endif -%}

)

select * from calculate_metrics

What I've tried so far

I stripped the model down to the following and switched it to a table materialization:

{{-
  config(
    materialized = 'table',
    )
-}}

with calculate_metrics as (

    select        
        field1,
        field2,
        {{ my_macro('field3') }} as alias3,
        field4,
        field5,
        field6,
        field7,
        field8,
        field9
    from {{ ref('upstream_model') }}

)

select * from calculate_metrics

Running this version still fails, now on a different column:

Database Error in model my_model (path/to/my_model.sql)
  column "dates_pk" of relation "my_model" does not exist
  compiled SQL at target/run/path/to/my_model.sql

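So dbt appears to be running some database operation that looks for aliases that used to be in the model but no longer are. I'm not sure why that would happen, especially with a table materialization.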

Thanks to some investigation by the dbt support team, we found the cause of this issue.

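The YML documentation file for this model contained a column name that does not exist in the model (see the example YML below), and dbt's persist_docs feature had recently been enabled for this model.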

models:
  - name: my_model
    columns:
      ...
      # This should have been named alias6, but was not updated 
      # when the model changed at some point in the past; this
      # didn't cause an error until persist_docs attempted to
      # `comment` on the (non-existent) field in Redshift
      - name: field6
        description: Foo bar baz.
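
For illustration, the failure can be reproduced by hand in a SQL client. The sketch below assumes the model lives in a schema named `analytics` (a hypothetical name, not taken from the original project) and only approximates the statement that persist_docs issues for each documented column; the exact SQL dbt generates may differ.

```sql
-- List the columns that actually exist on the relation.
-- "analytics" is a hypothetical schema name used for illustration.
select column_name
from information_schema.columns
where table_schema = 'analytics'
  and table_name = 'my_model'
order by ordinal_position;

-- Roughly what persist_docs attempts for the stale YML entry.
-- Because field6 is not a column of my_model, the database rejects
-- the statement with:
--   column "field6" of relation "my_model" does not exist
comment on column analytics.my_model.field6 is 'Foo bar baz.';
```

Comparing the first result set against the column names listed in the YML file shows which documented entries are stale. It also explains why the model's own select statement runs cleanly in a SQL client: the failing statement is the comment, not the query.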

A more specific error message would have helped us find the root cause sooner, so I opened an issue on the dbt-core GitHub repository.