Is it possible to pass dynamic values into a dbt source freshness test?

Question

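I am trying to dynamically determine the warn and error thresholds for the freshness checks specified in a dbt sources.yml, based on the median and standard deviation of the "synced_at" column of the underlying sources.

To accomplish this, I thought I might try passing a macro in the freshness block of the sources.yml file, like so: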

# sources.yml
...
    tables:
      - name: appointment_type
        freshness:
          error_after:
            count: test_macro()
            period: hour
...

where:

{%- macro test_macro(this) -%}

{# /*
The idea is {{ this.table }} would parameterize a query, 
going over the same column name for all sources, _fivetran_synced, 
and spit out the calculated values I want. This makes me feel like 
it needs to be a prehook, that somehow stores the value in a var, 
and that is accessed in the source.yml, instead of calling it directly. 

In this case a trivial integer is attempted to be returned, just as an example.
*/ #}
{{ return(24) }}

{%- endmacro -%}

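However, this results in a type error. Presumably the macro is not being called at all. Wrapping it in Jinja quotes also returns an error.

I am curious whether there is currently any way to pass dynamic values into the freshness checks?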

Answer 1

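It is not possible today to call macros from .yml files, for precisely this reason: dbt needs to be able to statically parse those files and validate internal objects (including resource properties, such as source freshness) before it runs any queries against the database.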

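I think you could maybe hack around this by overriding the collect_freshness macro so that, instead of simply returning max(synced_at), it returns a timestamp that differs from current_timestamp by a Z-score, normalized against the max(synced_at) timestamps of all Fivetran sources. It feels tricky, but possible.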

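At the same time, I'd gently push back on your larger goal here. We think source freshness should be prescriptive. You can tell Fivetran how often you want it to sync data, and add freshness blocks to test those expectations. You could run ad hoc queries like the one you envision above to determine whether those expectations are reasonable. Clearly, some tables are updated infrequently or unpredictably, but I've found it more useful to override or remove the freshness expectations for those tables than to add complexity to account for them.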
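As a rough sketch of that kind of ad hoc analysis (something you would run yourself, outside dbt, not anything dbt does for you), the Python below takes a list of historical sync timestamps and suggests static hour counts from the median and standard deviation of the gaps between syncs. The function name suggest_thresholds, the warn_sigmas and error_sigmas multipliers, and the sample history are all hypothetical, chosen only for illustration; in practice the timestamps would come from a query over the synced_at column.

```python
from datetime import datetime, timedelta
from statistics import median, stdev
import math


def suggest_thresholds(synced_at, warn_sigmas=2.0, error_sigmas=4.0):
    """Suggest static warn/error hour counts from historical sync times.

    The sigma multipliers are illustrative defaults, not dbt settings.
    """
    times = sorted(synced_at)
    # Gaps between consecutive syncs, in hours.
    gaps = [(b - a).total_seconds() / 3600 for a, b in zip(times, times[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three sync timestamps")
    typical, spread = median(gaps), stdev(gaps)
    # Round up so a threshold is never tighter than the statistic suggests.
    return {
        "warn_after": math.ceil(typical + warn_sigmas * spread),
        "error_after": math.ceil(typical + error_sigmas * spread),
    }


# Hypothetical history: a source that syncs roughly every six hours.
start = datetime(2021, 1, 1)
history = [start + timedelta(hours=h) for h in (0, 6, 12, 19, 24, 30)]
thresholds = suggest_thresholds(history)
print(thresholds)
```

The resulting integers would then be written by hand into the count fields of warn_after and error_after (with period: hour) in sources.yml, which keeps the file statically parseable while still grounding the numbers in observed behavior.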

Tags: python, sql, jinja2, dbt