DBT: join multiple tables

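I am learning DBT, specifically dbt-mysql, and I cannot work out how to combine several tables into one. What I am trying to do: group a table by the date of its last_updated (timestamp) column, aggregate several columns, and then combine those aggregates into a single table keyed on the year and month split out of last_updated. This is how I want my data to end up: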

Here is my staging model (which I think should select directly from the database):

staging/clients/stg_clients_fields.sql

SELECT id, created, last_updated, service, order_count, spent_count, deleted, country
FROM client_database.clients

Then I have the intermediate models (which I think should reshape the data to fit my needs):

intermediate/clients/clients_last_updated_grouped.sql

SELECT YEAR(last_updated) as year_updated, MONTH(last_updated) as month_updated, COUNT(id) as client_count
FROM {{ ref('stg_clients_fields') }}
GROUP BY YEAR(last_updated), MONTH(last_updated)

intermediate/clients/clients_deleted_grouped.sql

SELECT YEAR(last_updated) as year_updated, MONTH(last_updated) as month_updated, COUNT(id) as deleted
FROM {{ ref('stg_clients_fields') }}
WHERE deleted = 1
GROUP BY YEAR(last_updated), MONTH(last_updated)

intermediate/clients/clients_service_grouped.sql

SELECT YEAR(last_updated) as year_updated, MONTH(last_updated) as month_updated, COUNT(id) as service
FROM {{ ref('stg_clients_fields') }}
WHERE service IS NOT NULL
GROUP BY YEAR(last_updated), MONTH(last_updated)

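The other columns follow the same pattern, each with its own WHERE clause.

Now I need to create a mart model that takes all of the previously created data and puts it into one single table.

At this point I end up with several tables, each with the last_updated field split into year and month and one specific column value next to the date.

How do I now merge all of these tables on the year and month columns that were split out of last_updated?

Or is there perhaps a better way to group the data by year and month and get separate column values based on conditions?

I am new to DBT, so all help and advice is welcome!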

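Because clients_last_updated_grouped has no WHERE condition, it is guaranteed to contain every year/month combination that appears in the other models. That makes this much easier. You can select from that model and left join the other models on year and month: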

with
    updated as (select * from {{ ref('clients_last_updated_grouped') }} ),
    deleted as (select * from {{ ref('clients_deleted_grouped') }} ),
    service as (select * from {{ ref('clients_service_grouped') }} ),

    joined as (
        select
            updated.year_updated,
            updated.month_updated,
            updated.client_count,
            -- months with no matching rows come back as NULL from the left join
            coalesce(deleted.deleted, 0) as deleted_count,
            coalesce(service.service, 0) as service_count

        from
            updated
            left join deleted on updated.year_updated = deleted.year_updated and updated.month_updated = deleted.month_updated
            left join service on updated.year_updated = service.year_updated and updated.month_updated = service.month_updated
    )
select *
from joined

If your database does not support CTEs (with ...), for example MySQL before 8.0, this becomes:

select
    updated.year_updated,
    updated.month_updated,
    updated.client_count,
    coalesce(deleted.deleted, 0) as deleted_count,
    coalesce(service.service, 0) as service_count

from
    {{ ref('clients_last_updated_grouped') }} as updated
    left join {{ ref('clients_deleted_grouped') }} as deleted on updated.year_updated = deleted.year_updated and updated.month_updated = deleted.month_updated
    left join {{ ref('clients_service_grouped') }} as service on updated.year_updated = service.year_updated and updated.month_updated = service.month_updated

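If clients_last_updated_grouped did not contain every month/year combination found in the other tables, you would first need to construct a "date spine" and then left join all 3 tables to that spine.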
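
As a rough sketch of that spine approach, one option is to build the spine from the union of the year/month pairs that appear in any of the three models. This is an assumption about what "date spine" should mean here: it only covers periods that occur in at least one model, so a month with no rows anywhere would still be missing (a true calendar spine would need a generated list of months). The alias spine and the output column names are illustrative only. A derived table is used instead of a CTE so the query should also run on MySQL versions without CTE support.

```sql
select
    spine.year_updated,
    spine.month_updated,
    -- any model may lack a given period, so every measure gets a default of 0
    coalesce(updated.client_count, 0) as client_count,
    coalesce(deleted.deleted, 0) as deleted_count,
    coalesce(service.service, 0) as service_count

from
    (
        -- union (not union all) removes duplicate year/month pairs
        select year_updated, month_updated from {{ ref('clients_last_updated_grouped') }}
        union
        select year_updated, month_updated from {{ ref('clients_deleted_grouped') }}
        union
        select year_updated, month_updated from {{ ref('clients_service_grouped') }}
    ) as spine
    left join {{ ref('clients_last_updated_grouped') }} as updated
        on spine.year_updated = updated.year_updated and spine.month_updated = updated.month_updated
    left join {{ ref('clients_deleted_grouped') }} as deleted
        on spine.year_updated = deleted.year_updated and spine.month_updated = deleted.month_updated
    left join {{ ref('clients_service_grouped') }} as service
        on spine.year_updated = service.year_updated and spine.month_updated = service.month_updated
```

With the models from the question this returns the same rows as the simpler join above, since clients_last_updated_grouped already covers every period; the spine only starts to matter once the base model is filtered too.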