Understanding INNER JOIN logic

I have the following schema for an exchange-rates table:

name          type          kind    null?  default  primary key  unique key
COUNTRY       VARCHAR(10)   COLUMN  Y               N            N
RATETYPE      VARCHAR(6)    COLUMN  Y               N            N
FROMCURRENCY  VARCHAR(3)    COLUMN  Y               N            N
TOCURRENCY    VARCHAR(3)    COLUMN  Y               N            N
STARTDATE     VARCHAR(12)   COLUMN  Y               N            N
RATE          NUMBER(15,7)  COLUMN  Y               N            N

From it I only want the USD/MTHEND rows, i.e.:

SELECT FromCurrency, ToCurrency, Date(StartDate, 'YYYYMMDD') AS StartDate, Rate 
FROM EXCHANGERATES
WHERE DATE(StartDate, 'YYYYMMDD') > CURRENT_DATE - 15000 AND RATETYPE = 'MTHEND' AND ToCurrency = 'USD'
ORDER BY FromCurrency, ToCurrency, StartDate;

FROMCURRENCY TOCURRENCY STARTDATE RATE
JPY USD 2018-12-01 113.4700000
JPY USD 2019-03-30 0.0090342
JPY USD 2019-06-28 0.0092721
JPY USD 2019-08-02 0.0093388
JPY USD 2019-08-30 0.0093967
JPY USD 2019-09-27 0.0092729
JPY USD 2019-11-01 0.0092592
JPY USD 2019-11-29 0.0091315
JPY USD 2019-12-28 0.0091174
JPY USD 2020-02-01 0.0091675
JPY USD 2020-02-29 0.0091802
JPY USD 2020-03-28 0.0092157
JPY USD 2020-05-02 0.0093431
JPY USD 2020-05-30 0.0093266
JPY USD 2020-06-27 0.0093361
JPY USD 2020-08-01 0.0095812
JPY USD 2020-08-29 0.0094144
JPY USD 2020-09-26 0.0094966
JPY USD 2020-10-31 0.0095739
JPY USD 2020-11-27 0.0096061
JPY USD 2020-12-26 0.0096525
JPY USD 2021-01-30 0.0095693
JPY USD 2021-02-27 0.0094197
... ... ... ...
JPY USD 2022-02-26 0.0086700

But there is no end-date column, so I have the following query, which uses a self INNER JOIN to derive the end date:

SELECT
    EX.FromCurrency,
    EX.ToCurrency,
    DATE(EX.StartDate,'YYYYMMDD') AS StartDate,
    DATE(EX2.EndDate,'YYYYMMDD') AS EndDate,
    EX.Rate
FROM
    EXCHANGERATES EX
INNER JOIN(
    SELECT
        FromCurrency,
        ToCurrency,
        Max(StartDate) AS StartDate,
        20251231 AS EndDate
    FROM
        EXCHANGERATES
    WHERE
        RateType = 'MTHEND'
    GROUP BY
        Fromcurrency,
        ToCurrency
UNION
    SELECT
        E2.FromCurrency,
        E2.ToCurrency,
        Max(E.StartDate) AS StartDate,
        to_number(to_char(DateAdd(DAY,-1,To_Date(to_char(E2.StartDate),'YYYYMMDD')),'YYYYMMDD')) AS EndDate
    FROM
        EXCHANGERATES E
    INNER JOIN EXCHANGERATES E2 ON
        E.StartDate < E2.StartDate
        AND E.RateType = E2.RateType
    WHERE
        E.RateType = 'MTHEND'
    GROUP BY
        E2.FromCurrency,
        E2.ToCurrency,
        E2.StartDate) AS EX2 ON
    EX.FromCurrency = EX2.FromCurrency
    AND EX.ToCurrency = EX2.ToCurrency
    AND EX.StartDate = EX2.StartDate
    AND EX.RateType = 'MTHEND'
WHERE
    Ex.tocurrency = 'USD'
ORDER BY    1,  2,  3;

FROMCURRENCY TOCURRENCY STARTDATE ENDDATE RATE
JPY USD 2019-12-28 2020-01-31 0.0091174
JPY USD 2020-05-02 2020-05-29 0.0093431
JPY USD 2020-05-30 2020-06-26 0.0093266
JPY USD 2020-06-27 2020-07-31 0.0093361
JPY USD 2020-08-01 2020-08-28 0.0095812
JPY USD 2020-09-26 2020-10-30 0.0094966
JPY USD 2020-10-31 2020-11-26 0.0095739
JPY USD 2020-12-26 2021-01-29 0.0096525
JPY USD 2021-01-30 2021-02-26 0.0095693
JPY USD 2021-02-27 2021-03-26 0.0094197

Why does the INNER JOIN result differ from tinazmu's query below, which uses LEAD? The query below captures all the unique USD/MTHEND rows with the correct end dates:

SELECT
        FromCurrency,
        ToCurrency,
        DATE(StartDate,'YYYYMMDD') AS StartDate,
        LEAD(DateAdd(DAY, -1, Date(StartDate, 'YYYYMMDD')),1,'2025-12-31') 
            OVER (PARTITION BY FromCurrency, ToCurrency, RateType 
                    ORDER BY StartDate) as EndDate, 
        Rate    
FROM
    EXCHANGERATES
WHERE RateType = 'MTHEND' AND ToCurrency = 'USD'
ORDER BY FromCurrency, ToCurrency, StartDate;

FROMCURRENCY TOCURRENCY STARTDATE ENDDATE RATE
JPY USD 2018-12-01 2019-03-29 113.4700000
JPY USD 2019-03-30 2019-06-27 0.0090342
JPY USD 2019-06-28 2019-08-01 0.0092721
JPY USD 2019-08-02 2019-08-29 0.0093388
JPY USD 2019-08-30 2019-09-26 0.0093967
JPY USD 2019-09-27 2019-10-31 0.0092729
JPY USD 2019-11-01 2019-11-28 0.0092592
JPY USD 2019-11-29 2019-12-27 0.0091315
JPY USD 2019-12-28 2020-01-31 0.0091174
JPY USD 2020-02-01 2020-02-28 0.0091675

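You did not show your EXCHANGERATES table, but it seems to have only one date, StartDate (it should really have been called EffectiveDate), and it keeps one row per currency pair and date for which a rate is available. In practice, rates change every day except on public holidays, and leaving out the holiday rates (instead of duplicating the previous day's rate) does not save much. If every day had a row, you could run a rate-conversion query for day N just by saying ON ... EXCHANGERATES.StartDate = DayN, and all of the above would be unnecessary.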

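If you have no control over how the underlying EXCHANGERATES table is populated, then you have to find a way to get the rate for DayN and, if that is not available, the rate for DayN-1, and so on. If you know that the only missing rates are weekends, you can simply join this table three times, all with LEFT JOIN, the first on StartDate = DayN, the second on StartDate = DayN-1, and so on, and pick the latest rate that is available.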
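
A minimal sketch of that triple LEFT JOIN, under stated assumptions: the PAYMENTS table (PaymentId, Currency, Amount, PayDate as a DATE) and the 'DAILY' rate type are hypothetical and only there for illustration, and the gaps are assumed never to exceed two days. It would not fit the MTHEND rows in the question, where the gaps are about a month long.

```sql
-- Hypothetical PAYMENTS table: (PaymentId, Currency, Amount, PayDate DATE).
-- Hypothetical 'DAILY' rate type with weekend-only gaps (at most two missing days).
SELECT
    p.PaymentId,
    p.Amount,
    -- r0 = rate on the day itself, r1 = one day earlier, r2 = two days earlier;
    -- COALESCE picks the most recent one that exists.
    COALESCE(r0.Rate, r1.Rate, r2.Rate) AS Rate
FROM PAYMENTS p
LEFT JOIN EXCHANGERATES r0
       ON r0.FromCurrency = p.Currency
      AND r0.ToCurrency   = 'USD'
      AND r0.RateType     = 'DAILY'
      AND r0.StartDate    = TO_CHAR(p.PayDate, 'YYYYMMDD')
LEFT JOIN EXCHANGERATES r1
       ON r1.FromCurrency = p.Currency
      AND r1.ToCurrency   = 'USD'
      AND r1.RateType     = 'DAILY'
      AND r1.StartDate    = TO_CHAR(DATEADD(day, -1, p.PayDate), 'YYYYMMDD')
LEFT JOIN EXCHANGERATES r2
       ON r2.FromCurrency = p.Currency
      AND r2.ToCurrency   = 'USD'
      AND r2.RateType     = 'DAILY'
      AND r2.StartDate    = TO_CHAR(DATEADD(day, -2, p.PayDate), 'YYYYMMDD');
```

If a payment has no rate within those three days, Rate comes back as NULL, which is a useful signal that the weekend-only assumption does not hold for the data.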

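On the other hand, if the table has gaps of unpredictable length, your problem becomes a gaps-and-islands problem, and the query you posted is one way of solving it. There are other ways, not necessarily better ones; search for "SQL gaps and islands" and "consolidating/packing islands".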

I don't know the Snowflake platform, but in SQL Server (or Teradata) this could replace your query:

SELECT
        FromCurrency,
        ToCurrency,
        RateType,
        Rate,
        StartDate,
        LEAD(DateAdd(day, -1, StartDate),1,'2025-12-31') 
            OVER (PARTITION BY FromCurrency, ToCurrency, RateType 
                    ORDER BY StartDate) as EndDate
FROM  EXCHANGERATES E

Update, 28 February 2022: based on my understanding of your data, this should be able to replace your query:

SELECT
       FromCurrency,
       ToCurrency,
       DATE(StartDate, 'YYYYMMDD') as StartDate,
       LEAD(DateAdd(day, -1, DATE(StartDate, 'YYYYMMDD')),1,'2025-12-31')
            OVER (PARTITION by FromCurrency, ToCurrency, RateType
                  ORDER BY StartDate) as EndDate,
      Rate
FROM  EXCHANGERATES E
WHERE ToCurrency='USD'
  and RateType='MTHEND'
ORDER BY    1,  2,  3;

Can you check it?

Update, 1 March 2022:

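The UNION subquery EX2 simply finds all the date intervals for the 'Month End Rates'. Part 1 of the UNION (SELECT ... Max(StartDate) AS StartDate, 20251231 AS EndDate) finds the latest StartDate that has a month-end rate for each From/ToCurrency combination and treats that rate as valid from that StartDate until 2025-12-31 (a date in the future). That way, the latest rate can be used for any date >= Max(StartDate).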

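The older records are then combined as follows (part 2 of the UNION): for each month-end rate in the table (E2), it finds the previous rate in the table (E), that is, the latest E.StartDate before E2.StartDate, and ends that rate's interval on the day before E2.StartDate, when the new rate takes effect.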

The outer query (EX) then fetches the rates themselves and combines them with the intervals derived in EX2.

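For this to work correctly, the join condition in the second part of the UNION must specify the same currencies (otherwise the "previous record" we find may be a rate for a different currency):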

        E.StartDate < E2.StartDate
        AND E.RateType = E2.RateType
        AND E.FromCurrency = E2.FromCurrency
        AND E.ToCurrency=E2.ToCurrency

Maybe that explains the difference...
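
To illustrate how the missing currency conditions can lose rows, here is a small sketch on three made-up rates (hypothetical sample data, not the rows from the question). It computes the "previous StartDate" both ways for each later rate. The CASE expression is only a compact stand-in for putting the two extra currency conditions in the ON clause.

```sql
-- Hypothetical sample: two JPY/USD rates and one EUR/USD rate dated between them.
WITH rates AS (
    SELECT 'JPY' AS FromCurrency, 'USD' AS ToCurrency, '20200101' AS StartDate
    UNION ALL SELECT 'JPY', 'USD', '20200201'
    UNION ALL SELECT 'EUR', 'USD', '20200115'
)
SELECT
    E2.FromCurrency,
    E2.StartDate AS NextStart,
    -- what the posted query computes: latest earlier date of ANY currency
    MAX(E.StartDate) AS PrevStartAnyCurrency,
    -- what is intended: latest earlier date of the SAME currency pair
    MAX(CASE WHEN E.FromCurrency = E2.FromCurrency
              AND E.ToCurrency   = E2.ToCurrency
             THEN E.StartDate END) AS PrevStartSameCurrency
FROM rates E
INNER JOIN rates E2
        ON E.StartDate < E2.StartDate
GROUP BY E2.FromCurrency, E2.ToCurrency, E2.StartDate
ORDER BY E2.FromCurrency, E2.StartDate;

-- FROMCURRENCY  NEXTSTART  PREVSTARTANYCURRENCY  PREVSTARTSAMECURRENCY
-- EUR           20200115   20200101              NULL
-- JPY           20200201   20200115              20200101
```

In the JPY row, the version without the currency conditions yields 20200115, a date on which no JPY/USD rate exists. The outer join on FromCurrency, ToCurrency and StartDate then finds no match, so the JPY rate that started on 20200101 silently disappears from the result, which is consistent with the rows missing from the INNER JOIN output in the question.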