SQL: why does dateA - dateB <= '3 years' give a different result than dateA <= dateB + '3 years'?

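I am working through a MODE.com SQL exercise on date formats.

The exercise is: write a query that counts the number of companies acquired within 3 years, 5 years, and 10 years of being founded (in 3 separate columns). Include a column for total companies acquired as well. Group by category and limit to only rows with a founding date.

It uses two tables:

tutorial.crunchbase_companies_clean_date, which holds information about every company, such as company name and year founded

tutorial.crunchbase_acquisitions_clean_date, which holds information about every acquired company, such as the acquired company's name and the acquisition date

My code is: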

SELECT companies.category_code,
       COUNT(CASE WHEN acq.acquired_at_cleaned - companies.founded_at_clean:: timestamp <= '3 years' THEN 1 ELSE NULL END) AS less_than_3_years,
       COUNT(CASE WHEN acq.acquired_at_cleaned - companies.founded_at_clean:: timestamp <= '5 years' THEN 1 ELSE NULL END) AS between_3_to_5_years,
       COUNT(CASE WHEN acq.acquired_at_cleaned - companies.founded_at_clean:: timestamp <= '10 years' THEN 1 ELSE NULL END) AS within_10_years,
       COUNT(1) AS total
FROM tutorial.crunchbase_companies_clean_date companies
JOIN tutorial.crunchbase_acquisitions_clean_date acq 
ON companies.permalink = acq.company_permalink
WHERE companies.founded_at_clean IS NOT NULL
GROUP BY 1
ORDER BY total DESC

The result is: My result

The answer query is:

SELECT companies.category_code,
       COUNT(CASE WHEN acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + INTERVAL '3 years'
                       THEN 1 ELSE NULL END) AS acquired_3_yrs,
       COUNT(CASE WHEN acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + INTERVAL '5 years'
                       THEN 1 ELSE NULL END) AS acquired_5_yrs,
       COUNT(CASE WHEN acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + INTERVAL '10 years'
                       THEN 1 ELSE NULL END) AS acquired_10_yrs,
       COUNT(1) AS total
  FROM tutorial.crunchbase_companies_clean_date companies
  JOIN tutorial.crunchbase_acquisitions_clean_date acquisitions
    ON acquisitions.company_permalink = companies.permalink
 WHERE founded_at_clean IS NOT NULL
 GROUP BY 1
 ORDER BY 5 DESC

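The result is: The answer result

You can see in the screenshots that the results are very similar, but some of the numbers differ.

The only difference between my query and the answer is inside the COUNT expressions, and I don't really see why they would behave differently. For example:

acq.acquired_at_cleaned - companies.founded_at_clean:: timestamp <= '3 years'

versus

acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + INTERVAL '3 years'

I tried adding INTERVAL to my SELECT statement: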

SELECT companies.category_code,
       COUNT(CASE WHEN acq.acquired_at_cleaned - companies.founded_at_clean:: timestamp <= INTERVAL '3 years' THEN 1 ELSE NULL END) AS less_than_3_years,
       COUNT(CASE WHEN acq.acquired_at_cleaned - companies.founded_at_clean:: timestamp <= INTERVAL '5 years' THEN 1 ELSE NULL END) AS between_3_to_5_years,
       COUNT(CASE WHEN acq.acquired_at_cleaned - companies.founded_at_clean:: timestamp <= INTERVAL '10 years' THEN 1 ELSE NULL END) AS within_10_years,
       COUNT(1) AS total

and removing INTERVAL from the answer query:

SELECT companies.category_code,
       COUNT(CASE WHEN acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + '3 years'
                       THEN 1 ELSE NULL END) AS acquired_3_yrs,
       COUNT(CASE WHEN acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + '5 years'
                       THEN 1 ELSE NULL END) AS acquired_5_yrs,
       COUNT(CASE WHEN acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + '10 years'
                       THEN 1 ELSE NULL END) AS acquired_10_yrs,
       COUNT(1) AS total

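But both queries return the same results as before.

I also looked at the difference between acquired_date and founded_date to see whether that value can be compared with an INTERVAL. The difference comes back as a number of days, which looked promising to me. The result

I have tried to provide all the information for your reference. I hope someone can help. Thanks in advance!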

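My advice is to add the INTERVAL to (or subtract it from) one date/time and then compare the result with the other date/time. Don't subtract the two date/times and then compare the difference with a string literal. Your database appears to treat '3 YEARS' as a fixed number of days, regardless of the actual number of days between someDateTime and someDateTime +/- '3 YEARS'. The actual number of days in a year is 365 or 366, depending on whether the span crosses a leap year.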
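To make the fixed-length reading concrete, here is a minimal sketch, assuming PostgreSQL (the engine used in the fiddles). As far as I know, PostgreSQL compares intervals by treating a month as 30 days, so for comparison purposes '3 years' counts as 36 * 30 = 1080 days. The column aliases are made up for illustration.

```sql
-- Assumption: PostgreSQL. For comparisons, an interval's months are
-- counted as 30 days each, so a "year" is 12 * 30 = 360 days.
SELECT INTERVAL '1 year'    =  INTERVAL '12 months' AS year_is_12_months     -- true
     , INTERVAL '1 month'   =  INTERVAL '30 days'   AS month_is_30_days      -- true
     , INTERVAL '1 year'    =  INTERVAL '365 days'  AS year_is_365_days      -- false
     , INTERVAL '3 years'   =  INTERVAL '1080 days' AS three_years_is_1080   -- true
     , INTERVAL '1096 days' <= INTERVAL '3 years'   AS diff_within_3_years   -- false
;
```

The last column is the situation in the question: subtracting two timestamps yields a plain day count (1096 days for a company acquired exactly three calendar years after founding), and that day count is larger than what '3 years' is worth in a comparison.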

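Here is a simpler example that compares the difference against specific intervals. Note that to pick a matching interval you also need to know whether the span crosses a leap year, and how many.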

Fiddle

Test case:

WITH dates AS (
        SELECT '2021-01-01'::date AS xdate
     )
SELECT xdate - (xdate - INTERVAL '1' YEAR) AS diff
     , xdate - (xdate - INTERVAL '1' YEAR) = '1 YEAR'   AS b1
     , xdate - (xdate - INTERVAL '1' YEAR) = '365 DAYS' AS b2
     , xdate - (xdate - INTERVAL '1' YEAR) = '366 DAYS' AS b3
  FROM dates
;

-- AND --

WITH dates AS (
        SELECT '2021-01-01'::date AS xdate
     )
SELECT xdate - (xdate - INTERVAL '1' YEAR) AS diff
     , xdate - (xdate - INTERVAL '1' YEAR) = INTERVAL '1' YEAR   AS b1
     , xdate - (xdate - INTERVAL '1' YEAR) = INTERVAL '365 DAYS' AS b2
     , xdate - (xdate - INTERVAL '1' YEAR) = INTERVAL '366 DAYS' AS b3
  FROM dates
;

Result:

diff     | b1 | b2 | b3
366 days | f  | f  | t

Fiddle

WITH dates AS (
        SELECT '2021-01-01'::date AS xdate
     )
   , diff AS (
        SELECT xdate - (xdate - INTERVAL '1' YEAR) AS diff
          FROM dates
     )
SELECT diff
     , CASE WHEN diff = (366*24*60*60 * INTERVAL '1' SECOND)
            THEN 1
        END AS compare1
     , 366*24*60*60 AS seconds
     , CASE WHEN diff = (366*24*60*60 * INTERVAL '1' SECOND)
            THEN 1
        END AS compare2
     , CASE WHEN diff = '31622400 SECONDS'
            THEN 1
        END AS compare3
  FROM diff
;

Result:

diff     | compare1 | seconds  | compare2 | compare3
366 days | 1        | 31622400 | 1        | 1

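Original answer:

The fiddle for PostgreSQL

The behavior shown here (below) is similar to what you posted.

The catch is that the value produced by the subtraction is not necessarily what you expect.

Here is a test case in PostgreSQL that may represent your problem.

This is probably related to leap years: the number of days in a year is not fixed.

So comparing dates is probably safer than assuming some fixed number of days, which is what <= '3 years' may be doing.

Test SQL: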

WITH test (acquired_at_cleaned, founded_at_clean, n) AS (
        SELECT current_date, current_date - INTERVAL '4' YEAR, 4 UNION
        SELECT current_date, current_date - INTERVAL '3' YEAR, 3 UNION
        SELECT current_date, current_date - INTERVAL '2' YEAR, 2 UNION
        SELECT current_date, current_date - INTERVAL '1' YEAR, 1
     )
   , cases AS (
        SELECT test.*
             , CASE WHEN acquired_at_cleaned <= founded_at_clean::timestamp + INTERVAL '3' year
                    THEN 1 ELSE NULL
                END AS acquired_3_yrs_case1
             , CASE WHEN acquired_at_cleaned - founded_at_clean::timestamp <= '3 year'
                    THEN 1 ELSE NULL
                END AS acquired_3_yrs_case2
             , acquired_at_cleaned - founded_at_clean::timestamp AS x1
             , acquired_at_cleaned - (n * INTERVAL '1' YEAR) AS x2
          FROM test
     )
SELECT acquired_at_cleaned AS acquired
     , founded_at_clean    AS founded
     , n
     , acquired_3_yrs_case1 AS case1
     , acquired_3_yrs_case2 AS case2
     , x1, x2
  FROM cases
 ORDER BY founded_at_clean
;

Result:

acquired   | founded             | n | case1 | case2 | x1        | x2
2021-12-25 | 2017-12-25 00:00:00 | 4 | null  | null  | 1461 days | 2017-12-26 00:00:00
2021-12-25 | 2018-12-25 00:00:00 | 3 | 1     | null  | 1096 days | 2018-12-26 00:00:00
2021-12-25 | 2019-12-25 00:00:00 | 2 | 1     | 1     | 731 days  | 2019-12-26 00:00:00
2021-12-25 | 2020-12-25 00:00:00 | 1 | 1     | 1     | 365 days  | 2020-12-26 00:00:00

Interesting results.
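If you would rather keep the "difference <= interval" shape of the original query, one possible alternative (again assuming PostgreSQL) is age(), which returns the difference in years, months, and days instead of a raw day count. The sketch below reuses the n = 3 row from the test above; the aliases are hypothetical.

```sql
-- Assumption: PostgreSQL. Same dates as the n = 3 row in the test above.
WITH test (acquired_at_cleaned, founded_at_clean) AS (
        VALUES (TIMESTAMP '2021-12-25', TIMESTAMP '2018-12-25')
     )
SELECT acquired_at_cleaned - founded_at_clean      AS raw_diff   -- 1096 days
     , age(acquired_at_cleaned, founded_at_clean)  AS age_diff   -- 3 years
     , acquired_at_cleaned - founded_at_clean     <= INTERVAL '3 years' AS case_subtract  -- false
     , age(acquired_at_cleaned, founded_at_clean) <= INTERVAL '3 years' AS case_age       -- true
     , acquired_at_cleaned <= founded_at_clean + INTERVAL '3 years'     AS case_add       -- true
  FROM test
;
```

age() and the add-then-compare form agree here, but they are not guaranteed to agree on every pair of dates (month-end dates are one place to be careful), so adding the interval to the founding date, as the answer query does, remains the simpler choice.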