SQL 为什么 dateA - date <= '3 years' 给出的结果与 date <= date + '3 years' 不同
SQL why does dateA - dateB <= '3 years' give a different result than dateA <= dateB + '3 years'
我正在做一个 MODE.com SQL 关于日期格式的练习题。
练习题是:写一个查询,分别统计成立后3年、5年、10年内收购的公司数量(分3列)。还包括一列收购的公司总数。按类别分组并限制为仅包含创建日期的行。
它使用两个 tables:
tutorial.crunchbase_companies_clean_date
table,包括所有公司的信息,如公司名称、成立年份等
tutorial.crunchbase_acquisitions_clean_date
table,包括所有被收购公司的信息,如被收购公司名称、收购日期等
我的代码是:
SELECT companies.category_code,
COUNT(CASE WHEN acq.acquired_at_cleaned - companies.founded_at_clean:: timestamp <= '3 years' THEN 1 ELSE NULL END) AS less_than_3_years,
COUNT(CASE WHEN acq.acquired_at_cleaned - companies.founded_at_clean:: timestamp <= '5 years' THEN 1 ELSE NULL END) AS between_3_to_5_years,
COUNT(CASE WHEN acq.acquired_at_cleaned - companies.founded_at_clean:: timestamp <= '10 years' THEN 1 ELSE NULL END) AS within_10_years,
COUNT(1) AS total
FROM tutorial.crunchbase_companies_clean_date companies
JOIN tutorial.crunchbase_acquisitions_clean_date acq
ON companies.permalink = acq.company_permalink
WHERE companies.founded_at_clean IS NOT NULL
GROUP BY 1
ORDER BY total DESC
结果是:
My result
答案查询是:
SELECT companies.category_code,
COUNT(CASE WHEN acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + INTERVAL '3 years'
THEN 1 ELSE NULL END) AS acquired_3_yrs,
COUNT(CASE WHEN acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + INTERVAL '5 years'
THEN 1 ELSE NULL END) AS acquired_5_yrs,
COUNT(CASE WHEN acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + INTERVAL '10 years'
THEN 1 ELSE NULL END) AS acquired_10_yrs,
COUNT(1) AS total
FROM tutorial.crunchbase_companies_clean_date companies
JOIN tutorial.crunchbase_acquisitions_clean_date acquisitions
ON acquisitions.company_permalink = companies.permalink
WHERE founded_at_clean IS NOT NULL
GROUP BY 1
ORDER BY 5 DESC
结果是:
The answer result
您可以在屏幕截图中看到结果非常相似,但有些数字不同。
我的查询和答案之间的唯一区别是在 COUNT 语句中,但我并没有真正看到区别,例如:acq.acquired_at_cleaned - companies.founded_at_clean:: timestamp <= '3 years'
和 acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + INTERVAL '3 years'
我尝试在我的 SELECT 语句中添加 INTERVAL
:
SELECT companies.category_code,
COUNT(CASE WHEN acq.acquired_at_cleaned - companies.founded_at_clean:: timestamp <= INTERVAL '3 years' THEN 1 ELSE NULL END) AS less_than_3_years,
COUNT(CASE WHEN acq.acquired_at_cleaned - companies.founded_at_clean:: timestamp <= INTERVAL '5 years' THEN 1 ELSE NULL END) AS between_3_to_5_years,
COUNT(CASE WHEN acq.acquired_at_cleaned - companies.founded_at_clean:: timestamp <= INTERVAL '10 years' THEN 1 ELSE NULL END) AS within_10_years,
COUNT(1) AS total
并从答案查询中删除 INTERVAL
:
SELECT companies.category_code,
COUNT(CASE WHEN acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + '3 years'
THEN 1 ELSE NULL END) AS acquired_3_yrs,
COUNT(CASE WHEN acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + '5 years'
THEN 1 ELSE NULL END) AS acquired_5_yrs,
COUNT(CASE WHEN acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + '10 years'
THEN 1 ELSE NULL END) AS acquired_10_yrs,
COUNT(1) AS total
但是结果是一样的
我试图知道acquired_date和founded_date之间的差异结果,看看该值是否可以与INTERVAL
进行比较。结果是几天,这对我来说很有希望。
The result
我尽力提供所有信息供您参考。希望有人能帮忙。提前致谢!
我的建议是add/subtract把INTERVALto/from一个date/time然后和另一个date/time比较。不要减去 date/time 然后与字符串文字进行比较。您的数据库似乎将 '3 YEARS'
理解为 3 * 365 days
,而不管 someDateTime
和 someDateTime +/- '3 YEARS'
之间的实际天数。年与年的实际天数可能是 365 或 366,具体取决于是否跨越闰年。
这里有一个比较简单的例子,比较具体的区间,也需要知道是否跨越了闰年,跨越了多少闰年。
测试用例:
WITH dates AS (
SELECT '2021-01-01'::date AS xdate
)
SELECT xdate - (xdate - INTERVAL '1' YEAR) AS diff
, xdate - (xdate - INTERVAL '1' YEAR) = '1 YEAR' AS b1
, xdate - (xdate - INTERVAL '1' YEAR) = '365 DAYS' AS b2
, xdate - (xdate - INTERVAL '1' YEAR) = '366 DAYS' AS b3
FROM dates
;
-- AND --
WITH dates AS (
SELECT '2021-01-01'::date AS xdate
)
SELECT xdate - (xdate - INTERVAL '1' YEAR) AS diff
, xdate - (xdate - INTERVAL '1' YEAR) = INTERVAL '1' YEAR AS b1
, xdate - (xdate - INTERVAL '1' YEAR) = INTERVAL '365 DAYS' AS b2
, xdate - (xdate - INTERVAL '1' YEAR) = INTERVAL '366 DAYS' AS b3
FROM dates
;
结果:
diff
b1
b2
b3
366 days
f
f
t
WITH dates AS (
SELECT '2021-01-01'::date AS xdate
)
, diff AS (
SELECT xdate - (xdate - INTERVAL '1' YEAR) AS diff
FROM dates
)
SELECT diff
, CASE WHEN diff = (366*24*60*60 * INTERVAL '1' SECOND)
THEN 1
END AS compare1
, 366*24*60*60 AS seconds
, CASE WHEN diff = (366*24*60*60 * INTERVAL '1' SECOND)
THEN 1
END AS compare2
, CASE WHEN diff = '31622400 SECONDS'
THEN 1
END AS compare3
FROM diff
;
结果:
diff
compare1
seconds
compare2
compare3
366 days
1
31622400
1
1
原回复:
此处(下方)显示的行为与发布的行为类似。
问题是产生的价值不一定是你想的那样。
这是 postgresql 中的一个测试用例,它可能代表您的问题。
这可能与 leap year
有关,一年中的天数不固定。
所以比较日期可能比假设一些天数更安全,这可能是 <= '3 years'
所做的假设。
测试SQL:
WITH test (acquired_at_cleaned, founded_at_clean, n) AS (
SELECT current_date, current_date - INTERVAL '4' YEAR, 4 UNION
SELECT current_date, current_date - INTERVAL '3' YEAR, 3 UNION
SELECT current_date, current_date - INTERVAL '2' YEAR, 2 UNION
SELECT current_date, current_date - INTERVAL '1' YEAR, 1
)
, cases AS (
SELECT test.*
, CASE WHEN acquired_at_cleaned <= founded_at_clean::timestamp + INTERVAL '3' year
THEN 1 ELSE NULL
END AS acquired_3_yrs_case1
, CASE WHEN acquired_at_cleaned - founded_at_clean::timestamp <= '3 year'
THEN 1 ELSE NULL
END AS acquired_3_yrs_case2
, acquired_at_cleaned - founded_at_clean::timestamp AS x1
, acquired_at_cleaned - (n * INTERVAL '1' YEAR) AS x2
FROM test
)
SELECT acquired_at_cleaned AS acquired
, founded_at_clean AS founded
, n
, acquired_3_yrs_case1 AS case1
, acquired_3_yrs_case2 AS case2
, x1, x2
FROM cases
ORDER BY founded_at_clean
;
结果:
acquired
founded
n
case1
case2
x1
x2
2021-12-25
2017-12-25 00:00:00
4
null
null
1461 days
2017-12-26 00:00:00
2021-12-25
2018-12-25 00:00:00
3
1
null
1096 days
2018-12-26 00:00:00
2021-12-25
2019-12-25 00:00:00
2
1
1
731 days
2019-12-26 00:00:00
2021-12-25
2020-12-25 00:00:00
1
1
1
365 days
2020-12-26 00:00:00
有趣的结果。
我正在做一个 MODE.com SQL 关于日期格式的练习题。
练习题是:写一个查询,分别统计成立后3年、5年、10年内收购的公司数量(分3列)。还包括一列收购的公司总数。按类别分组并限制为仅包含创建日期的行。
它使用两个 tables:
tutorial.crunchbase_companies_clean_date
table,包括所有公司的信息,如公司名称、成立年份等
tutorial.crunchbase_acquisitions_clean_date
table,包括所有被收购公司的信息,如被收购公司名称、收购日期等
我的代码是:
SELECT companies.category_code,
COUNT(CASE WHEN acq.acquired_at_cleaned - companies.founded_at_clean:: timestamp <= '3 years' THEN 1 ELSE NULL END) AS less_than_3_years,
COUNT(CASE WHEN acq.acquired_at_cleaned - companies.founded_at_clean:: timestamp <= '5 years' THEN 1 ELSE NULL END) AS between_3_to_5_years,
COUNT(CASE WHEN acq.acquired_at_cleaned - companies.founded_at_clean:: timestamp <= '10 years' THEN 1 ELSE NULL END) AS within_10_years,
COUNT(1) AS total
FROM tutorial.crunchbase_companies_clean_date companies
JOIN tutorial.crunchbase_acquisitions_clean_date acq
ON companies.permalink = acq.company_permalink
WHERE companies.founded_at_clean IS NOT NULL
GROUP BY 1
ORDER BY total DESC
结果是: My result
答案查询是:
SELECT companies.category_code,
COUNT(CASE WHEN acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + INTERVAL '3 years'
THEN 1 ELSE NULL END) AS acquired_3_yrs,
COUNT(CASE WHEN acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + INTERVAL '5 years'
THEN 1 ELSE NULL END) AS acquired_5_yrs,
COUNT(CASE WHEN acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + INTERVAL '10 years'
THEN 1 ELSE NULL END) AS acquired_10_yrs,
COUNT(1) AS total
FROM tutorial.crunchbase_companies_clean_date companies
JOIN tutorial.crunchbase_acquisitions_clean_date acquisitions
ON acquisitions.company_permalink = companies.permalink
WHERE founded_at_clean IS NOT NULL
GROUP BY 1
ORDER BY 5 DESC
结果是: The answer result
您可以在屏幕截图中看到结果非常相似,但有些数字不同。
我的查询和答案之间的唯一区别是在 COUNT 语句中,但我并没有真正看到区别,例如:acq.acquired_at_cleaned - companies.founded_at_clean:: timestamp <= '3 years'
和 acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + INTERVAL '3 years'
我尝试在我的 SELECT 语句中添加 INTERVAL
:
SELECT companies.category_code,
COUNT(CASE WHEN acq.acquired_at_cleaned - companies.founded_at_clean:: timestamp <= INTERVAL '3 years' THEN 1 ELSE NULL END) AS less_than_3_years,
COUNT(CASE WHEN acq.acquired_at_cleaned - companies.founded_at_clean:: timestamp <= INTERVAL '5 years' THEN 1 ELSE NULL END) AS between_3_to_5_years,
COUNT(CASE WHEN acq.acquired_at_cleaned - companies.founded_at_clean:: timestamp <= INTERVAL '10 years' THEN 1 ELSE NULL END) AS within_10_years,
COUNT(1) AS total
并从答案查询中删除 INTERVAL
:
SELECT companies.category_code,
COUNT(CASE WHEN acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + '3 years'
THEN 1 ELSE NULL END) AS acquired_3_yrs,
COUNT(CASE WHEN acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + '5 years'
THEN 1 ELSE NULL END) AS acquired_5_yrs,
COUNT(CASE WHEN acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + '10 years'
THEN 1 ELSE NULL END) AS acquired_10_yrs,
COUNT(1) AS total
但是结果是一样的
我试图知道acquired_date和founded_date之间的差异结果,看看该值是否可以与INTERVAL
进行比较。结果是几天,这对我来说很有希望。
The result
我尽力提供所有信息供您参考。希望有人能帮忙。提前致谢!
我的建议是add/subtract把INTERVALto/from一个date/time然后和另一个date/time比较。不要减去 date/time 然后与字符串文字进行比较。您的数据库似乎将 '3 YEARS'
理解为 3 * 365 days
,而不管 someDateTime
和 someDateTime +/- '3 YEARS'
之间的实际天数。年与年的实际天数可能是 365 或 366,具体取决于是否跨越闰年。
这里有一个比较简单的例子,比较具体的区间,也需要知道是否跨越了闰年,跨越了多少闰年。
测试用例:
WITH dates AS (
SELECT '2021-01-01'::date AS xdate
)
SELECT xdate - (xdate - INTERVAL '1' YEAR) AS diff
, xdate - (xdate - INTERVAL '1' YEAR) = '1 YEAR' AS b1
, xdate - (xdate - INTERVAL '1' YEAR) = '365 DAYS' AS b2
, xdate - (xdate - INTERVAL '1' YEAR) = '366 DAYS' AS b3
FROM dates
;
-- AND --
WITH dates AS (
SELECT '2021-01-01'::date AS xdate
)
SELECT xdate - (xdate - INTERVAL '1' YEAR) AS diff
, xdate - (xdate - INTERVAL '1' YEAR) = INTERVAL '1' YEAR AS b1
, xdate - (xdate - INTERVAL '1' YEAR) = INTERVAL '365 DAYS' AS b2
, xdate - (xdate - INTERVAL '1' YEAR) = INTERVAL '366 DAYS' AS b3
FROM dates
;
结果:
diff | b1 | b2 | b3 |
---|---|---|---|
366 days | f | f | t |
WITH dates AS (
SELECT '2021-01-01'::date AS xdate
)
, diff AS (
SELECT xdate - (xdate - INTERVAL '1' YEAR) AS diff
FROM dates
)
SELECT diff
, CASE WHEN diff = (366*24*60*60 * INTERVAL '1' SECOND)
THEN 1
END AS compare1
, 366*24*60*60 AS seconds
, CASE WHEN diff = (366*24*60*60 * INTERVAL '1' SECOND)
THEN 1
END AS compare2
, CASE WHEN diff = '31622400 SECONDS'
THEN 1
END AS compare3
FROM diff
;
结果:
diff | compare1 | seconds | compare2 | compare3 |
---|---|---|---|---|
366 days | 1 | 31622400 | 1 | 1 |
原回复:
此处(下方)显示的行为与发布的行为类似。
问题是产生的价值不一定是你想的那样。
这是 postgresql 中的一个测试用例,它可能代表您的问题。
这可能与 leap year
有关,一年中的天数不固定。
所以比较日期可能比假设一些天数更安全,这可能是 <= '3 years'
所做的假设。
测试SQL:
WITH test (acquired_at_cleaned, founded_at_clean, n) AS (
SELECT current_date, current_date - INTERVAL '4' YEAR, 4 UNION
SELECT current_date, current_date - INTERVAL '3' YEAR, 3 UNION
SELECT current_date, current_date - INTERVAL '2' YEAR, 2 UNION
SELECT current_date, current_date - INTERVAL '1' YEAR, 1
)
, cases AS (
SELECT test.*
, CASE WHEN acquired_at_cleaned <= founded_at_clean::timestamp + INTERVAL '3' year
THEN 1 ELSE NULL
END AS acquired_3_yrs_case1
, CASE WHEN acquired_at_cleaned - founded_at_clean::timestamp <= '3 year'
THEN 1 ELSE NULL
END AS acquired_3_yrs_case2
, acquired_at_cleaned - founded_at_clean::timestamp AS x1
, acquired_at_cleaned - (n * INTERVAL '1' YEAR) AS x2
FROM test
)
SELECT acquired_at_cleaned AS acquired
, founded_at_clean AS founded
, n
, acquired_3_yrs_case1 AS case1
, acquired_3_yrs_case2 AS case2
, x1, x2
FROM cases
ORDER BY founded_at_clean
;
结果:
acquired | founded | n | case1 | case2 | x1 | x2 |
---|---|---|---|---|---|---|
2021-12-25 | 2017-12-25 00:00:00 | 4 | null | null | 1461 days | 2017-12-26 00:00:00 |
2021-12-25 | 2018-12-25 00:00:00 | 3 | 1 | null | 1096 days | 2018-12-26 00:00:00 |
2021-12-25 | 2019-12-25 00:00:00 | 2 | 1 | 1 | 731 days | 2019-12-26 00:00:00 |
2021-12-25 | 2020-12-25 00:00:00 | 1 | 1 | 1 | 365 days | 2020-12-26 00:00:00 |
有趣的结果。