Applying 'WHERE' directly to a timestamp and applying 'EXTRACT' show different results (code attached)

For Task 5 of the Kaggle exercise (write the query): https://www.kaggle.com/code/setthawutkulsrisuwan/exercise-as-with I answered in two ways:

  1. Query with WHERE EXTRACT() to get the year and month; the answer is marked INCORRECT:
           WITH RelevantRides AS
           (
               SELECT EXTRACT(HOUR from trip_start_timestamp) as hour_of_day, trip_seconds, trip_miles
               FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
               WHERE EXTRACT(YEAR from trip_start_timestamp) = 2017 AND
                     EXTRACT(MONTH from trip_start_timestamp) BETWEEN 1 and 6 AND
                     trip_seconds > 0 AND
                     trip_miles > 0
           )
           SELECT hour_of_day,
                  COUNT(1) as num_trips,
                  3600 * SUM(trip_miles) / SUM(trip_seconds) as avg_mph
           FROM RelevantRides
           GROUP BY hour_of_day
           ORDER BY hour_of_day
  2. Query comparing the column directly to get the year and month; the answer is marked CORRECT:
           WITH RelevantRides AS
           (
               SELECT EXTRACT(HOUR from trip_start_timestamp) AS hour_of_day, trip_seconds, trip_miles
               FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
               WHERE trip_start_timestamp > '2017-01-01' AND
                     trip_start_timestamp < '2017-07-01' AND
                     trip_seconds > 0 AND
                     trip_miles > 0
           )
           SELECT hour_of_day,
                  COUNT(1) as num_trips,
                  3600 * SUM(trip_miles) / SUM(trip_seconds) as avg_mph
           FROM RelevantRides
           GROUP BY hour_of_day
           ORDER BY hour_of_day

The main difference is that the first uses

WHERE EXTRACT(YEAR from trip_start_timestamp) = 2017
AND EXTRACT(MONTH from trip_start_timestamp) BETWEEN 1 and 6

and the second uses

WHERE trip_start_timestamp > '2017-01-01' AND
trip_start_timestamp < '2017-07-01'

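In my view, the query that uses EXTRACT() to select year 2017 and months 1 to 6 should return the same result as the query that compares the column directly; however, the results differ.

Please explain the reasons behind this. Thank you.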

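You are comparing constant dates to timestamps. A constant date is actually a timestamp that looks like 2022-04-07 00:00:00. So '2017-01-01' means 2017-01-01 00:00:00, and trip_start_timestamp > '2017-01-01' excludes any trip that started at exactly midnight on January 1.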

So, when you want all the records in the January-through-June date range, you need:

WHERE trip_start_timestamp >= '2017-01-01' 
  AND trip_start_timestamp <  '2017-07-01'  

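In other words, you want everything on or after midnight on the first day of the range, and everything up to but not including midnight on the day after the last day. In mathematical notation, you want the dates in the range [2017-01-01, 2017-07-01). The start of the range is closed and the end is open.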
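To illustrate the boundary behaviour, here is a small Python sketch (plain Python, not BigQuery). The sample timestamps are made up for illustration, and the sketch assumes a date literal is read as midnight at the start of that day, as described above. It applies the strict filter, the half-open filter, and a year/month filter that mirrors the EXTRACT() conditions.

```python
from datetime import datetime

# Hypothetical sample trip start times (not taken from the real dataset).
trips = [
    datetime(2016, 12, 31, 23, 45),
    datetime(2017, 1, 1, 0, 0),    # exactly midnight on the first day
    datetime(2017, 1, 1, 0, 15),
    datetime(2017, 6, 30, 23, 45),
    datetime(2017, 7, 1, 0, 0),    # exactly midnight after the last day
]

start = datetime(2017, 1, 1)  # '2017-01-01' read as 2017-01-01 00:00:00
end = datetime(2017, 7, 1)    # '2017-07-01' read as 2017-07-01 00:00:00

# Strict comparison on both ends, as in the second query of the question.
strict = [t for t in trips if start < t < end]

# Half-open range [start, end).
half_open = [t for t in trips if start <= t < end]

# Year/month filter, mirroring the EXTRACT() conditions.
by_extract = [t for t in trips if t.year == 2017 and 1 <= t.month <= 6]

print(len(strict), len(half_open), len(by_extract))
print(half_open == by_extract)
```

In this sample the strict filter drops the trip at exactly 2017-01-01 00:00:00, while the half-open range and the year/month filter select the same rows.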

This code of yours gives a correct result:

WHERE EXTRACT(YEAR from trip_start_timestamp) = 2017 
AND EXTRACT(MONTH from trip_start_timestamp) BETWEEN 1 and 6

However, it cannot use an index on your trip_start_timestamp column, so it will not be efficient in production.
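One way to keep the filter as a plain range on the column is to compute the half-open bounds from the year and months up front. The following is a minimal Python sketch; `half_open_month_range` is a hypothetical helper name, not part of any library, and the printed WHERE clause is only for illustration (real code would normally pass the bounds as query parameters).

```python
from datetime import date
from typing import Tuple


def half_open_month_range(year: int, first_month: int, last_month: int) -> Tuple[date, date]:
    """Return (start, end) so that start <= d < end covers the given months."""
    start = date(year, first_month, 1)
    # The exclusive end is the first day of the month after last_month.
    if last_month == 12:
        end = date(year + 1, 1, 1)
    else:
        end = date(year, last_month + 1, 1)
    return start, end


start, end = half_open_month_range(2017, 1, 6)
print(f"WHERE trip_start_timestamp >= '{start}' AND trip_start_timestamp < '{end}'")
```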