Get number of milliseconds for a localised date, taking into account daylight savings
I have data like this in Google BigQuery:
sample_date_time_UTC      time_zone      milliseconds_between_samples
-----------------------   ------------   ----------------------------
2019-03-31 01:06:03 UTC   Europe/Paris   60000
2019-03-31 01:16:03 UTC   Europe/Paris   60000
...
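The data samples are expected at regular intervals, given by the value of the milliseconds_between_samples field.
time_zone is a string holding a Google Cloud Supported Timezone Value.
I then check the ratio of the actual sample count to the expected sample count for each day, over any range of days (expressed as local dates in the given time_zone):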
with data as
(
  select
    -- convert sample_date_time_UTC to equivalent local datetime for the timezone
    DATETIME(sample_date_time_UTC, time_zone) as localised_sample_date_time,
    milliseconds_between_samples
  from `mytable`
  where sample_date_time_UTC between '2019-03-31 00:00:00.000000+01:00' and '2019-04-01 00:00:00.000000+02:00'
)
select
  date(localised_sample_date_time) as localised_date,
  count(*)/(86400000/avg(milliseconds_between_samples)) as ratio_of_daily_sample_count_to_expected
from data
group by localised_date
order by localised_date
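The problem is that this has a bug: I have hard-coded the expected number of milliseconds in a day as 86400000. That is wrong, because the day is 1 hour shorter when daylight saving starts in the specified time_zone (Europe/Paris), and 1 hour longer when daylight saving ends.
So the query above is incorrect. It queries data for 31 March 2019 in the Europe/Paris time zone, which is the day daylight saving started there. The number of milliseconds in that day should be 82800000.
Within the query, how can I get the correct number of milliseconds for a given localised_date?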
Update:
I tried this to see what it returns:
select DATETIME_DIFF(DATETIME('2019-04-01 00:00:00.000000+02:00', 'Europe/Paris'), DATETIME('2019-03-31 00:00:00.000000+01:00', 'Europe/Paris'), MILLISECOND)
That didn't work - I get 86400000
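For what it's worth, 86400000 is the expected result there: DATETIME values are wall-clock readings with no offset attached, so their difference ignores the hour that is skipped at 02:00 local time. Below is a minimal Python sketch of the two kinds of difference side by side. It is not BigQuery, and it assumes Python 3.9+ with the standard zoneinfo module and time zone data available on the machine:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

paris = ZoneInfo("Europe/Paris")
start = datetime(2019, 3, 31, tzinfo=paris)  # local midnight, offset +01:00
end = datetime(2019, 4, 1, tzinfo=paris)     # local midnight, offset +02:00

one_ms = timedelta(milliseconds=1)

# Wall-clock difference: drop the offsets first, as a DATETIME comparison does.
wall_clock_ms = (end.replace(tzinfo=None) - start.replace(tzinfo=None)) // one_ms

# Elapsed difference: convert both instants to UTC before subtracting.
elapsed_ms = (end.astimezone(timezone.utc) - start.astimezone(timezone.utc)) // one_ms

print(wall_clock_ms)  # 86400000
print(elapsed_ms)     # 82800000
```

The second figure is the one the ratio needs, which suggests doing the subtraction on UTC instants rather than on localised datetimes.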
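You can get a millisecond difference that reflects the clock change by removing the +01:00 and +02:00. Note that the inputs are then read as UTC timestamps and converted to Paris local time, so the difference comes out as 90000000, which is not the number of milliseconds that actually elapsed.
You can get the milliseconds in the day like this: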
select 86400000 + (86400000 - DATETIME_DIFF(DATETIME('2019-04-01 00:00:00.000000', 'Europe/Paris'), DATETIME('2019-03-31 00:00:00.000000', 'Europe/Paris'), MILLISECOND))
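To see where the 90000000 comes from, here is a rough Python reproduction of the same arithmetic. It is only a sketch, not BigQuery, and assumes Python 3.9+ with the standard zoneinfo module and time zone data installed. The two inputs are UTC midnights, which Paris clocks show as 01:00 and 02:00:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

paris = ZoneInfo("Europe/Paris")

# Strings without an offset are treated as UTC midnights here.
utc_start = datetime(2019, 3, 31, tzinfo=timezone.utc)
utc_end = datetime(2019, 4, 1, tzinfo=timezone.utc)

# Convert to Paris wall-clock time and drop the offset (like DATETIME(ts, tz)).
local_start = utc_start.astimezone(paris).replace(tzinfo=None)  # 2019-03-31 01:00
local_end = utc_end.astimezone(paris).replace(tzinfo=None)      # 2019-04-01 02:00

wall_clock_diff_ms = (local_end - local_start) // timedelta(milliseconds=1)
day_ms = 86400000 + (86400000 - wall_clock_diff_ms)

print(wall_clock_diff_ms)  # 90000000
print(day_ms)              # 82800000
```

One caveat with this approach: it measures the offset change across the UTC day, not the local day. For Europe/Paris the transition falls inside both, but for zones far from UTC the two can differ, so it is probably worth checking per zone.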
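Thanks to @Juta for the hint about doing the calculation in UTC. Since I group my data by localised date, I found I can calculate the milliseconds in each day by getting the start and end datetimes (in UTC) of the 'localised' date, using the following logic: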
-- get UTC start datetime for localised date
-- get UTC end datetime for localised date
-- this then gives the milliseconds for that localised date:
datetime_diff(utc_end_datetime, utc_start_datetime, MILLISECOND);
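The same logic can be sanity-checked outside BigQuery. Here is a minimal Python sketch, assuming Python 3.9+ with the standard zoneinfo module and time zone data available. The function name local_day_milliseconds is made up for illustration:

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo


def local_day_milliseconds(local_date: date, tz_name: str) -> int:
    """Milliseconds between local midnight of local_date and the next local midnight."""
    tz = ZoneInfo(tz_name)
    # UTC instant of local midnight at the start of the localised date
    utc_start = datetime.combine(local_date, time.min, tzinfo=tz).astimezone(timezone.utc)
    # UTC instant of local midnight at the start of the following date
    utc_end = datetime.combine(
        local_date + timedelta(days=1), time.min, tzinfo=tz
    ).astimezone(timezone.utc)
    return (utc_end - utc_start) // timedelta(milliseconds=1)


print(local_day_milliseconds(date(2019, 3, 31), "Europe/Paris"))   # 82800000 (DST starts)
print(local_day_milliseconds(date(2019, 10, 27), "Europe/Paris"))  # 90000000 (DST ends)
print(local_day_milliseconds(date(2019, 6, 1), "Europe/Paris"))    # 86400000 (ordinary day)
```

Converting both midnights to UTC before subtracting is what makes the skipped or repeated hour show up in the result.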
So my full query becomes:
with daily_sample_count as (
  with data as
  (
    select
      -- get the date in the local timezone, for sample_date_time_UTC
      DATE(sample_date_time_UTC, time_zone) as localised_date,
      time_zone,
      milliseconds_between_samples
    from `mytable`
    where sample_date_time_UTC between '2019-03-31 00:00:00.000000+01:00' and '2019-04-01 00:00:00.000000+02:00'
  )
  select
    localised_date,
    count(*) as daily_record_count,
    avg(milliseconds_between_samples) as daily_avg_millis_between_samples,
    -- UTC datetimes of local midnight at the start and end of the localised date
    datetime(timestamp(localised_date, time_zone)) as utc_start_datetime,
    datetime(timestamp(date_add(localised_date, interval 1 day), time_zone)) as utc_end_datetime
  from data
  -- time_zone is carried through from the inner query so it can be grouped on
  group by localised_date, time_zone
)
select
  localised_date,
  -- apply calculation for ratio_of_daily_sample_count_to_expected
  -- based on the actual vs expected number of samples for the day
  -- no. of milliseconds in the day changes, when transitioning in/out of daylight saving - so we calculate milliseconds in the day
  daily_record_count/(datetime_diff(utc_end_datetime, utc_start_datetime, MILLISECOND)/daily_avg_millis_between_samples) as ratio_of_daily_sample_count_to_expected
from
  daily_sample_count
order by localised_date
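To illustrate what the corrected denominator changes, here is a small Python sketch of the ratio arithmetic on its own. The function name sample_ratio and the count of 1380 records are made up for the example; they are not taken from real data:

```python
def sample_ratio(record_count: int, avg_interval_ms: float, day_ms: int) -> float:
    """Ratio of actual samples to the number expected in a day of day_ms milliseconds."""
    expected_samples = day_ms / avg_interval_ms
    return record_count / expected_samples


# Hypothetical day: 1380 samples at 60000 ms intervals on 2019-03-31 (a 23-hour day).
with_dst_aware_day = sample_ratio(1380, 60000, 82800000)
with_hard_coded_day = sample_ratio(1380, 60000, 86400000)

print(with_dst_aware_day)   # 1.0
print(with_hard_coded_day)  # below 1.0, although no samples are missing
```

With the hard-coded 86400000 a complete 23-hour day looks as if an hour of samples had been lost, which is the error the query above avoids.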