将日期系列减少到 BigQuery 中的最小表示

Reducing Series of Dates to minimal representation in BigQuery

如果我table喜欢:

start_date|end_date
1/1/2018|1/5/2018
1/4/2018|1/10/2018
1/9/2018|1/22/2018
2/1/2018|2/1/2018
1/31/2018|2/5/2018

我想获取这些行涵盖的所有日期范围。所以我想要一些返回的东西,比如:

1/1/2018|1/22/2018
1/31/2018|2/5/2018

BigQuery 中是否有可以处理此问题的函数?

没有这样的功能 - 但您可以尝试下面的方法(BigQuery Standard SQL)

#standardSQL
WITH `project.dataset.table` AS (
  SELECT '1/1/2018' start_date, '1/5/2018' end_date UNION ALL
  SELECT '1/4/2018', '1/10/2018' UNION ALL
  SELECT '1/9/2018', '1/22/2018' UNION ALL
  SELECT '2/1/2018', '2/1/2018' UNION ALL
  SELECT '1/31/2018', '2/5/2018' 
), parsed_as_dates AS (
  SELECT PARSE_DATE('%m/%d/%Y', start_date) start_date, PARSE_DATE('%m/%d/%Y', end_date) end_date
  FROM `project.dataset.table`
), days AS (
  SELECT day FROM 
  (SELECT MIN(start_date) min_date, MAX(end_date) max_date FROM parsed_as_dates), 
  UNNEST(GENERATE_DATE_ARRAY(min_date, max_date)) day
), temp AS (
  SELECT day, SIGN(COUNTIF(day BETWEEN start_date AND end_date)) flag
  FROM days CROSS JOIN parsed_as_dates GROUP BY day
)
SELECT MIN(day) start_date, MAX(day) end_date
FROM (
  SELECT day, flag, SUM(start) OVER(ORDER BY day) grp
  FROM (
    SELECT day, flag, ABS(flag - IFNULL(LAG(flag) OVER(ORDER BY day), 0)) start
    FROM temp
  )
)
WHERE flag = 1
GROUP BY grp
-- ORDER BY start_date

结果如下

Row start_date  end_date     
1   2018-01-01  2018-01-22   
2   2018-01-31  2018-02-05    

只是 "quick" 的想法 - 你可能想要稍微重构它 - 因为它对我来说看起来有点过度设计 :o) 但至少它的工作