Calculate the difference in trips between two consecutive months in a SQL query
I have a simple table that contains trips on different dates.
trip_id | start_date | end_date |
---|---|---|
160320 | 2017-12-31 20:40:25 UTC | 2017-12-31 20:45:25 UTC |
160321 | 2018-01-12 21:01:51 UTC | 2018-01-12 22:01:51 UTC |
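I just want to create a SQL query that shows these fields:
- year
- month
- trips_this_month
- trips_previous_month
- difference_from_previous_month (count_this_month - count_previous_month)
- is_increased (a boolean column: true if we see an increase, false otherwise)
Update:
I managed to write a simple query that returns them, but I still feel this query could be optimized. Any help would be appreciated.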
SELECT
  year,
  month,
  trips_this_month,
  trips_previous_month,
  difference_from_previous_month,
  -- true only when the count strictly increased
  CASE WHEN difference_from_previous_month > 0 THEN true ELSE false END AS is_increased
FROM
  (SELECT
    year,
    month,
    number_of_trips AS trips_this_month,
    -- previous row's count; 0 when there is no previous row
    LAG(number_of_trips, 1, 0) OVER (ORDER BY year, month) AS trips_previous_month,
    number_of_trips - LAG(number_of_trips, 1, 0) OVER (ORDER BY year, month) AS difference_from_previous_month
  FROM (
    SELECT EXTRACT(MONTH FROM start_date) AS month,
      EXTRACT(YEAR FROM start_date) AS year,
      COUNT(*) AS number_of_trips
    FROM a_table
    GROUP BY month, year
  )
  ORDER BY year, month
  LIMIT 100
  )
But I can't help feeling that more could be done. Thanks for any further help in improving it.
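This is a MySQL answer, posted before the OP changed the tags.
I don't use BigQuery, so I'm not sure how much my answer needs to be adjusted before it works there. What I do know is that I tested both the OP's original query and the accepted answer's query on a MySQL server and they worked, so I assume my suggested (MySQL) answer won't need many adjustments to work in BigQuery.
Try this: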
WITH RECURSIVE cte AS (
  -- Anchor on the first day of the earliest and latest months, so the
  -- last month is not skipped when MIN falls late in its month.
  SELECT CAST(DATE_FORMAT(MIN(start_date), '%Y-%m-01') AS DATE) AS minstdt,
         CAST(DATE_FORMAT(MAX(start_date), '%Y-%m-01') AS DATE) AS maxstdt
  FROM mytable
  UNION ALL
  SELECT minstdt + INTERVAL 1 MONTH, maxstdt
  FROM cte
  WHERE minstdt + INTERVAL 1 MONTH <= maxstdt
)
SELECT year,
       month,
       number_of_trips,
       number_of_trips - IFNULL(prev_month_number_of_trips, 0) AS This_month_vs_prev_month,
       -- treat a missing previous month as 0, matching the difference above
       IF(number_of_trips > IFNULL(prev_month_number_of_trips, 0), 1, 0) AS Is_increased
FROM
  (SELECT
     YEAR(cte.minstdt) AS year,
     MONTH(cte.minstdt) AS month,
     -- months without a matching trip have start_date NULL and count as 0
     SUM(CASE WHEN start_date IS NULL THEN 0 ELSE 1 END) AS number_of_trips,
     LAG(SUM(CASE WHEN start_date IS NULL THEN 0 ELSE 1 END))
       OVER (ORDER BY YEAR(cte.minstdt), MONTH(cte.minstdt)) AS prev_month_number_of_trips
   FROM cte
   LEFT JOIN mytable
     ON YEAR(cte.minstdt) = YEAR(start_date)
    AND MONTH(cte.minstdt) = MONTH(start_date)
   GROUP BY year, month) V
ORDER BY year, month;
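- I use a recursive common table expression (cte) to generate dates based on the minimum and maximum dates that appear in the table's start_date column.
- I replaced EXTRACT() with the YEAR() and MONTH() functions, which are slightly shorter.
- I LEFT JOIN the cte to the data table.
See if you can use it.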
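To make the month-series idea concrete, here is a minimal sketch that runs the same technique on SQLite through Python's sqlite3 module. The SQLite dialect (date(..., 'start of month') and '+1 month'), the CTE names months and counts, the helper monthly_counts, and the two extra March rows are all my assumptions for illustration; they are not part of the MySQL query above.

```python
import sqlite3

# Hypothetical sample data: the two rows from the question plus two March
# trips, so that February 2018 is a month with no trips at all.
ROWS = [
    (160320, "2017-12-31 20:40:25"),
    (160321, "2018-01-12 21:01:51"),
    (160322, "2018-03-03 08:00:00"),
    (160323, "2018-03-20 09:30:00"),
]

QUERY = """
WITH RECURSIVE months(month_start, last_month) AS (
    -- Anchor: first day of the earliest and of the latest month in the data.
    SELECT date(MIN(start_date), 'start of month'),
           date(MAX(start_date), 'start of month')
    FROM mytable
    UNION ALL
    -- Step: advance one calendar month until the last month is reached.
    SELECT date(month_start, '+1 month'), last_month
    FROM months
    WHERE month_start < last_month
),
counts AS (
    -- COUNT(t.trip_id) ignores the NULLs produced by the LEFT JOIN,
    -- so a month without trips is reported as 0.
    SELECT m.month_start, COUNT(t.trip_id) AS trips_this_month
    FROM months AS m
    LEFT JOIN mytable AS t
           ON date(t.start_date, 'start of month') = m.month_start
    GROUP BY m.month_start
)
SELECT month_start,
       trips_this_month,
       LAG(trips_this_month, 1, 0) OVER (ORDER BY month_start) AS trips_previous_month
FROM counts
ORDER BY month_start
"""


def monthly_counts(rows):
    """Load rows into an in-memory table and return one tuple per calendar month."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (trip_id INTEGER, start_date TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in monthly_counts(ROWS):
        print(row)
```

With this sample data the series contains a row for February 2018 with 0 trips, so the LAG() value for March is February's count rather than January's.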
Consider comparing the current and previous month's aggregates using a normalized first day of each month:
WITH sub AS (
  SELECT
    DATE_SUB(
      DATE_ADD(LAST_DAY(start_date), INTERVAL 1 DAY),
      INTERVAL 1 MONTH
    ) AS month_year,
    COUNT(*) AS number_of_trips
  FROM a_table
  GROUP BY month_year
), calc AS (
  SELECT
    EXTRACT(YEAR FROM curr.month_year) AS year,
    EXTRACT(MONTH FROM curr.month_year) AS month,
    COALESCE(curr.number_of_trips, 0) AS trips_this_month,
    COALESCE(prev.number_of_trips, 0) AS trips_previous_month
  FROM sub AS curr
  LEFT JOIN sub AS prev
    ON prev.month_year = DATE_SUB(curr.month_year, INTERVAL 1 MONTH)
)
SELECT
  year,
  month,
  trips_this_month,
  trips_previous_month,
  trips_this_month - trips_previous_month AS difference_from_previous_month,
  (trips_this_month - trips_previous_month) > 0 AS is_increased
FROM calc
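As a rough check of the first-of-month self-join, the sketch below runs the same idea on SQLite through Python's sqlite3 module. The SQLite date functions, the column name month_start, the helper month_over_month, and the two extra March rows are assumptions of mine for illustration, not part of the query above.

```python
import sqlite3

# Hypothetical sample data: the two rows from the question plus two March
# trips, leaving February 2018 without any trips.
ROWS = [
    (160320, "2017-12-31 20:40:25"),
    (160321, "2018-01-12 21:01:51"),
    (160322, "2018-03-03 08:00:00"),
    (160323, "2018-03-20 09:30:00"),
]

QUERY = """
WITH sub AS (
    -- Normalize every trip to the first day of its month, then count.
    SELECT date(start_date, 'start of month') AS month_start,
           COUNT(*) AS number_of_trips
    FROM a_table
    GROUP BY date(start_date, 'start of month')
)
SELECT CAST(strftime('%Y', curr.month_start) AS INTEGER) AS year,
       CAST(strftime('%m', curr.month_start) AS INTEGER) AS month,
       curr.number_of_trips AS trips_this_month,
       COALESCE(prev.number_of_trips, 0) AS trips_previous_month,
       curr.number_of_trips - COALESCE(prev.number_of_trips, 0)
           AS difference_from_previous_month,
       -- SQLite returns booleans as the integers 1 and 0.
       curr.number_of_trips > COALESCE(prev.number_of_trips, 0) AS is_increased
FROM sub AS curr
LEFT JOIN sub AS prev
       -- Join on the calendar month before, not on the previous row.
       ON prev.month_start = date(curr.month_start, '-1 month')
ORDER BY curr.month_start
"""


def month_over_month(rows):
    """Load rows into an in-memory table and return the monthly comparison."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a_table (trip_id INTEGER, start_date TEXT)")
    conn.executemany("INSERT INTO a_table VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in month_over_month(ROWS):
        print(row)
```

Because the join key is the calendar month before, March is compared with the empty February and gets trips_previous_month = 0. Note that this approach only returns months that have at least one trip, so February itself does not appear in the result.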