SQL query to calculate the cumulative number of trips for each date
I have a Hive table named bikeshare_trips with the following schema:
+---------------------+------------+----------------------------------------------------+--+
| col_name | data_type | comment |
+---------------------+------------+----------------------------------------------------+--+
| trip_id | int | numeric id of bike trip |
| duration_sec | int | time of trip in seconds |
| start_date | string | start date of trip with date and time, in PST |
| start_station_name | string | station name of start station |
| start_station_id | int | numeric reference for start station |
| end_date | string | end date of trip with date and time, in PST |
| end_station_name | string | station name for end station |
| end_station_id | int | numeric reference for end station |
| bike_number | int | id of bike used |
| zip_code | string | Home zip code of subscriber (customers can choose to manually enter zip at kiosk however data is unreliable) |
| subscriber_type | string | Subscriber can be annual or 30-day member, Customer can be 24-hour or 3-day member |
+---------------------+------------+----------------------------------------------------+--+
and some sample data:
944732 2618 09/24/2015 17:22:00 Mezes 83 09/24/2015 18:06:00 Mezes 83 653 94063 Customer
984595 5957 09/24/2015 18:12:00 Mezes 83 10/25/2015 19:51:00 Mezes 83 52 nil Customer
984596 5913 09/24/2015 18:13:00 Mezes 83 10/25/2015 19:51:00 Mezes 83 121 nil Customer
1129385 6079 09/24/2015 10:33:00 Mezes 83 03/18/2016 12:14:00 Mezes 83 208 94070 Customer
1030383 5780 2015-09-30 10:52:00 Mezes 83 12/06/2015 12:28:00 Mezes 83 44 94064 Customer
1102641 801 02/23/2016 12:25:00 Mezes 83 02/23/2016 12:39:00 Mezes 83 174 93292 Customer
969490 255 2015-09-30 19:02:00 Mezes 83 10/13/2015 19:07:00 Mezes 83 650 94063 Subscriber
1129386 6032 03/18/2016 10:33:00 Mezes 83 03/18/2016 12:13:00 Mezes 83 155 94070 Customer
947105 1008 2015-09-30 12:57:00 Mezes 83 09/26/2015 13:13:00 Mezes 83 157 94063 Subscriber
1011650 60 11/16/2015 18:54:00 Mezes 83 11/16/2015 18:55:00 Mezes 83 35 94124 Subscriber
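Each row of the table corresponds to a different bike trip. I want to calculate the cumulative number of trips for each date in 2015.
The expected output is: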
trip_date num_trips cumulative_trips
2015-09-24 4 4
2015-09-30 3 7
2015-11-16 1 8
I have been trying to use analytic functions and subqueries, but I can't work it out. Any help would be appreciated. Thanks in advance.
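A correlated subquery may be one option here (this assumes yourTable already holds one row per trip_date together with its num_trips):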
SELECT
trip_date,
num_trips,
(SELECT SUM(t2.num_trips) FROM yourTable t2
WHERE t2.trip_date <= t1.trip_date) AS cumulative_trips
FROM yourTable t1
ORDER BY
trip_date;
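To see what the correlated subquery returns, here is a minimal sketch that runs the same query from Python against an in-memory SQLite database. SQLite is only a stand-in for Hive here (an assumption made so the example is runnable), the table is filled with the three per-day rows from the expected output, and the function name cumulative_by_subquery is hypothetical.

```python
import sqlite3


def cumulative_by_subquery(daily_rows):
    """Return (trip_date, num_trips, cumulative_trips) rows for per-day counts."""
    conn = sqlite3.connect(":memory:")
    # One row per date, matching what the query expects from yourTable.
    conn.execute("CREATE TABLE yourTable (trip_date TEXT, num_trips INTEGER)")
    conn.executemany("INSERT INTO yourTable VALUES (?, ?)", daily_rows)
    rows = conn.execute(
        """
        SELECT trip_date,
               num_trips,
               (SELECT SUM(t2.num_trips) FROM yourTable t2
                WHERE t2.trip_date <= t1.trip_date) AS cumulative_trips
        FROM yourTable t1
        ORDER BY trip_date
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    daily = [("2015-09-24", 4), ("2015-09-30", 3), ("2015-11-16", 1)]
    for row in cumulative_by_subquery(daily):
        print(row)
```

The inner SELECT is evaluated once per outer row, so this form can get slow on large tables; it is shown only to illustrate the logic.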
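You can use aggregation and window functions:
-- FROM_UNIXTIME turns the parsed epoch seconds back into a timestamp string that to_date accepts
select to_date(FROM_UNIXTIME(UNIX_TIMESTAMP(start_date,"MM/dd/yyyy HH:mm"))) as dte, count(*) as num_trips,
-- order the running total by the parsed time rather than by the raw string
sum(count(*)) over (order by min(UNIX_TIMESTAMP(start_date,"MM/dd/yyyy HH:mm"))) as cumulative_trips
from bikeshare_trips
where YEAR(FROM_UNIXTIME(UNIX_TIMESTAMP(start_date,"MM/dd/yyyy HH:mm"))) = 2015
group by to_date(FROM_UNIXTIME(UNIX_TIMESTAMP(start_date,"MM/dd/yyyy HH:mm")))
order by dte;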
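In Hive you may need a subquery:
select dte, cnt, sum(cnt) over (order by dte) as cumulative_trips
from (select to_date(FROM_UNIXTIME(UNIX_TIMESTAMP(start_date,"MM/dd/yyyy HH:mm"))) as dte, count(*) as cnt
from bikeshare_trips
where YEAR(FROM_UNIXTIME(UNIX_TIMESTAMP(start_date,"MM/dd/yyyy HH:mm"))) = 2015
-- group by the same expression that is selected as dte
group by to_date(FROM_UNIXTIME(UNIX_TIMESTAMP(start_date,"MM/dd/yyyy HH:mm")))
) b
order by dte;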
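The aggregate-then-window approach can be checked against the sample rows from the question. The sketch below again uses SQLite from Python as a stand-in for Hive (an assumption, not the poster's setup). SQLite has no unix_timestamp, so the start_date strings are normalized in Python first; the sample data mixes MM/dd/yyyy and yyyy-MM-dd values, so both formats are tried. The names normalize_date, cumulative_trips and SAMPLE_START_DATES are hypothetical, and the window function needs SQLite 3.25 or later.

```python
import sqlite3
from datetime import datetime

# start_date values copied from the sample rows in the question.
SAMPLE_START_DATES = [
    "09/24/2015 17:22:00", "09/24/2015 18:12:00", "09/24/2015 18:13:00",
    "09/24/2015 10:33:00", "2015-09-30 10:52:00", "02/23/2016 12:25:00",
    "2015-09-30 19:02:00", "03/18/2016 10:33:00", "2015-09-30 12:57:00",
    "11/16/2015 18:54:00",
]

FORMATS = ("%m/%d/%Y %H:%M:%S", "%Y-%m-%d %H:%M:%S")


def normalize_date(raw):
    """Return the date part as YYYY-MM-DD, or None if no known format matches."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(raw, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None


def cumulative_trips(start_dates, year):
    """Count trips per day for one year, then add a running total."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trips (trip_date TEXT)")
    conn.executemany(
        "INSERT INTO trips VALUES (?)",
        [(normalize_date(s),) for s in start_dates],
    )
    rows = conn.execute(
        """
        SELECT dte, cnt, SUM(cnt) OVER (ORDER BY dte) AS cumulative_trips
        FROM (SELECT trip_date AS dte, COUNT(*) AS cnt
              FROM trips
              WHERE substr(trip_date, 1, 4) = ?
              GROUP BY trip_date) b
        ORDER BY dte
        """,
        (str(year),),
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in cumulative_trips(SAMPLE_START_DATES, 2015):
        print(row)
```

For 2015 this yields the three rows of the expected output (4, 7 and 8 as running totals). In Hive itself, rows whose start_date does not match the pattern passed to UNIX_TIMESTAMP would not parse to a 2015 date and so would be excluded by the WHERE clause, which is why the mixed formats matter.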