GTFS Database - SQL Queries for "Revenue Mileage" and "Revenue Hours"

I am trying to find the revenue miles/kilometers for a route by day, month, and year, by querying a GTFS database that has the structure described here:

https://developers.google.com/transit/gtfs/reference

A very clear sketch of the structure is also available here:

http://blog.openplans.org/2012/08/the-openplans-guide-to-gtfs-data/

"Revenue distance traveled" definition:

("Available for passengers to use" distance)

The number of miles/kilometers traveled from the first actual bus stop where a passenger can board, to the last drop-off at the last bus stop, for that particular route and bus run. (then aggregated together for all service runs taken by all buses for that particular route)

-

"Revenue hours" definition:

("Available for passengers to use" time span)

The number of hours from the moment the vehicle arrives at the first bus stop, until the moment it drops off its last passenger at the last bus stop. (then aggregated together for all service runs taken by all buses for that particular route)

I am using SQL Server (MSSQL), although SQLite, MySQL, or any other SQL example would be fine.

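Basically, I need to be able to SELECT a route and then correlate the data in the routes, calendar_dates, calendar, stop_times, stops, and trips tables to find how many miles/kilometers were covered from the first stop to the last (stop_times and stops tables) and how many hours elapsed. I need this for a particular service_id (in the trips and calendar tables), and also for all service_ids used by a particular route, and I need to be able to restrict it to a specific date (in the calendar_dates table) or to a span of dates (a day, a month, three months, a year, etc.).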

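If this takes several different queries, that is fine. The revenue distance traveled per route and the revenue hours per route can be queried separately.

Has anyone done this before and is willing to share their query structure, or can anyone work this out? Are there any examples of how to write this query? I have been looking all over the web for several weeks.

Here is a diagram image of the database I created, showing all of the relationships in detail: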

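I have done this for scheduled kilometers by:

  1. Getting the GTFS SQL importer and PostGIS
  2. Loading the GTFS feed into a database
  3. Making the shapes table spatial
  4. Calculating the distance for each shape
  5. Summarizing as below (see the note about service IDs).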

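-- Scheduled kilometers per route. shape_dist is assumed to be in meters,
-- hence the division by 1000.
-- Note on service IDs: the filter keeps only services whose ID contains 'sat';
-- change the pattern to match the service IDs in your own feed.
select t.route_id as id, r.route_short_name as route, sum(l.shape_dist/1000) as sched_kms
from gtfs_shape_lengths l
inner join gtfs_trips t on t.shape_id = l.shape_id
inner join gtfs_routes r on r.route_id = t.route_id
inner join gtfs_calendar c on t.service_id = c.service_id
where c.service_id ilike '%sat%'
group by t.route_id, r.route_short_name

union all

-- Grand total across all routes, using the same service filter.
select 'total' as id, 'total_' as route, sum(l.shape_dist/1000) as sched_kms
from gtfs_shape_lengths l
inner join gtfs_trips t on t.shape_id = l.shape_id
inner join gtfs_calendar c on t.service_id = c.service_id
where c.service_id ilike '%sat%'

order by sched_kms desc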

The original write-up is here: http://transitdata.net/using-gtfs-and-postgis-to-calculate-levels-of-scheduled-service/
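For readers without PostGIS, step 4 (the distance of each shape) can be approximated outside the database. The following is only a minimal sketch, not the method from the linked write-up: it swaps the spatial length calculation for the haversine formula in plain Python, and the names `haversine_km`, `shape_lengths_km`, and `sample_rows` are hypothetical. The input columns are the standard ones from GTFS shapes.txt.

```python
import math
from collections import defaultdict

# Mean Earth radius in km; the spherical model is an assumption of this sketch.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def shape_lengths_km(shape_rows):
    """Sum the segment lengths of each shape.

    shape_rows: iterable of
        (shape_id, shape_pt_sequence, shape_pt_lat, shape_pt_lon)
    Returns a dict mapping shape_id to its length in kilometers.
    """
    points = defaultdict(list)
    for shape_id, seq, lat, lon in shape_rows:
        points[shape_id].append((seq, lat, lon))

    lengths = {}
    for shape_id, pts in points.items():
        pts.sort()  # order the points by shape_pt_sequence
        lengths[shape_id] = sum(
            haversine_km(a[1], a[2], b[1], b[2])
            for a, b in zip(pts, pts[1:])
        )
    return lengths


# Hypothetical sample: three points along the equator, given out of order.
sample_rows = [
    ("shp_1", 2, 0.0, 1.0),
    ("shp_1", 1, 0.0, 0.0),
    ("shp_1", 3, 0.0, 2.0),
]
lengths = shape_lengths_km(sample_rows)
```

The resulting lengths could be loaded into a table such as gtfs_shape_lengths and joined to trips on shape_id as in the query above. Note that a shape's length equals a trip's revenue distance only if the trip serves the whole shape.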

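OK, I came up with the following to get service hours. In my example, the arrival_time and departure_time columns in the stop_times table are of an integer data type, and the numbers stored in them represent "minutes since midnight" (e.g. "29 hours and 45 minutes after midnight" would be "1785 minutes"; midnight is measured as noon on the service day minus 12 hours, as the spec requires). This is also, in my view, the best way to store the times. Also note: I added a trip_date column to the trips table, because I use this GTFS database for operational/internal federal reporting purposes and not just to serve a feed to the public, so it is necessary to know the individual trip dates (I did not want to enter every single day in calendar_dates for this purpose, as some agencies do). This example is for MSSQL/SQL Server: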

-- FIRST/LAST TRIPS OF THE DAY AND SPAN OF SERVICE

SELECT

    joinedTables.service_id                  AS 'Service Number',
    joinedTables.trip_date                   AS 'Date',
    joinedTables.route_id                    AS 'Route',

    MIN ( joinedTables.starting_departure )  AS 'First Departure in Minutes',
    MAX ( joinedTables.ending_arrival )      AS 'Last Arrival in Minutes',

    -- Convert the span from integer minutes to decimal hours.
    CAST (
                 (
                       (
                             MAX (ending_arrival) - MIN (starting_departure)
                       ) / 60.00
                 ) AS DECIMAL (9, 2)
         )                                 AS 'Service Hours'


FROM
    (
        SELECT
            SelectedTripsColumns.service_id,
            SelectedTripsColumns.trip_id,
            SelectedTripsColumns.route_id,
            SelectedTripsColumns.trip_date,
            MIN (departure_time) AS starting_departure,
            MAX (arrival_time) AS ending_arrival

        FROM
            stop_times AS stopTimesTable

        JOIN (
                 SELECT
                     service_id,
                     trip_id,
                     route_id,
                     trip_date
                 FROM
                     trips
             ) AS SelectedTripsColumns

        ON stopTimesTable.trip_id = SelectedTripsColumns.trip_id


        JOIN routes

        ON SelectedTripsColumns.route_id = routes.route_id


        GROUP BY
            SelectedTripsColumns.service_id,
            SelectedTripsColumns.trip_id,
            SelectedTripsColumns.route_id,
            SelectedTripsColumns.trip_date

    ) AS joinedTables

-- WHERE trip_date = '2015-07-27'

GROUP BY
    service_id,
    route_id,
    trip_date

ORDER BY
    service_id,
    route_id,
    trip_date;
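The query above reports the span of service (first departure to last arrival) for each service, route, and date. If the per-trip definition of revenue hours from the question is wanted instead, each trip's duration can be computed first and then summed per route. The following is only a rough sketch under stated assumptions: it uses SQLite through Python's `sqlite3` module rather than SQL Server, the table layout is reduced to the columns needed, the sample rows are invented, and `to_minutes` is a hypothetical helper for turning GTFS "HH:MM:SS" strings into the integer minutes described above. Like the query above, it measures each trip from its first departure_time to its last arrival_time.

```python
import sqlite3


def to_minutes(gtfs_time):
    """Convert a GTFS 'HH:MM:SS' string to whole minutes since midnight.

    Hours may exceed 23 for trips that run past midnight; seconds are dropped.
    """
    hours, minutes, _seconds = (int(part) for part in gtfs_time.split(":"))
    return hours * 60 + minutes


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trips (trip_id TEXT, route_id TEXT,
                        service_id TEXT, trip_date TEXT);
    CREATE TABLE stop_times (trip_id TEXT, stop_sequence INTEGER,
                             arrival_time INTEGER, departure_time INTEGER);
""")

# Invented sample data: two trips on route R1 and one trip on route R2.
conn.executemany("INSERT INTO trips VALUES (?, ?, ?, ?)", [
    ("T1", "R1", "WKDY", "2015-07-27"),
    ("T2", "R1", "WKDY", "2015-07-27"),
    ("T3", "R2", "WKDY", "2015-07-27"),
])
sample_stop_times = [
    ("T1", 1, "06:00:00"), ("T1", 2, "06:20:00"), ("T1", 3, "06:45:00"),
    ("T2", 1, "23:30:00"), ("T2", 2, "24:15:00"),  # runs past midnight
    ("T3", 1, "07:00:00"), ("T3", 2, "08:00:00"),
]
conn.executemany(
    "INSERT INTO stop_times VALUES (?, ?, ?, ?)",
    [(trip, seq, to_minutes(t), to_minutes(t))
     for trip, seq, t in sample_stop_times],
)

REVENUE_HOURS_SQL = """
    SELECT route_id,
           trip_date,
           ROUND(SUM(trip_minutes) / 60.0, 2) AS revenue_hours
    FROM (
        -- One row per trip: last arrival minus first departure, in minutes.
        SELECT t.route_id, t.trip_date, t.trip_id,
               MAX(st.arrival_time) - MIN(st.departure_time) AS trip_minutes
        FROM stop_times AS st
        JOIN trips AS t ON t.trip_id = st.trip_id
        GROUP BY t.route_id, t.trip_date, t.trip_id
    ) AS per_trip
    WHERE trip_date BETWEEN ? AND ?
    GROUP BY route_id, trip_date
    ORDER BY route_id, trip_date
"""

rows = conn.execute(REVENUE_HOURS_SQL, ("2015-07-01", "2015-07-31")).fetchall()
for route_id, trip_date, revenue_hours in rows:
    print(route_id, trip_date, revenue_hours)
```

Dropping trip_date from the outer SELECT and GROUP BY would give one total per route for the whole date span, which is the monthly or yearly figure asked about in the question.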