How to convert GTFS-RT Trip Updates data to a dataframe?
I have downloaded some GTFS-RT Trip Updates data in dictionary format using the following code:
from google.transit import gtfs_realtime_pb2
from protobuf_to_dict import protobuf_to_dict
import pandas as pd
import requests

feed = gtfs_realtime_pb2.FeedMessage()
# Fetch the raw protobuf feed; 'link' stands in for the Trip Updates feed URL
response = requests.get('link')
feed.ParseFromString(response.content)
# Convert the parsed protobuf message into nested Python dicts
buses_dict = protobuf_to_dict(feed)
The output dictionary is a dictionary with many nested dictionaries. The trip update for one bus looks like this:
id: "14010512942203036"
trip_update {
  trip {
    trip_id: "14010000550082549"
    start_date: "20210120"
    schedule_relationship: SCHEDULED
  }
  stop_time_update {
    stop_sequence: 24
    arrival {
      delay: -20
      time: 1611145420
      uncertainty: 0
    }
    departure {
      delay: 52
      time: 1611145492
      uncertainty: 0
    }
    stop_id: "9022001005006001"
  }
  stop_time_update {
    stop_sequence: 25
    arrival {
      delay: 52
      time: 1611146092
    }
    departure {
      delay: 52
      time: 1611146092
    }
    stop_id: "9022001005007002"
  }
  vehicle {
    id: "9031001004002234"
  }
  timestamp: 1611145514
}
Do you have any idea how to convert this data into a more useful format, say a pandas dataframe?
Thanks in advance!
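I tested with this URL:
url = 'https://cdn.mbta.com/realtime/VehiclePositions.pb'
All you need to do is add this line to the end of your script to get a pandas dataframe:
pd.json_normalize(buses_dict['entity'])
It will split the dictionary into these columns: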
Index(['id', 'vehicle.trip.trip_id', 'vehicle.trip.start_time',
'vehicle.trip.start_date', 'vehicle.trip.schedule_relationship',
'vehicle.trip.route_id', 'vehicle.trip.direction_id',
'vehicle.position.latitude', 'vehicle.position.longitude',
'vehicle.position.bearing', 'vehicle.current_stop_sequence',
'vehicle.current_status', 'vehicle.timestamp', 'vehicle.stop_id',
'vehicle.vehicle.id', 'vehicle.vehicle.label',
'vehicle.occupancy_status', 'vehicle.position.speed'],
dtype='object')
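The columns above come from a Vehicle Positions feed. For a Trip Updates feed like the one in the question, each entity holds a list of stop_time_update records, and a plain pd.json_normalize call would probably leave that list packed into a single column. The following is a minimal sketch of one way to get one row per stop instead, using the record_path and meta arguments of pd.json_normalize. The sample dictionary is hand-written to mirror the entity shown in the question (it is not real feed output), the name stop_times is made up for illustration, and the integer 0 for schedule_relationship assumes protobuf_to_dict returns enum values as numbers.

```python
import pandas as pd

# Hand-written sample shaped like the entity in the question (an assumption,
# not real feed output). In practice this would come from protobuf_to_dict(feed).
buses_dict = {
    "entity": [
        {
            "id": "14010512942203036",
            "trip_update": {
                "trip": {
                    "trip_id": "14010000550082549",
                    "start_date": "20210120",
                    "schedule_relationship": 0,  # assumed numeric enum value
                },
                "stop_time_update": [
                    {
                        "stop_sequence": 24,
                        "arrival": {"delay": -20, "time": 1611145420, "uncertainty": 0},
                        "departure": {"delay": 52, "time": 1611145492, "uncertainty": 0},
                        "stop_id": "9022001005006001",
                    },
                    {
                        "stop_sequence": 25,
                        "arrival": {"delay": 52, "time": 1611146092},
                        "departure": {"delay": 52, "time": 1611146092},
                        "stop_id": "9022001005007002",
                    },
                ],
                "vehicle": {"id": "9031001004002234"},
                "timestamp": 1611145514,
            },
        }
    ]
}

# One row per stop_time_update; trip-level fields are repeated on each row.
stop_times = pd.json_normalize(
    buses_dict["entity"],
    record_path=["trip_update", "stop_time_update"],
    meta=[
        "id",
        ["trip_update", "trip", "trip_id"],
        ["trip_update", "trip", "start_date"],
        ["trip_update", "vehicle", "id"],
        ["trip_update", "timestamp"],
    ],
    errors="ignore",  # fill missing meta fields with NaN instead of raising KeyError
)

# POSIX seconds to timezone-aware timestamps
stop_times["arrival.time"] = pd.to_datetime(stop_times["arrival.time"], unit="s", utc=True)

print(stop_times[["trip_update.trip.trip_id", "stop_sequence", "stop_id", "arrival.delay"]])
```

Nested fields inside each stop record are flattened with dotted names such as arrival.delay and departure.time, and optional fields missing from some stops (arrival.uncertainty in the second stop here) come out as NaN. A real feed that also contains entities without a trip_update would likely need those filtered out before this call.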