Using R to convert GTFS spatial data from character to numeric
I am following the gtfstools vignette (https://cran.r-project.org/web/packages/gtfstools/vignettes/gtfstools.html), but I am stuck on the data format. Basically, I point to a GTFS dataset, which is a zip archive containing .txt files.
library(gtfstools)  # provides read_gtfs() and get_trip_geometry()
ART2019Path <- file.path(GTFS_path, "2019-10 Arlington.zip")
ART2019GTFS <- read_gtfs(ART2019Path)
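Here is the data: https://realtime.commuterpage.com/rtt/public/utility/gtfs.aspx
The data loads fine, but every column is automatically read in as character. For my analysis I need most of the data to be numeric, for example to display the transit geometry: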
trip_geom <- get_trip_geometry(ART2019GTFS, file = "shapes")
plot(trip_geom$geometry)
I tried mutating all of the data, assuming that columns without numbers would stay as character, but it did not work:
ART2019GTFS <- mutate_all(ART2019GTFS, funs(as.numeric))
I am fairly new to R, so I am not sure how to solve this.
Any help with this would be appreciated.
When I follow the link, I get a zip file named google_transit.zip that contains several comma-separated text files. When I run this:
ART2019GTFS <- read_gtfs("~/google_transit.zip")
I get this (one data frame per text file):
> str(ART2019GTFS)
List of 8
$ agency :Classes ‘data.table’ and 'data.frame': 1 obs. of 6 variables:
..$ agency_id : chr "1"
..$ agency_name : chr "Arlington Transit"
..$ agency_url : chr "http://www.arlingtontransit.com"
..$ agency_phone : chr "703-228-7433"
..$ agency_timezone: chr "America/New_York"
..$ agency_lang : chr "en"
..- attr(*, ".internal.selfref")=<externalptr>
$ calendar :Classes ‘data.table’ and 'data.frame': 5 obs. of 10 variables:
..$ service_id: chr [1:5] "1" "2" "3" "4" ...
..$ monday : int [1:5] 1 0 1 0 0
..$ tuesday : int [1:5] 1 0 1 0 0
..$ wednesday : int [1:5] 1 0 1 0 0
..$ thursday : int [1:5] 1 0 1 0 0
..$ friday : int [1:5] 0 1 1 0 0
..$ saturday : int [1:5] 0 0 0 1 0
..$ sunday : int [1:5] 0 0 0 0 1
..$ start_date: Date[1:5], format: "2022-03-27" "2022-03-27" "2022-03-27" ...
..$ end_date : Date[1:5], format: "2023-12-31" "2023-12-31" "2023-12-31" ...
..- attr(*, ".internal.selfref")=<externalptr>
$ calendar_dates:Classes ‘data.table’ and 'data.frame': 3 obs. of 3 variables:
..$ service_id : chr [1:3] "1" "3" "5"
..$ date : Date[1:3], format: "2022-05-30" "2022-05-30" "2022-05-30"
..$ exception_type: int [1:3] 2 2 1
..- attr(*, ".internal.selfref")=<externalptr>
$ routes :Classes ‘data.table’ and 'data.frame': 21 obs. of 8 variables:
..$ route_id : chr [1:21] "41" "42" "43" "45" ...
..$ agency_id : chr [1:21] "1" "1" "1" "1" ...
..$ route_short_name: chr [1:21] "41" "42" "43" "45" ...
..$ route_long_name : chr [1:21] "Columbia Pike-Ballston-Court House" "Ballston-Pentagon" "Crystal City-Courthouse" "Columbia Pike-DHS/Sequoia-Rosslyn" ...
..$ route_type : int [1:21] 3 3 3 3 3 3 3 3 3 3 ...
..$ route_color : chr [1:21] "DCC154" "D7171F" "BC1B8D" "0084CA" ...
..$ route_text_color: chr [1:21] "FFFFFF" "FFFFFF" "FFFFFF" "FFFFFF" ...
..$ route_url : chr [1:21] "https://www.arlingtontransit.com/routes-schedules/art-41/" "https://www.arlingtontransit.com/routes-schedules/art-42/" "https://www.arlingtontransit.com/routes-schedules/art-43/" "https://www.arlingtontransit.com/routes-schedules/art-45/" ...
..- attr(*, ".internal.selfref")=<externalptr>
$ shapes :Classes ‘data.table’ and 'data.frame': 10721 obs. of 4 variables:
..$ shape_id : chr [1:10721] "9" "9" "9" "9" ...
..$ shape_pt_lon : num [1:10721] -77.1 -77.1 -77.1 -77.1 -77.1 ...
..$ shape_pt_lat : num [1:10721] 38.9 38.9 38.9 38.9 38.9 ...
..$ shape_pt_sequence: int [1:10721] 1 2 3 4 5 6 7 8 9 10 ...
..- attr(*, ".internal.selfref")=<externalptr>
$ stop_times :Classes ‘data.table’ and 'data.frame': 57711 obs. of 7 variables:
..$ trip_id : chr [1:57711] "1" "1" "1" "1" ...
..$ arrival_time : chr [1:57711] "10:25:00" "10:27:25" "10:28:53" "10:30:00" ...
..$ departure_time: chr [1:57711] "10:25:00" "10:27:25" "10:28:53" "10:30:00" ...
..$ stop_id : chr [1:57711] "138" "141" "867" "144" ...
..$ stop_sequence : int [1:57711] 1 2 3 4 5 6 7 8 9 10 ...
..$ stop_headsign : chr [1:57711] "" "" "" "" ...
..$ timepoint : int [1:57711] 1 0 0 1 0 0 0 0 0 0 ...
..- attr(*, ".internal.selfref")=<externalptr>
$ stops :Classes ‘data.table’ and 'data.frame': 640 obs. of 6 variables:
..$ stop_id : chr [1:640] "83" "85" "87" "89" ...
..$ stop_code: chr [1:640] "51001" "51003" "51005" "51007" ...
..$ stop_name: chr [1:640] "Ballston Metro G, Fairfax Dr, EB @ N Stafford, NS" "Fairfax Drive, WB @ N Utah Street, FS" "16th Street N, WB @ N Glebe Road, FS" "16th Street N, WB @ N Buchanan Street, NS" ...
..$ stop_lat : num [1:640] 38.9 38.9 38.9 38.9 38.9 ...
..$ stop_lon : num [1:640] -77.1 -77.1 -77.1 -77.1 -77.1 ...
..$ stop_url : chr [1:640] "https://www.arlingtontransit.com/riding-art/rider-tools/art-realtime/?Stop=A51001#realTimeResultsContainer" "https://www.arlingtontransit.com/riding-art/rider-tools/art-realtime/?Stop=A51003#realTimeResultsContainer" "https://www.arlingtontransit.com/riding-art/rider-tools/art-realtime/?Stop=A51005#realTimeResultsContainer" "https://www.arlingtontransit.com/riding-art/rider-tools/art-realtime/?Stop=A51007#realTimeResultsContainer" ...
..- attr(*, ".internal.selfref")=<externalptr>
$ trips :Classes ‘data.table’ and 'data.frame': 2296 obs. of 7 variables:
..$ route_id : chr [1:2296] "52" "52" "52" "52" ...
..$ service_id : chr [1:2296] "3" "3" "3" "3" ...
..$ trip_id : chr [1:2296] "1" "2" "3" "4" ...
..$ trip_headsign: chr [1:2296] "Ballston Metro" "Ballston Metro" "Ballston Metro" "Ballston Metro" ...
..$ direction_id : int [1:2296] 0 0 0 0 0 1 1 1 1 1 ...
..$ block_id : chr [1:2296] "5202" "5202" "5202" "5202" ...
..$ shape_id : chr [1:2296] "76" "76" "76" "76" ...
..- attr(*, ".internal.selfref")=<externalptr>
- attr(*, "class")= chr [1:3] "dt_gtfs" "gtfs" "list"
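Because the object is a list holding one table per text file, a compact way to check column types is to loop over the list instead of reading the full `str()` output. The following is a minimal sketch in base R; `toy_gtfs` is a hypothetical stand-in with made-up rows, not the actual feed, and the same `lapply()` call should work on the object returned by `read_gtfs()`.

```r
# Hypothetical stand-in for a GTFS object: a named list of tables.
toy_gtfs <- list(
  shapes = data.frame(
    shape_id = c("9", "9"),
    shape_pt_lat = c(38.9, 38.9),
    shape_pt_lon = c(-77.1, -77.1),
    shape_pt_sequence = 1:2,
    stringsAsFactors = FALSE
  ),
  stops = data.frame(
    stop_id = c("83", "85"),
    stop_lat = c(38.9, 38.9),
    stop_lon = c(-77.1, -77.1),
    stringsAsFactors = FALSE
  )
)

# For each table, collect the class of every column.
col_classes <- lapply(toy_gtfs, function(tbl) vapply(tbl, class, character(1)))
print(col_classes)
```

In this sketch the coordinate columns report `numeric` and the ID columns report `character`, which mirrors what the `str()` output shows for the real feed.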
And then this apparently succeeds:
> trip_geom <- get_trip_geometry(ART2019GTFS, file = "shapes")
> str(trip_geom)
Classes ‘sf’, ‘data.table’ and 'data.frame': 2296 obs. of 3 variables:
$ trip_id : chr "1" "2" "3" "4" ...
$ origin_file: chr "shapes" "shapes" "shapes" "shapes" ...
$ geometry :sfc_LINESTRING of length 2296; first list element: 'XY' num [1:131, 1:2] -77.2 -77.2 -77.2 -77.2 -77.2 ...
- attr(*, "sf_column")= chr "geometry"
- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA
..- attr(*, "names")= chr [1:2] "trip_id" "origin_file"
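The output above suggests the coordinates in this feed are already numeric, so no conversion appears to be needed here. If a different feed did come through with character coordinates, applying `mutate_all()` to the whole object would likely still fail, because the object is a list of tables and not a single data frame. One option is to convert only the columns that should be numeric, one table at a time. The sketch below uses base R on a hypothetical `toy_gtfs` list with made-up values; the choice of columns in `coord_cols` is an assumption for illustration.

```r
# Hypothetical feed where the coordinates were read in as character.
toy_gtfs <- list(
  shapes = data.frame(
    shape_id = c("9", "9"),
    shape_pt_lat = c("38.9", "38.8"),
    shape_pt_lon = c("-77.1", "-77.2"),
    stringsAsFactors = FALSE
  )
)

# Convert only the coordinate columns; IDs are identifiers and stay character.
coord_cols <- c("shape_pt_lat", "shape_pt_lon")
toy_gtfs$shapes[coord_cols] <- lapply(toy_gtfs$shapes[coord_cols], as.numeric)

str(toy_gtfs$shapes)
```

The tables returned by `read_gtfs()` are data.tables, where the equivalent idiom would be `shapes[, (coord_cols) := lapply(.SD, as.numeric), .SDcols = coord_cols]`. Columns such as `shape_id` and `trip_id` are best left as character even when they look like numbers, since they serve as keys between tables.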