Using R to convert GTFS spatial data from character to numeric

I'm following the gtfstools vignette (https://cran.r-project.org/web/packages/gtfstools/vignettes/gtfstools.html), but I'm stuck on the data format. Basically, I point to a GTFS dataset, which is a zip archive containing .txt files.

library(gtfstools)  # provides read_gtfs()
ART2019Path <- file.path(GTFS_path, "2019-10 Arlington.zip")
ART2019GTFS <- read_gtfs(ART2019Path)

Here is the data: https://realtime.commuterpage.com/rtt/public/utility/gtfs.aspx

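The data loads fine, but everything is read in as character automatically. For my analysis I need most of the data to be numeric. For example, to display the transit geometry: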

trip_geom <- get_trip_geometry(ART2019GTFS, file = "shapes")
plot(trip_geom$geometry)

I tried mutating all of the data, assuming that columns without numbers would stay as character, but it didn't work:

ART2019GTFS <- mutate_all(ART2019GTFS, funs(as.numeric))

I'm relatively new to R, so I don't know how to solve this.

Any help with this would be much appreciated.

When I follow the link, I get a zip file named google_transit.zip, which contains several comma-separated text files. When I run this:

ART2019GTFS <- read_gtfs("~/google_transit.zip") 

I get this (one data frame per text file):

> str(ART2019GTFS)
List of 8
 $ agency        :Classes ‘data.table’ and 'data.frame':    1 obs. of  6 variables:
  ..$ agency_id      : chr "1"
  ..$ agency_name    : chr "Arlington Transit"
  ..$ agency_url     : chr "http://www.arlingtontransit.com"
  ..$ agency_phone   : chr "703-228-7433"
  ..$ agency_timezone: chr "America/New_York"
  ..$ agency_lang    : chr "en"
  ..- attr(*, ".internal.selfref")=<externalptr> 
 $ calendar      :Classes ‘data.table’ and 'data.frame':    5 obs. of  10 variables:
  ..$ service_id: chr [1:5] "1" "2" "3" "4" ...
  ..$ monday    : int [1:5] 1 0 1 0 0
  ..$ tuesday   : int [1:5] 1 0 1 0 0
  ..$ wednesday : int [1:5] 1 0 1 0 0
  ..$ thursday  : int [1:5] 1 0 1 0 0
  ..$ friday    : int [1:5] 0 1 1 0 0
  ..$ saturday  : int [1:5] 0 0 0 1 0
  ..$ sunday    : int [1:5] 0 0 0 0 1
  ..$ start_date: Date[1:5], format: "2022-03-27" "2022-03-27" "2022-03-27" ...
  ..$ end_date  : Date[1:5], format: "2023-12-31" "2023-12-31" "2023-12-31" ...
  ..- attr(*, ".internal.selfref")=<externalptr> 
 $ calendar_dates:Classes ‘data.table’ and 'data.frame':    3 obs. of  3 variables:
  ..$ service_id    : chr [1:3] "1" "3" "5"
  ..$ date          : Date[1:3], format: "2022-05-30" "2022-05-30" "2022-05-30"
  ..$ exception_type: int [1:3] 2 2 1
  ..- attr(*, ".internal.selfref")=<externalptr> 
 $ routes        :Classes ‘data.table’ and 'data.frame':    21 obs. of  8 variables:
  ..$ route_id        : chr [1:21] "41" "42" "43" "45" ...
  ..$ agency_id       : chr [1:21] "1" "1" "1" "1" ...
  ..$ route_short_name: chr [1:21] "41" "42" "43" "45" ...
  ..$ route_long_name : chr [1:21] "Columbia Pike-Ballston-Court House" "Ballston-Pentagon" "Crystal City-Courthouse" "Columbia Pike-DHS/Sequoia-Rosslyn" ...
  ..$ route_type      : int [1:21] 3 3 3 3 3 3 3 3 3 3 ...
  ..$ route_color     : chr [1:21] "DCC154" "D7171F" "BC1B8D" "0084CA" ...
  ..$ route_text_color: chr [1:21] "FFFFFF" "FFFFFF" "FFFFFF" "FFFFFF" ...
  ..$ route_url       : chr [1:21] "https://www.arlingtontransit.com/routes-schedules/art-41/" "https://www.arlingtontransit.com/routes-schedules/art-42/" "https://www.arlingtontransit.com/routes-schedules/art-43/" "https://www.arlingtontransit.com/routes-schedules/art-45/" ...
  ..- attr(*, ".internal.selfref")=<externalptr> 
 $ shapes        :Classes ‘data.table’ and 'data.frame':    10721 obs. of  4 variables:
  ..$ shape_id         : chr [1:10721] "9" "9" "9" "9" ...
  ..$ shape_pt_lon     : num [1:10721] -77.1 -77.1 -77.1 -77.1 -77.1 ...
  ..$ shape_pt_lat     : num [1:10721] 38.9 38.9 38.9 38.9 38.9 ...
  ..$ shape_pt_sequence: int [1:10721] 1 2 3 4 5 6 7 8 9 10 ...
  ..- attr(*, ".internal.selfref")=<externalptr> 
 $ stop_times    :Classes ‘data.table’ and 'data.frame':    57711 obs. of  7 variables:
  ..$ trip_id       : chr [1:57711] "1" "1" "1" "1" ...
  ..$ arrival_time  : chr [1:57711] "10:25:00" "10:27:25" "10:28:53" "10:30:00" ...
  ..$ departure_time: chr [1:57711] "10:25:00" "10:27:25" "10:28:53" "10:30:00" ...
  ..$ stop_id       : chr [1:57711] "138" "141" "867" "144" ...
  ..$ stop_sequence : int [1:57711] 1 2 3 4 5 6 7 8 9 10 ...
  ..$ stop_headsign : chr [1:57711] "" "" "" "" ...
  ..$ timepoint     : int [1:57711] 1 0 0 1 0 0 0 0 0 0 ...
  ..- attr(*, ".internal.selfref")=<externalptr> 
 $ stops         :Classes ‘data.table’ and 'data.frame':    640 obs. of  6 variables:
  ..$ stop_id  : chr [1:640] "83" "85" "87" "89" ...
  ..$ stop_code: chr [1:640] "51001" "51003" "51005" "51007" ...
  ..$ stop_name: chr [1:640] "Ballston Metro G, Fairfax Dr, EB @ N Stafford, NS" "Fairfax Drive, WB @ N Utah Street, FS" "16th Street N, WB @ N Glebe Road, FS" "16th Street N, WB @ N Buchanan Street, NS" ...
  ..$ stop_lat : num [1:640] 38.9 38.9 38.9 38.9 38.9 ...
  ..$ stop_lon : num [1:640] -77.1 -77.1 -77.1 -77.1 -77.1 ...
  ..$ stop_url : chr [1:640] "https://www.arlingtontransit.com/riding-art/rider-tools/art-realtime/?Stop=A51001#realTimeResultsContainer" "https://www.arlingtontransit.com/riding-art/rider-tools/art-realtime/?Stop=A51003#realTimeResultsContainer" "https://www.arlingtontransit.com/riding-art/rider-tools/art-realtime/?Stop=A51005#realTimeResultsContainer" "https://www.arlingtontransit.com/riding-art/rider-tools/art-realtime/?Stop=A51007#realTimeResultsContainer" ...
  ..- attr(*, ".internal.selfref")=<externalptr> 
 $ trips         :Classes ‘data.table’ and 'data.frame':    2296 obs. of  7 variables:
  ..$ route_id     : chr [1:2296] "52" "52" "52" "52" ...
  ..$ service_id   : chr [1:2296] "3" "3" "3" "3" ...
  ..$ trip_id      : chr [1:2296] "1" "2" "3" "4" ...
  ..$ trip_headsign: chr [1:2296] "Ballston Metro" "Ballston Metro" "Ballston Metro" "Ballston Metro" ...
  ..$ direction_id : int [1:2296] 0 0 0 0 0 1 1 1 1 1 ...
  ..$ block_id     : chr [1:2296] "5202" "5202" "5202" "5202" ...
  ..$ shape_id     : chr [1:2296] "76" "76" "76" "76" ...
  ..- attr(*, ".internal.selfref")=<externalptr> 
 - attr(*, "class")= chr [1:3] "dt_gtfs" "gtfs" "list"
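
For a more compact check than the full str() output, the column classes can be listed table by table. The sketch below is only an illustration: toy_gtfs is a hypothetical stand-in (a plain named list of data frames) so that it runs without the real feed, and col_classes is a name made up here. The same lapply() call should work on an object returned by read_gtfs(), since that object is also a list of tables.

```r
# Hypothetical toy stand-in for a GTFS object: a named list of data frames.
# A real object would come from gtfstools::read_gtfs().
toy_gtfs <- list(
  shapes = data.frame(
    shape_id = c("9", "9"),
    shape_pt_lon = c(-77.1, -77.1),
    shape_pt_lat = c(38.9, 38.9),
    shape_pt_sequence = 1:2,
    stringsAsFactors = FALSE
  ),
  trips = data.frame(
    trip_id = c("1", "2"),
    direction_id = c(0L, 1L),
    stringsAsFactors = FALSE
  )
)

# For every table, collect the first class of each column.
col_classes <- lapply(toy_gtfs, function(tbl) {
  vapply(tbl, function(col) class(col)[1], character(1))
})

print(col_classes)
```

Coordinates and sequence numbers should show up as numeric or integer here, while identifiers such as shape_id and trip_id stay character.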

And then this evidently works:

> trip_geom <- get_trip_geometry(ART2019GTFS, file = "shapes")
> str(trip_geom)
Classes ‘sf’, ‘data.table’ and 'data.frame':    2296 obs. of  3 variables:
 $ trip_id    : chr  "1" "2" "3" "4" ...
 $ origin_file: chr  "shapes" "shapes" "shapes" "shapes" ...
 $ geometry   :sfc_LINESTRING of length 2296; first list element:  'XY' num [1:131, 1:2] -77.2 -77.2 -77.2 -77.2 -77.2 ...
 - attr(*, "sf_column")= chr "geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA
  ..- attr(*, "names")= chr [1:2] "trip_id" "origin_file"
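
As for the mutate_all() attempt in the question, two things probably get in the way: the GTFS object is a list of tables rather than a single data frame, and as.numeric() turns non-numeric text into NA instead of leaving it as character. If some columns really did arrive as character, one option is to convert only the columns you name, one table at a time. The following is a minimal sketch under those assumptions. toy_gtfs and numeric_cols are hypothetical names, and the toy table uses a plain data frame so the example runs without extra packages.

```r
# Hypothetical toy feed in which the coordinates were read as character.
toy_gtfs <- list(
  stops = data.frame(
    stop_id = c("83", "85"),
    stop_name = c("Stop A", "Stop B"),
    stop_lat = c("38.9", "38.9"),
    stop_lon = c("-77.1", "-77.1"),
    stringsAsFactors = FALSE
  )
)

# as.numeric() on non-numeric text gives NA (with a warning),
# so applying it to every column would wipe out names and URLs.
name_as_number <- suppressWarnings(as.numeric("Stop A"))

# Convert only the columns that are meant to be numeric.
numeric_cols <- c("stop_lat", "stop_lon")
for (col in numeric_cols) {
  toy_gtfs$stops[[col]] <- as.numeric(toy_gtfs$stops[[col]])
}

str(toy_gtfs$stops)
```

Identifier columns such as stop_id are deliberately left as character: they are labels rather than quantities, and the tables are joined on them.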