Problem reading a UTF-8 file with gtfstools
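I'm trying to open a GTFS file that is encoded in UTF-8, but even after changing my R project's encoding to UTF-8, the accented characters still display incorrectly. The problem can be seen in the "stop_name" column. I'm using Windows 10, and I know R has some encoding issues there, but I don't know what is causing this one.
Reproducible example: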
install.packages('gtfstools')
library(gtfstools)
# GTFS file directory
data_path <- system.file("extdata", package = "gtfstools")
spo_path <- file.path(data_path, "spo_gtfs.zip")
# read the file
spo_gtfs <- read_gtfs(spo_path)
# Show the stops (problem with encoding)
head(spo_gtfs$stops)
Session info:
> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)
Matrix products: default
locale:
[1] LC_COLLATE=Portuguese_Brazil.1252 LC_CTYPE=Portuguese_Brazil.1252 LC_MONETARY=Portuguese_Brazil.1252
[4] LC_NUMERIC=C LC_TIME=Portuguese_Brazil.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.1.2 tools_4.1.2
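A likely explanation, sketched in base R rather than with gtfstools itself: when text is read with an "unknown" encoding, the bytes are kept as-is and the strings are left unmarked, so R interprets them in the native encoding. In the Portuguese_Brazil.1252 locale shown above that is Windows-1252, and the two UTF-8 bytes of "ã" get displayed as two separate characters. The sketch writes the UTF-8 bytes of "São Judas" to a temporary file and reads them back with readLines() both ways; it assumes read_gtfs() behaves analogously by default, which the sketch itself does not verify.

```r
# Raw UTF-8 bytes for "São Judas" plus a trailing newline
# ("ã" is the two-byte sequence 0xC3 0xA3 in UTF-8)
bytes <- as.raw(c(0x53, 0xC3, 0xA3, 0x6F, 0x20, 0x4A, 0x75, 0x64, 0x61, 0x73, 0x0A))
tmp <- tempfile(fileext = ".txt")
writeBin(bytes, tmp)

# encoding = "unknown": bytes are read unchanged and left unmarked, so R
# treats them as native text (in a CP1252 locale this prints "SÃ£o Judas")
as_unknown <- readLines(tmp, encoding = "unknown")

# encoding = "UTF-8": the same bytes are marked as UTF-8, so R knows how
# to translate them for display in any locale
as_utf8 <- readLines(tmp, encoding = "UTF-8")

Encoding(as_unknown)                  # "unknown"
Encoding(as_utf8)                     # "UTF-8"
identical(as_utf8, "S\u00e3o Judas")  # TRUE

unlink(tmp)
```

Neither call converts any bytes; the encoding argument only decides how the strings are labelled, which is why changing the project encoding in RStudio does not affect how the file's contents are interpreted.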
You just need to use the encoding argument of read_gtfs():
library(gtfstools)
# GTFS file directory
data_path <- system.file("extdata", package = "gtfstools")
spo_path <- file.path(data_path, "spo_gtfs.zip")
# read the file
spo_gtfs <- read_gtfs(spo_path, encoding = "UTF-8")
# Show the stops (accented names now display correctly)
head(spo_gtfs$stops)
#> stop_id stop_name stop_desc stop_lat stop_lon
#> 1: 18848 Clínicas -23.55402 -46.67111
#> 2: 18849 Vila Madalena -23.54650 -46.69114
#> 3: 18850 Consolação -23.55809 -46.66020
#> 4: 18851 Conceição -23.63504 -46.64124
#> 5: 18852 Jabaquara -23.64600 -46.64103
#> 6: 18853 São Judas -23.62588 -46.64094
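If the data has already been read without the argument, a minimal after-the-fact repair is possible, assuming the column really holds UTF-8 bytes that were merely left unmarked. Encoding<- (base R) relabels strings without touching their bytes. The sketch below builds an unmarked "Consolação" from raw UTF-8 bytes to stand in for such a column; the variable name station is illustrative only.

```r
# UTF-8 bytes for "Consolação" ("ç" = 0xC3 0xA7, "ã" = 0xC3 0xA3);
# rawToChar() returns them as an unmarked ("unknown") string
station <- rawToChar(as.raw(c(0x43, 0x6F, 0x6E, 0x73, 0x6F, 0x6C, 0x61,
                              0xC3, 0xA7, 0xC3, 0xA3, 0x6F)))
Encoding(station)  # "unknown"

# Declare the existing bytes to be UTF-8 (no conversion takes place)
Encoding(station) <- "UTF-8"

Encoding(station)                           # "UTF-8"
identical(station, "Consola\u00e7\u00e3o")  # TRUE
```

For the object in the question this would presumably be Encoding(spo_gtfs$stops$stop_name) <- "UTF-8" (untested), though re-reading with encoding = "UTF-8" as above is simpler. Because Encoding<- only relabels, applying it to text that is not actually UTF-8 produces a different kind of garbled output; iconv() is the tool for real conversion between encodings.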