Hack to include special characters in file path in haven::read_sav()
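There seems to be an issue with the haven (1.1.1) package when the file path contains any kind of special character, even if only the file name does.
Assuming this is a genuine issue, I am looking for some clever hack or solution to work around it.
One (not ideal) example would be to have R copy the file to a friendlier path, give it a "better" file name, and then load the copy with haven. For example: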
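setwd("c:/temp")
fn <- "randóóm.sav"
# copy the file out of the directory whose name contains special characters
file.copy(paste0("./äglæpath/", fn), fn)
# strip everything except -, ., /, ASCII letters, digits and whitespace
file.rename(fn, gsub("[^-./a-zA-Z0-9[:space:]]", "", fn))
# now apply read_sav() to the copy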
I am using:
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
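Unfortunately, I have been able to reproduce the problem on Windows 10 with both the standard version of haven and the devtools version of haven. This appears to be a known bug in haven: #371.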
Recommended workaround:
Move the file to a directory whose path and file name contain no German umlauts. So your workaround works as intended.
> file.path(dataFilepath, dtaFilename)
[1] "äglæpath/randóóm.dta"
> dtaFilename <- gsub("[^-./a-zA-Z0-9[:space:]]", "", dtaFilename)
> bdatFilename <- gsub("[^-./a-zA-Z0-9[:space:]]", "", bdatFilename)
> savFilename <- gsub("[^-./a-zA-Z0-9[:space:]]", "", savFilename)
> dataFilepath <- gsub("[^-./a-zA-Z0-9[:space:]]", "", dataFilepath)
> file.path(dataFilepath, dtaFilename)
[1] "glpath/randm.dta"
> # Stata
> read_dta(dtaDest)
# A tibble: 150 x 5
   sepallength sepalwidth petallength petalwidth species
         <dbl>      <dbl>       <dbl>      <dbl> <chr>
 1        5.10       3.5         1.40      0.200 setosa
 2        4.90       3           1.40      0.200 setosa
 3        4.70       3.20        1.30      0.200 setosa
 4        4.60       3.10        1.5       0.200 setosa
 5        5          3.60        1.40      0.200 setosa
 6        5.40       3.90        1.70      0.400 setosa
 7        4.60       3.40        1.40      0.300 setosa
 8        5          3.40        1.5       0.200 setosa
 9        4.40       2.90        1.40      0.200 setosa
10        4.90       3.10        1.5       0.100 setosa
# ... with 140 more rows
>
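The copy-then-read workaround above can be wrapped in a small helper. The following is a minimal sketch, not part of haven: `read_via_ascii_copy()` is a hypothetical name, and it assumes that the session's temporary directory (`tempdir()`) has an ASCII-only path, which may not hold if, for example, the Windows user name contains special characters. It copies the file rather than renaming it, so the original stays untouched.

```r
# Hypothetical helper (not a haven function): copy `path` to an ASCII-named
# temporary file, call `reader` on the copy, then delete the copy.
read_via_ascii_copy <- function(path, reader, ...) {
  stopifnot(file.exists(path))

  # Keep the original extension (e.g. "sav", "dta") on the temporary copy.
  ext <- tools::file_ext(path)
  tmp <- tempfile(
    pattern = "ascii_copy_",
    fileext = if (nzchar(ext)) paste0(".", ext) else ""
  )

  # Remove the temporary copy when the function exits, even on error.
  on.exit(unlink(tmp), add = TRUE)

  if (!file.copy(path, tmp, overwrite = TRUE)) {
    stop("Could not copy '", path, "' to '", tmp, "'")
  }

  # Assumes `reader` loads the file eagerly, before the copy is deleted.
  reader(tmp, ...)
}
```

With haven loaded, a call might look like `read_via_ascii_copy(savDest, read_sav)`. Whether `file.copy()` itself copes with the special characters depends on the platform and locale, so this is worth checking on the affected machine.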
GitHub issue #371
read_*() does not work with special characters in the file path #371
https://github.com/tidyverse/haven/issues/371
The problematic code is in haven/src/DfReader.cpp, df.parse_dta(), lines 594-612.
Code to reproduce
require(haven)
require(stringi)
dtaURL <- "https://github.com/tidyverse/haven/blob/master/inst/examples/iris.dta?raw=true"
bdatURL <- "https://github.com/tidyverse/haven/blob/master/inst/examples/iris.sas7bdat?raw=true"
savURL <- "https://github.com/tidyverse/haven/blob/master/inst/examples/iris.sav?raw=true"
dtaFilename <- "randóóm.dta"
bdatFilename <- "randóóm.bdata"
savFilename <- "randóóm.sav"
dataFilepath <- "äglæpath"
if (!dir.exists(dataFilepath)) {
  dir.create(file.path(dataFilepath), showWarnings = TRUE)
}
dtaDest = file.path(dataFilepath, dtaFilename)
bdatDest = file.path(dataFilepath, bdatFilename )
savDest = file.path(dataFilepath, savFilename )
download.file(dtaURL, destfile = dtaDest, method = "wget", mode = "wb")
download.file(bdatURL, destfile = bdatDest, method = "wget", mode = "wb")
download.file(savURL, destfile = savDest, method = "wget", mode = "wb")
# Stata
read_dta(dtaDest)
# SAS
read_sas(bdatDest)
# SPSS
read_sav(savDest)
Console output
> require(haven)
> require(stringi)
> dtaURL <- "https://github.com/tidyverse/haven/blob/master/inst/examples/iris.dta?raw=true"
> bdatURL <- "https://github.com/tidyverse/haven/blob/master/inst/examples/iris.sas7bdat?raw=true"
> savURL <- "https://github.com/tidyverse/haven/blob/master/inst/examples/iris.sav?raw=true"
> dtaFilename <- "randóóm.dta"
> bdatFilename <- "randóóm.bdata"
> savFilename <- "randóóm.sav"
> dataFilepath <- "äglæpath"
> if (!dir.exists(dataFilepath)) {
+ dir.create(file.path(dataFilepath), showWarnings = TRUE)
+ }
> dtaDest = file.path(dataFilepath, dtaFilename)
> bdatDest = file.path(dataFilepath, bdatFilename )
> savDest = file.path(dataFilepath, savFilename )
> download.file(dtaURL, destfile = dtaDest, method = "wget", mode = "wb")
--2018-05-29 15:56:59-- https://github.com/tidyverse/haven/blob/master/inst/examples/iris.dta?raw=true
Resolving github.com (github.com)... 192.30.255.113, 192.30.255.112
Connecting to github.com (github.com)|192.30.255.113|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github.com/tidyverse/haven/raw/master/inst/examples/iris.dta [following]
--2018-05-29 15:56:59-- https://github.com/tidyverse/haven/raw/master/inst/examples/iris.dta
Reusing existing connection to github.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/tidyverse/haven/master/inst/examples/iris.dta [following]
--2018-05-29 15:56:59-- https://raw.githubusercontent.com/tidyverse/haven/master/inst/examples/iris.dta
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.52.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.52.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8213 (8.0K) [application/octet-stream]
Saving to: '4gl6path/rand33m.dta'
0K ........ 100% 1.56M=0.005s
2018-05-29 15:56:59 (1.56 MB/s) - '4gl6path/rand33m.dta' saved [8213/8213]
> download.file(bdatURL, destfile = bdatDest, method = "wget", mode = "wb")
--2018-05-29 15:56:59-- https://github.com/tidyverse/haven/blob/master/inst/examples/iris.sas7bdat?raw=true
Resolving github.com (github.com)... 192.30.255.113, 192.30.255.112
Connecting to github.com (github.com)|192.30.255.113|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github.com/tidyverse/haven/raw/master/inst/examples/iris.sas7bdat [following]
--2018-05-29 15:56:59-- https://github.com/tidyverse/haven/raw/master/inst/examples/iris.sas7bdat
Reusing existing connection to github.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/tidyverse/haven/master/inst/examples/iris.sas7bdat [following]
--2018-05-29 15:56:59-- https://raw.githubusercontent.com/tidyverse/haven/master/inst/examples/iris.sas7bdat
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.52.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.52.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 131072 (128K) [application/octet-stream]
Saving to: '4gl6path/rand33m.bdata'
0K .......... .......... .......... .......... .......... 39% 4.05M 0s
50K .......... .......... .......... .......... .......... 78% 19.7M 0s
100K .......... .......... ........ 100% 19.3M=0.02s
2018-05-29 15:57:00 (7.83 MB/s) - '4gl6path/rand33m.bdata' saved [131072/131072]
> download.file(savURL, destfile = savDest, method = "wget", mode = "wb")
--2018-05-29 15:57:01-- https://github.com/tidyverse/haven/blob/master/inst/examples/iris.sav?raw=true
Resolving github.com (github.com)... 192.30.255.113, 192.30.255.112
Connecting to github.com (github.com)|192.30.255.113|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github.com/tidyverse/haven/raw/master/inst/examples/iris.sav [following]
--2018-05-29 15:57:01-- https://github.com/tidyverse/haven/raw/master/inst/examples/iris.sav
Reusing existing connection to github.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/tidyverse/haven/master/inst/examples/iris.sav [following]
--2018-05-29 15:57:01-- https://raw.githubusercontent.com/tidyverse/haven/master/inst/examples/iris.sav
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.52.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.52.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6690 (6.5K) [application/octet-stream]
Saving to: '4gl6path/rand33m.sav'
0K ...... 100% 3.09M=0.002s
2018-05-29 15:57:01 (3.09 MB/s) - '4gl6path/rand33m.sav' saved [6690/6690]
> # Stata
> read_dta(dtaDest)
Error in df_parse_dta_file(spec, encoding) :
Failed to parse <...>/äglæpath/randóóm.dta: Unable to open file.
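The failure above only shows up once the read is attempted. As a rough sketch, paths could be screened beforehand for non-ASCII characters, so that a workaround is applied only where it is needed. `has_non_ascii()` is a hypothetical helper, not a haven function, and the assumption that any non-ASCII character triggers the bug comes from the report above rather than from haven's documentation.

```r
# Hypothetical helper: returns TRUE for each path that contains at least one
# character outside the ASCII range (code point above 127).
has_non_ascii <- function(paths) {
  vapply(
    enc2utf8(paths),
    function(p) any(utf8ToInt(p) > 127L),
    logical(1),
    USE.NAMES = FALSE
  )
}

# The first path is "äglæpath/randóóm.dta" written with \u escapes so the
# example does not depend on the encoding of the script file.
has_non_ascii(c("\u00e4gl\u00e6path/rand\u00f3\u00f3m.dta", "glpath/randm.dta"))
# expected: TRUE FALSE
```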