R 无法从网上下载文件
R Cannot download a file from the web
我可以在浏览器中从这个网站下载一个文件
https://www.cmegroup.com/ftp/pub/settle/comex_future.csv
但是当我尝试以下操作时
url <- "https://www.cmegroup.com/ftp/pub/settle/comex_future.csv"
dest <- "C:\COMEXfut.csv"
download.file(url, dest)
我收到以下错误消息
Error in download.file(url, dest) :
cannot open URL 'https://www.cmegroup.com/ftp/pub/settle/comex_future.csv'
In addition: Warning message:
In download.file(url, dest) :
InternetOpenUrl failed: 'The operation timed out'
即使我选择:
options(timeout = max(600, getOption("timeout")))
知道为什么会这样吗?谢谢!
这里的问题是您下载的站点需要一些额外的 headers。提供它们的最简单方法是使用 httr
包
library(httr)
url <- "https://www.cmegroup.com/ftp/pub/settle/comex_future.csv"
UA <- paste('Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:98.0)',
'Gecko/20100101 Firefox/98.0')
res <- GET(url, add_headers(`User-Agent` = UA, Connection = 'keep-alive'))
下载不到一秒。
如果你想保存文件你可以
writeBin(res$content, 'myfile.csv')
或者,如果您只是想直接将数据读入 R,甚至不保存它,您可以这样做:
content(res)
#> Rows: 527 Columns: 20
#> 0s-- Column specification ----------------------------------------------------------------
#> Delimiter: ","
#> chr (10): PRODUCT SYMBOL, CONTRACT MONTH, CONTRACT DAY, CONTRACT, PRODUCT DESCRIPTIO...
#> dbl (10): CONTRACT YEAR, OPEN, HIGH, LOW, LAST, SETTLE, EST. VOL, PRIOR SETTLE, PRIO...
#>
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 527 x 20
#> `PRODUCT SYMBOL` `CONTRACT MONTH` `CONTRACT YEAR` `CONTRACT DAY` CONTRACT
#> <chr> <chr> <dbl> <chr> <chr>
#> 1 0GC 07 2022 NA 0GCN22
#> 2 4GC 03 2022 NA 4GCH22
#> 3 4GC 05 2022 NA 4GCK22
#> 4 4GC 06 2022 NA 4GCM22
#> 5 4GC 08 2022 NA 4GCQ22
#> 6 4GC 10 2022 NA 4GCV22
#> 7 4GC 12 2022 NA 4GCZ22
#> 8 4GC 02 2023 NA 4GCG23
#> 9 4GC 04 2023 NA 4GCJ23
#> 10 4GC 06 2023 NA 4GCM23
#> # ... with 517 more rows, and 15 more variables: PRODUCT DESCRIPTION <chr>, OPEN <dbl>,
#> # HIGH <dbl>, HIGH AB INDICATOR <chr>, LOW <dbl>, LOW AB INDICATOR <chr>, LAST <dbl>,
#> # LAST AB INDICATOR <chr>, SETTLE <dbl>, PT CHG <chr>, EST. VOL <dbl>,
#> # PRIOR SETTLE <dbl>, PRIOR VOL <dbl>, PRIOR INT <dbl>, TRADEDATE <chr>
我可以在浏览器中从这个网站下载一个文件 https://www.cmegroup.com/ftp/pub/settle/comex_future.csv
但是当我尝试以下操作时
url <- "https://www.cmegroup.com/ftp/pub/settle/comex_future.csv"
dest <- "C:\COMEXfut.csv"
download.file(url, dest)
我收到以下错误消息
Error in download.file(url, dest) :
cannot open URL 'https://www.cmegroup.com/ftp/pub/settle/comex_future.csv'
In addition: Warning message:
In download.file(url, dest) :
InternetOpenUrl failed: 'The operation timed out'
即使我选择:
options(timeout = max(600, getOption("timeout")))
知道为什么会这样吗?谢谢!
这里的问题是您下载的站点需要一些额外的 headers。提供它们的最简单方法是使用 httr
包
library(httr)
url <- "https://www.cmegroup.com/ftp/pub/settle/comex_future.csv"
UA <- paste('Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:98.0)',
'Gecko/20100101 Firefox/98.0')
res <- GET(url, add_headers(`User-Agent` = UA, Connection = 'keep-alive'))
下载不到一秒。
如果你想保存文件你可以
writeBin(res$content, 'myfile.csv')
或者,如果您只是想直接将数据读入 R,甚至不保存它,您可以这样做:
content(res)
#> Rows: 527 Columns: 20
#> 0s-- Column specification ----------------------------------------------------------------
#> Delimiter: ","
#> chr (10): PRODUCT SYMBOL, CONTRACT MONTH, CONTRACT DAY, CONTRACT, PRODUCT DESCRIPTIO...
#> dbl (10): CONTRACT YEAR, OPEN, HIGH, LOW, LAST, SETTLE, EST. VOL, PRIOR SETTLE, PRIO...
#>
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 527 x 20
#> `PRODUCT SYMBOL` `CONTRACT MONTH` `CONTRACT YEAR` `CONTRACT DAY` CONTRACT
#> <chr> <chr> <dbl> <chr> <chr>
#> 1 0GC 07 2022 NA 0GCN22
#> 2 4GC 03 2022 NA 4GCH22
#> 3 4GC 05 2022 NA 4GCK22
#> 4 4GC 06 2022 NA 4GCM22
#> 5 4GC 08 2022 NA 4GCQ22
#> 6 4GC 10 2022 NA 4GCV22
#> 7 4GC 12 2022 NA 4GCZ22
#> 8 4GC 02 2023 NA 4GCG23
#> 9 4GC 04 2023 NA 4GCJ23
#> 10 4GC 06 2023 NA 4GCM23
#> # ... with 517 more rows, and 15 more variables: PRODUCT DESCRIPTION <chr>, OPEN <dbl>,
#> # HIGH <dbl>, HIGH AB INDICATOR <chr>, LOW <dbl>, LOW AB INDICATOR <chr>, LAST <dbl>,
#> # LAST AB INDICATOR <chr>, SETTLE <dbl>, PT CHG <chr>, EST. VOL <dbl>,
#> # PRIOR SETTLE <dbl>, PRIOR VOL <dbl>, PRIOR INT <dbl>, TRADEDATE <chr>