getURLContent in RCurl still doesn't allow me to delimit data from the web
I am writing an R script for a project that uses a dataset available online. Initially I tried the following:
url <- "http://cssb2.biology.gatech.edu/pocketome/pdb_120518_all_het.lpc.sel.c0.9.len10.lst2"
pp <- read.delim(url)
This returned the error:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : cannot open: HTTP status was '403 Forbidden'
After searching online for an answer, I came across the option of using RCurl:
pp <- getURLContent(url, verbose = TRUE, useragent = getOption("HTTPUserAgent"))
However, when I try to run read.delim, scan, or any similar operation on pp, I get the following error:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file
Additional information:
> class(pp)
[1] "character"
> attr(pp, "Content-Type")
                charset
"text/plain"    "UTF-8"
Honestly, I have never used RCurl before, and right now I am just trying to figure out what it is and what it can do. I found the suggestion to use it here:
http://r.789695.n4.nabble.com/File-Downloading-Problem-td3022137.html
Try the more modern httr:
library(httr)
resp <- GET("http://cssb2.biology.gatech.edu/pocketome/pdb_120518_all_het.lpc.sel.c0.9.len10.lst2",
user_agent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/537.13+ (KHTML, like Gecko) Version/5.1.7 Safari/534.57.2"))
read.table(text=content(resp, as="text"), sep=",", header=TRUE, skip=2)
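As a side note on the second error in the question: class(pp) is "character", so pp already holds the downloaded text, and read.delim(pp) most likely fails because it treats that string as a file path. Below is a minimal sketch of parsing such a string directly. The sample contents of pp are made up for illustration (the real file's layout is not shown in the question), so the separator, header, and skip settings would need to match the actual data.

```r
# Hypothetical stand-in for the character string returned by getURLContent();
# the tab-separated contents are invented for this example.
pp <- "id\tvalue\nA\t1\nB\t2\n"

# read.delim(pp) would interpret the string as a file name and fail.
# Passing it through `text =` makes R parse the string itself.
df <- read.delim(text = pp, stringsAsFactors = FALSE)

# Equivalent approach using an explicit text connection.
con <- textConnection(pp)
df2 <- read.delim(con, stringsAsFactors = FALSE)
close(con)

print(df)
```

The `text =` form is the same mechanism the httr answer relies on when it passes content(resp, as="text") to read.table.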