getURLContent in RCurl still doesn't allow me to delimit data from the web

I'm writing an R script for a project that uses a dataset available online. Initially I tried the following:

url <- "http://cssb2.biology.gatech.edu/pocketome/pdb_120518_all_het.lpc.sel.c0.9.len10.lst2"
pp <- read.delim(url)

This returned the error:

Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : cannot open: HTTP status was '403 Forbidden'
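A 403 here usually means the server is rejecting R's default user agent rather than the URL being wrong. As a minimal sketch (assuming that is the cause), base R's URL connections consult the HTTPUserAgent option, so setting it to a browser-like string before reading may be enough:

options(HTTPUserAgent = "Mozilla/5.0")  # assumption: the server blocks R's default agent string
pp <- read.delim(url)                   # same call as before, now sent with the new agent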

After searching for answers online, I came across the option of using RCurl:

library(RCurl)
pp <- getURLContent(url, verbose = TRUE, useragent = getOption("HTTPUserAgent"))

However, when I try to delimit pp, scan it, or perform any other operation on it, I get the following error:

Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file 

Additional information:

> class(pp)
[1] "character"
> attr(pp, "Content-Type")
                  charset 
"text/plain"      "UTF-8" 

To be honest, I've never used RCurl before, and at this point I'm just trying to figure out what it is and what it can do. I found the suggestion to use it here: http://r.789695.n4.nabble.com/File-Downloading-Problem-td3022137.html

Try the more modern httr:

library(httr)

# present a browser-style user agent so the server doesn't return 403 Forbidden
resp <- GET("http://cssb2.biology.gatech.edu/pocketome/pdb_120518_all_het.lpc.sel.c0.9.len10.lst2",
            user_agent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/537.13+ (KHTML, like Gecko) Version/5.1.7 Safari/534.57.2"))

# parse the response body as text instead of pointing read.table at a file
read.table(text=content(resp, as="text"), sep=",", header=TRUE, skip=2)
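
As a small follow-up sketch (not part of the original answer), httr's status helpers can confirm that the browser-style user agent actually got past the 403 before you parse the body:

stop_for_status(resp)   # errors out if the server still returned a 4xx/5xx status
status_code(resp)       # should now be 200 rather than 403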