rvest, html_nodes() error: cannot coerce type 'environment' to vector of type 'list'. Fails RScript, works in Session

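The html_nodes() function fails as shown below when run as an executable RScript, but succeeds when run interactively. Does anyone know what is different between the two ways of running it?

The interactive run was in a fresh session, and the source() statement was the first thing run.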

$ ./test-pdp.R
>
> ################################################################################
> # Setup
> ################################################################################
> suppressPackageStartupMessages(library(plyr))
> suppressPackageStartupMessages(library(dplyr))
> suppressPackageStartupMessages(library(stringr))
> suppressPackageStartupMessages(library(rvest))
> suppressPackageStartupMessages(library(httr))
>
>
> read_html("http://google.com") %>%
+     html_nodes("div") %>%
+     length()
Error in as.vector(x, "list") :
  cannot coerce type 'environment' to vector of type 'list'
Calls: %>% ... <Anonymous> -> lapply -> as.list -> as.list.default
Execution halted

However, when run interactively via source(), it succeeds:

> source("/Users/a6001389/Documents/projects/hottest-deals-page-scrape/src/test-pdp.R", echo=TRUE)
> #!/usr/bin/RScript
> options(echo=TRUE)
> ################################################################################
> # Setup
> ####################################################### .... [TRUNCATED] 
> suppressPackageStartupMessages(library(dplyr))
> suppressPackageStartupMessages(library(stringr))
> suppressPackageStartupMessages(library(rvest))
> suppressPackageStartupMessages(library(httr))
> read_html("http://google.com") %>%
+     html_nodes("div") %>%
+     length()
[1] 17

Thanks, Matt

This may be a side effect of how the magrittr::%>% operator works. From the magrittr documentation, page 8, "%>% Pipe":

The magrittr pipe operators use non-standard evaluation. They capture their inputs and examines them to figure out how to proceed. First a function is produced from all of the individual right-hand side expressions, and then the result is obtained by applying this function to the left-hand side. For most purposes, one can disregard the subtle aspects of magrittr's evaluation, but some functions may capture their calling environment, and thus using the operators will not be exactly equivalent to the "standard call" without pipe-operators (Emphasis mine).

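So try it without %>% to see whether html_nodes is incorrectly capturing the environment when run from the command line (as your error message suggests), whereas in an interactive session it can pick up the session's environment: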

google_node <- read_html("http://google.com")
div_nodes   <- html_nodes(google_node, "div")
length(div_nodes)

Does that work when invoked as an executable RScript?

Adding the line:

library(methods)

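as suggested in Hadley Wickham's comment on the original question, does indeed fix the error. Why it fixes the error, I don't know. But I am posting it as an answer so there is an easy-to-reference solution here. If someone posts an explanation of why this fixes the problem, I will accept that answer.

Adding @mekki-macaulay's comment below to the text here, since it does add some clarity: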

This thread might shed some light on it. It seems that in some contexts RSCRIPT doesn't load package::methods by default, whereas interactive sessions do load it by default. It seems that the "when" is not clear, but explicitly calling library(methods) for all RSCRIPT executions seems to be the safe bet: "can use package interactively, but Rscript gives errors"
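
To see the difference described above on your own machine, here is a minimal sketch that uses only base R. It reports whether the methods package is attached at startup, then attaches it explicitly. The variable names (attached_before, attached_after) are made up for illustration, and the shebang line assumes Rscript is on your PATH. Run it once with Rscript and once via source() in an interactive session and compare the output.

```r
#!/usr/bin/env Rscript
# Sketch: check whether 'methods' is attached, then attach it explicitly.
# Variable names are illustrative only, not taken from the original script.

# Is 'methods' already on the search path when the script starts?
attached_before <- "package:methods" %in% search()
cat("interactive session:", interactive(), "\n")
cat("methods attached at startup:", attached_before, "\n")

# Attaching explicitly is harmless if the package is already attached.
library(methods)

# After the explicit call, 'methods' should be on the search path.
attached_after <- "package:methods" %in% search()
cat("methods attached after library(methods):", attached_after, "\n")
```

If the first check prints FALSE under Rscript but TRUE interactively, that would be consistent with the explanation in the comment above. Either way, keeping library(methods) at the top of an executable script costs nothing.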