RSelenium hangs when navigating to a direct PDF download

I'm using RSelenium through Docker Toolbox for Windows with the selenium/standalone-firefox-debug container, and the setup itself works fine:

docker run -d -v //c/test/://home/seluser/Downloads -p 4445:4444 -p 5901:5900 selenium/standalone-firefox-debug

I have set up a Firefox profile to download PDFs directly:

fprof <- makeFirefoxProfile(list(browser.startup.homepage = "about:blank"
                                 , startup.homepage_override_url = "about:blank"
                                 , startup.homepage_welcome_url = "about:blank"
                                 , startup.homepage_welcome_url.additional = "about:blank"
                                 , browser.download.dir = "/home/seluser/Downloads"
                                 , browser.download.folderList = 2L
                                 , browser.download.manager.showWhenStarting = FALSE
                                 , browser.download.manager.focusWhenStarting = FALSE
                                 , browser.download.manager.closeWhenDone = TRUE
                                 , browser.helperApps.neverAsk.saveToDisk = "application/pdf, application/octet-stream"
                                 , pdfjs.disabled = TRUE
                                 , plugin.scan.plid.all = FALSE
                                 , plugin.scan.Acrobat = 99L))

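With the following code, when I navigate directly to the PDF, it downloads fine to the specified directory, but then the call hangs and no subsequent code can be executed.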

library(RSelenium)

remDr <- remoteDriver(remoteServerAddr = "*docker-ip*", port = 4445L, extraCapabilities = fprof)
remDr$open()
remDr$navigate("http://www.equibase.com/premium/eqbPDFChartPlus.cfm?RACE=A&BorP=P&TID=BEL&CTRY=USA&DT=09/12/2015&DAY=D&STYLE=EQB")

I have to stop the R code manually, and the error shown is:

Error in checkError(res) : 
Undefined error in httr call. httr output: Operation was aborted by an application callback

If I VNC into the container and look at what the browser is showing, the file has been downloaded, but the address bar is empty.

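Any ideas? I assume this has to do with the httr/RSelenium packages not receiving some kind of 'loaded' signal from the browser, but this is beyond my troubleshooting ability. This approach previously worked with RSelenium and the selenium-standalone-server .jar file.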

The output of sessionInfo() and remDr$open() is below:

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RSelenium_1.7.1

loaded via a namespace (and not attached):
 [1] httr_1.2.1     R6_2.2.0       assertthat_0.1 tools_3.3.2    wdman_0.2.2    binman_0.1.0  
 [7] curl_2.3       Rcpp_0.12.9    jsonlite_1.2   caTools_1.17.1 openssl_0.9.6  bitops_1.0-6  
[13] semver_0.2.0   XML_3.98-1.5  



> remDr$open()
[1] "Connecting to remote server"
$rotatable
[1] FALSE

$raisesAccessibilityExceptions
[1] FALSE

$firefoxOptions
$firefoxOptions$args
list()

$firefoxOptions$profile
[1] "UEsDBBQACAgIAEwPW0oAAAAAAAAAAAAAAAAIAAAAcHJlZnMuanOlkU9LAzEQxe+C36HsSaFmwVv1tNBjbwoey2wy242dZsJM0v36JqJYimWV3vLn/R4z72VF2UbB4a7phadyM5pAUo5m5ANG2GGzXDTQc05PPUHYN/fPtzf5BzuXb/mIIt7hNgv9l52QbDlfiRpwzifPAeZcvnd2PAVicMZ5qUhbbVtFqtp2/fWrs/jA5FA2XlNxeZxTHyCUyUviI09vI4aXupMPu8IOQIp/5Qe2Wa8xsMSK1WDNofadJF9iR6SI0sWoJmBputO9UTjiK6+97j/jjpG8hZp/G92wXJw+sE2YHjQJwuE8zSJ+19KAQk/ofh8jUt75YNRCMMXVGSC6sO2ptLPCPdRSVqsq+wBQSwcII+hBcQsBAAD3AgAAUEsDBBQACAgIAEwPW0oAAAAAAAAAAAAAAAAHAAAAdXNlci5qc51WTW/bMAy971cMOW3AKqTretlOXdcBA4Z1aFDsKMgSbauRJU0fcfPvR/mjSRNHbndKbJMS+fj4yOjBUeugfLconGnxiXhWQvdf6oo0TLXMAQHNCgVi8eFtyZSH91/exJ2nYAFtrHEhudTAVKj7Z4JGG8ln/DWE1rg1qUOwxNbS19uz9Nky788U6CrU6Pjx8vK52xiwAybwR0AAHkB8l86HK4yFK0C34OJhuKbBvB4pr51pgHrupA3URU2DbJLLxXL6osAKTxAOfauvlfEwnc1oLUyrlWEC79KsSsDWpv1Tg14hWgmpaXeLQdng02W0MYKpGexhE4xRnoBzxnGjvVH7cB+n72WljUbUGmgKcKvu0edz8eC9RKtgkAsOfETcSgyUcsd8nfdVUq+JsaApPAZwmqlUzFczqExlvYt6+rIWCuHkBp8Z54DljBoz90gHysEFP4nEU6Wkt4ptQdycL1e/DDInlfbTtDG+Erf6j9RYX3++JBIvMvd3P9FjwQoTw+dCMb1... <truncated>


$appBuildId
[1] "20170125094131"

$version
[1] ""

$platform
[1] "LINUX"

$proxy
named list()

$command_id
[1] 1

$nativeEvents
[1] TRUE

$specificationLevel
[1] 0

$acceptSslCerts
[1] FALSE

$processId
[1] 3012

$webdriver.remote.sessionid
[1] "6263b5ab-9375-425e-aa00-8fc632dc492e"

$browserVersion
[1] "51.0.1"

$platformVersion
[1] "4.4.47-boot2docker"

$XULappId
[1] "{ec8030f7-c20a-464f-9b0e-13a3a9e97384}"

$browserName
[1] "firefox"

$takesScreenshot
[1] TRUE

$javascriptEnabled
[1] TRUE

$takesElementScreenshot
[1] TRUE

$platformName
[1] "linux"

$cssSelectorsEnabled
[1] TRUE

$firefox_profile
[1] "UEsDBBQAAgAIAJRZW0oj6EFxCwEAAPcCAAAIAAAAcHJlZnMuanOlkU9LAzEQxe+C36HsSaFmwVv1tNBjbwoey2wy242dZsJM0v36JqJYimWV3vLn/R4z72VF2UbB4a7phadyM5pAUo5m5ANG2GGzXDTQc05PPUHYN/fPtzf5BzuXb/mIIt7hNgv9l52QbDlfiRpwzifPAeZcvnd2PAVicMZ5qUhbbVtFqtp2/fWrs/jA5FA2XlNxeZxTHyCUyUviI09vI4aXupMPu8IOQIp/5Qe2Wa8xsMSK1WDNofadJF9iR6SI0sWoJmBputO9UTjiK6+97j/jjpG8hZp/G92wXJw+sE2YHjQJwuE8zSJ+19KAQk/ofh8jUt75YNRCMMXVGSC6sO2ptLPCPdRSVqsq+wBQSwECHgAUAAIACACUWVtKI+hBcQsBAAD3AgAACAAAAAAAAAABACAAAAAAAAAAcHJlZnMuanNQSwUGAAAAAAEAAQA2AAAAMQEAAAAA"

$id
[1] "6263b5ab-9375-425e-aa00-8fc632dc492e"

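I get the same issue with the latest version of Firefox (51.0.1). This is on a Windows machine, and the problem appears to be the pdfjs.disabled flag. The issue does not occur with older versions of Firefox. For example, the Docker image tagged 2.53.1 runs Firefox 47. If possible, run an older version (here on a Linux box):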

docker run -d -p 4445:4444 -p 5901:5900 -v /home/john/test:/home/seluser/Downloads selenium/standalone-firefox-debug:2.53.1

Now, running your code, we see:

# RSelenium must be loaded before makeFirefoxProfile() is called
library(RSelenium)

fprof <- makeFirefoxProfile(list(browser.startup.homepage = "about:blank"
                                 , startup.homepage_override_url = "about:blank"
                                 , startup.homepage_welcome_url = "about:blank"
                                 , startup.homepage_welcome_url.additional = "about:blank"
                                 , browser.download.dir = "/home/seluser/Downloads"
                                 , browser.download.folderList = 2L
                                 , browser.download.manager.showWhenStarting = FALSE
                                 , browser.download.manager.focusWhenStarting = FALSE
                                 , browser.download.manager.closeWhenDone = TRUE
                                 , browser.helperApps.neverAsk.saveToDisk = "application/pdf, application/octet-stream"
                                 , pdfjs.disabled = TRUE
                                 , plugin.scan.plid.all = FALSE
                                 , plugin.scan.Acrobat = 99L))

remDr <- remoteDriver(port = 4445L, extraCapabilities = fprof)
remDr$open()
remDr$navigate("http://www.equibase.com/premium/eqbPDFChartPlus.cfm?RACE=A&BorP=P&TID=BEL&CTRY=USA&DT=09/12/2015&DAY=D&STYLE=EQB")

> list.files("/home/john/test/")
[1] "eqbPDFChartPlus.cfm"

The PDF needs to be renamed (it is saved under the name of the ColdFusion .cfm file).
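The rename can be done from R once the download has finished. This is only a minimal sketch: rename_to_pdf is a hypothetical helper name, and it assumes every .cfm file in the mounted download directory is really a PDF.

```r
# Hypothetical helper: give downloaded .cfm files a .pdf extension.
# Assumes every file matching `pattern` in `download_dir` is actually a PDF.
rename_to_pdf <- function(download_dir, pattern = "\\.cfm$") {
  old <- list.files(download_dir, pattern = pattern, full.names = TRUE)
  new <- sub(pattern, ".pdf", old)
  ok <- file.rename(old, new)   # one TRUE/FALSE per file
  new[ok]                       # paths that were renamed successfully
}
```

With the volume mapping used above, the call would be rename_to_pdf("/home/john/test"). Note that a second download from the same URL would arrive under the same .cfm name, so it is probably safest to rename after each download.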

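As for what is going on with the latest version of Firefox, you may want to consult the geckodriver project. Users of clients other than RSelenium have also run into this recently; see "Can't download PDF with selenium webdriver + firefox".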
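If you have to stay on the newer Firefox, one thing that might be worth trying is to cap the page-load wait and treat an error from navigate() as non-fatal, so the script can go on to check the download directory. I have not verified that this avoids the hang with Firefox 51 and geckodriver, so treat it as a sketch only. navigate_tolerant is a hypothetical helper name; remDr$setTimeout() is the RSelenium method for setting the page-load timeout.

```r
# Hypothetical helper: call navigate() but turn an error into FALSE
# instead of aborting the script. `driver` is any object with a
# navigate(url) method, such as an RSelenium remoteDriver.
navigate_tolerant <- function(driver, url) {
  tryCatch({
    driver$navigate(url)
    TRUE
  }, error = function(e) {
    message("navigate() did not return cleanly: ", conditionMessage(e))
    FALSE
  })
}

# Assumed usage with an open session (not run here):
# remDr$setTimeout(type = "page load", milliseconds = 10000)
# navigate_tolerant(remDr, pdf_url)   # pdf_url: the direct PDF link
# list.files("/home/john/test/")      # check whether the file arrived
```

A FALSE return value only means that navigate() raised an error; whether the file was actually downloaded still has to be checked on disk.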