How to download a ZIP file with HTMLUnit by clicking on an anchor
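I am trying to download a ZIP file with HTMLUnit 2.32 using the following code.
I get a "myfile.zip" file that is larger than the one downloaded through a normal browser (179 KB vs. 79 KB), and it is corrupted.
How can I click on an anchor and download the file with HTMLUnit?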
WebClient wc = new WebClient(BrowserVersion.CHROME);
final String HREF_SCARICA_CONSOLIDATI = "/web/area-pubblica/quotate?viewId=export_quotate";
final String CONSOBBase = "http://www.consob.it";
HtmlPage page = wc.getPage(CONSOBBase + HREF_SCARICA_CONSOLIDATI);
final String downloadButtonXpath = "//a[contains(@href, 'javascript:downloadAzionariato()')]";
List<HtmlAnchor> downloadAnchors = page.getByXPath(downloadButtonXpath);
HtmlAnchor downloadAnchor = downloadAnchors.get(0);
UnexpectedPage downloadedFile = downloadAnchor.click();
InputStream contentAsStream = downloadedFile.getWebResponse().getContentAsStream();
File destFile = new File("/tmp", "myfile.zip");
Writer out = new OutputStreamWriter(new FileOutputStream(destFile));
IOUtils.copy(contentAsStream, out);
out.close();
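I have updated your code snippet slightly to make it work. Hopefully the inline comments help explain what is going on (tested with the latest HtmlUnit SNAPSHOT code, 2.34-SNAPSHOT 2018/11/03):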
final String HREF_SCARICA_CONSOLIDATI = "http://www.consob.it/web/area-pubblica/quotate?viewId=export_quotate";
try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_60)) {
    HtmlPage page = webClient.getPage(HREF_SCARICA_CONSOLIDATI);
    final String downloadButtonXpath = "//a[contains(@href, 'javascript:downloadAzionariato()')]";
    List<HtmlAnchor> downloadAnchors = page.getByXPath(downloadButtonXpath);
    HtmlAnchor downloadAnchor = downloadAnchors.get(0);
    // click() does some JavaScript magic - have a look at it in your browser;
    // it seems to open a new window with the content as the response,
    // so we can ignore the page returned from click()
    downloadAnchor.click();
    // instead, we wait a bit until the JavaScript is done
    webClient.waitForBackgroundJavaScript(1000);
    // now we pick up the window/page that was opened as a result of the download
    Page downloadPage = webClient.getCurrentWindow().getEnclosedPage();
    // and finally we can save the content
    File destFile = new File("/tmp", "myfile.zip");
    try (InputStream contentAsStream = downloadPage.getWebResponse().getContentAsStream()) {
        try (OutputStream out = new FileOutputStream(destFile)) {
            IOUtils.copy(contentAsStream, out);
        }
    }
    System.out.println("Output written to " + destFile.getAbsolutePath());
}
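If Commons IO is not on the classpath, the saving step of the snippet above could also be done with the JDK alone. The following is only a sketch: `StreamSaver` and `saveStream` are hypothetical names that are not part of HtmlUnit, and the sample bytes in `main` stand in for a real download.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class StreamSaver {

    // Hypothetical helper: writes the stream to destFile byte for byte,
    // closes the stream, and returns the number of bytes written.
    static long saveStream(InputStream content, Path destFile) throws IOException {
        try (InputStream in = content) {
            return Files.copy(in, destFile, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the response body; a real ZIP starts with 'P' 'K' 0x03 0x04.
        byte[] sample = {0x50, 0x4B, 0x03, 0x04, (byte) 0xFF, 0x00};
        Path dest = Files.createTempFile("myfile", ".zip");
        long written = saveStream(new ByteArrayInputStream(sample), dest);
        System.out.println(written + " bytes written to " + dest);
    }
}
```

In the snippet above this would be called as `saveStream(downloadPage.getWebResponse().getContentAsStream(), destFile.toPath())`.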
While RBRi's remarks are interesting, I found that my code works with HTMLUnit 2.32 without modification; I was simply writing the file the wrong way!
I used
Writer out = new OutputStreamWriter(new FileOutputStream(destFile));
IOUtils.copy(contentAsStream, out);
whereas it should have been (without the OutputStreamWriter)
FileOutputStream out = new FileOutputStream(destFile);
IOUtils.copy(contentAsStream, out);
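The reason this matters is that a Writer works on characters, so the ZIP bytes are decoded into text and then encoded again, and bytes that are not valid in the charset get replaced. That is most likely why the file grew and became unreadable. Below is a minimal, JDK-only sketch of the effect; the class name `BinaryCopyDemo` and the sample bytes are made up for illustration, and UTF-8 is assumed explicitly, whereas the original code used the platform default charset.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class BinaryCopyDemo {

    // Copies raw bytes unchanged: the right way to save binary data.
    static byte[] copyAsBytes(byte[] input) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream in = new ByteArrayInputStream(input);
             OutputStream out = sink) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
        return sink.toByteArray();
    }

    // Decodes the bytes to characters and re-encodes them (assumed charset: UTF-8).
    // Invalid byte sequences are replaced, so the output differs from the input.
    static byte[] copyThroughWriter(byte[] input) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(input), StandardCharsets.UTF_8);
             Writer out = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            char[] buffer = new char[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // ZIP signature followed by bytes that are not valid UTF-8.
        byte[] zipLike = {0x50, 0x4B, 0x03, 0x04, (byte) 0xFF, (byte) 0xFE, (byte) 0x80, 0x00};
        System.out.println("original:       " + zipLike.length + " bytes");
        System.out.println("byte copy:      " + copyAsBytes(zipLike).length + " bytes");
        System.out.println("via a Writer:   " + copyThroughWriter(zipLike).length + " bytes");
    }
}
```

The byte copy comes out the same length as the input, while the copy through the Writer comes out longer, which mirrors the size difference described in the question.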