How to handle "Too much redirect" with HtmlUnit
I'm trying to parse a website, but I get a "Too much redirect" exception.
Here is my code:
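import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

WebClient client = new WebClient(BrowserVersion.FIREFOX_24);
HtmlPage homePage = null;
String url = "http://www.freelake.org/pages/Freetown-Lakeville_RSD/Departments/Director_of_Financial_Operatio";
try {
    client.getOptions().setUseInsecureSSL(true);
    client.setAjaxController(new NicelyResynchronizingAjaxController());
    client.getOptions().setThrowExceptionOnFailingStatusCode(false);
    client.getOptions().setThrowExceptionOnScriptError(false);
    // No page has been loaded yet, so these two waits have no background jobs to wait for
    client.waitForBackgroundJavaScript(30000);
    client.waitForBackgroundJavaScriptStartingBefore(30000);
    client.getOptions().setCssEnabled(false);
    client.getOptions().setJavaScriptEnabled(true);
    client.getOptions().setRedirectEnabled(true);
    // The "Too much redirect" exception is thrown from this call
    homePage = client.getPage(url);
    // Fixed pause of up to 25 s to give page scripts time to run
    synchronized (homePage) {
        homePage.wait(25000);
    }
    System.out.println(homePage.asXml());
} catch (Exception e) {
    e.printStackTrace();
}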
The exception is:
com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException: Too much redirect for http://www.freelake.org/resolver/2345183424.20480.0000/route.00/pages/Freetown-Lakeville_RSD/Departments/Director_of_Financial_Operatio
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseFromWebConnection(WebClient.java:1353)
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseFromWebConnection(WebClient.java:1371)
Is there any way to fix this?
The page http://www.freelake.org/pages/Freetown-Lakeville_RSD/Departments/Director_of_Financial_Operatio sends 2 redirects:
- first to http://www.freelake.org/GroupHome.page, then
- back to http://www.freelake.org/pages/Freetown-Lakeville_RSD/Departments/Director_of_Financial_Operatio
Using the second URL should work. Alternatively, find a way to tell the library to allow a certain number of redirects, in this case 2.
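HtmlUnit follows redirects internally, so the sketch below does not touch HtmlUnit at all. It is a hypothetical, in-memory model of the two-hop chain described above: FakeServer bounces the first request for the page to GroupHome.page, GroupHome.page sets a flag (standing in for a session cookie) and bounces back, and follow() walks the Location targets by hand up to a fixed limit. RedirectChain, FakeServer, Response and follow() are illustrative names, not part of any library.

```java
// Hypothetical, in-memory model of the redirect chain described above.
// Not HtmlUnit: it only illustrates following at most N redirects by hand.
class RedirectChain {

    static final String PAGE =
        "http://www.freelake.org/pages/Freetown-Lakeville_RSD/Departments/Director_of_Financial_Operatio";
    static final String HOME = "http://www.freelake.org/GroupHome.page";

    // Minimal response: a status code plus an optional Location target.
    static final class Response {
        final int status;
        final String location;

        Response(int status, String location) {
            this.status = status;
            this.location = location;
        }
    }

    // Stateful fake server: the first visit to PAGE redirects to HOME; HOME sets
    // a session flag (standing in for a cookie) and redirects back to PAGE.
    static final class FakeServer {
        private boolean sessionSet = false;

        Response get(String url) {
            if (url.equals(HOME)) {
                sessionSet = true;
                return new Response(302, PAGE);
            }
            if (url.equals(PAGE)) {
                return sessionSet
                        ? new Response(200, null)   // session known: serve the page
                        : new Response(302, HOME);  // no session yet: redirect
            }
            return new Response(404, null);
        }
    }

    // Follows redirects starting at url; fails once more than maxRedirects are needed.
    static String follow(FakeServer server, String url, int maxRedirects) {
        int hops = 0;
        Response r = server.get(url);
        while (r.status / 100 == 3) {
            if (hops == maxRedirects) {
                throw new IllegalStateException("Too much redirect for " + url);
            }
            url = r.location;
            hops++;
            r = server.get(url);
        }
        return url;
    }

    public static void main(String[] args) {
        // Two hops (PAGE -> HOME -> PAGE) reach the final page.
        System.out.println(follow(new FakeServer(), PAGE, 2));
    }
}
```

In this model a limit of 2 is enough, while a limit of 1 makes follow() throw because the chain needs two hops.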
Edit: this might help, though I don't use this library myself:
client.getOptions().setRedirectEnabled(true);
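This happens because HtmlUnit caches the response, and the site redirects to another page and then back again.
I tested it with the following, and it works: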
client.getCache().setMaxSize(0);
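To make the explanation above concrete, here is a hypothetical simulation, not HtmlUnit's actual cache code. The server only serves the page after GroupHome.page has set a session flag. With a client-side cache keyed by URL, the first redirect for the page is stored and replayed forever, so the client bounces between the two URLs until it hits its limit; without the cache (the role `setMaxSize(0)` plays in the fix), the second request for the page reaches the server and succeeds. CachedRedirectDemo, load() and the limit of 20 are assumptions chosen for the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: a URL-keyed response cache that replays a stale redirect
// turns a two-hop chain into an endless loop. Not HtmlUnit's implementation.
class CachedRedirectDemo {

    static final String PAGE =
        "http://www.freelake.org/pages/Freetown-Lakeville_RSD/Departments/Director_of_Financial_Operatio";
    static final String HOME = "http://www.freelake.org/GroupHome.page";

    // Returns the redirect target for url, or null when the final page is served.
    static final class FakeServer {
        private boolean sessionSet = false;  // stands in for a session cookie

        String get(String url) {
            if (url.equals(HOME)) {
                sessionSet = true;
                return PAGE;
            }
            return sessionSet ? null : HOME;  // PAGE: redirect until the session exists
        }
    }

    // Loads url, following redirects; useCache mimics caching every response by URL.
    static String load(String url, boolean useCache, int maxRedirects) {
        FakeServer server = new FakeServer();
        Map<String, String> cache = new HashMap<>();
        for (int hops = 0; ; hops++) {
            String location;
            if (useCache && cache.containsKey(url)) {
                location = cache.get(url);      // stale redirect replayed from cache
            } else {
                location = server.get(url);
                if (useCache) {
                    cache.put(url, location);
                }
            }
            if (location == null) {
                return url;                     // final page reached
            }
            if (hops == maxRedirects) {
                throw new IllegalStateException("Too much redirect for " + url);
            }
            url = location;
        }
    }

    public static void main(String[] args) {
        System.out.println(load(PAGE, false, 20));  // no cache: succeeds after 2 hops
        try {
            load(PAGE, true, 20);                   // cache: PAGE <-> HOME loop
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Raising the limit does not help in the cached case, because every replayed response is the same redirect; only bypassing the cache breaks the loop.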
I had the same problem, but I was using HtmlUnit through Selenium. In Selenium you cannot access the WebClient directly, because getWebClient() is protected.
This is how I solved it:
WebDriver driver = new HtmlUnitDriver(true) {
    {
        this.getWebClient().getCache().setMaxSize(0);
    }
};
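The Selenium workaround relies on a plain Java idiom: an anonymous subclass whose inner braces form an instance initializer. The initializer runs inside the subclass right after the superclass constructor, so it may call a protected method such as getWebClient(). The sketch below shows the same idiom with hypothetical stand-in classes (FakeDriver, FakeClient); they only mimic the shape of HtmlUnitDriver and WebClient and are not Selenium or HtmlUnit types.

```java
// Hypothetical stand-ins for HtmlUnitDriver and WebClient; only the shape matters.
class FakeClient {
    int maxCacheSize = 25;  // arbitrary starting value for the sketch

    void setMaxCacheSize(int size) {
        maxCacheSize = size;
    }
}

class FakeDriver {
    private final FakeClient client = new FakeClient();

    // protected: callable from subclasses (and from the same package)
    protected FakeClient getClient() {
        return client;
    }
}

class AnonymousInitDemo {
    static FakeDriver newDriver() {
        // Anonymous subclass of FakeDriver. The inner braces are an instance
        // initializer, which runs after FakeDriver's constructor and, being
        // inside the subclass, may call the protected getClient().
        return new FakeDriver() {
            {
                getClient().setMaxCacheSize(0);
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(newDriver().getClient().maxCacheSize);  // prints 0
    }
}
```

A named subclass that makes the same call in its constructor would work equally well; the anonymous form just keeps the configuration next to where the driver is created.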