Parse a single-page web application
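Does Java have any library for parsing single-page websites, such as those built with AngularJS?
According to jsoup's official documentation, it does not execute JavaScript.
The solution should not rely on any installed browser.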
Take a look at the link below; it may solve your problem:
Try jsoup + manual parsing
As @JonasCz mentioned, try HtmlUnit.
The code might look like this:
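import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class Test {
    public static void main(String[] args) {
        // HtmlUnit is a headless browser written in Java, so no installed browser is needed.
        final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_24);
        try {
            HtmlPage page = webClient.getPage("https://docs.angularjs.org/api/ng/service/$http");
            // Print the DOM as it stands after HtmlUnit has run the page's scripts.
            System.out.println(page.asXml());
        } catch (Exception e) {
            // Report the failure instead of swallowing it and dereferencing a null page later.
            e.printStackTrace();
        } finally {
            webClient.closeAllWindows();
        }
    }
}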
Here is working code for downloading an AngularJS page with HtmlUnit:
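// In addition to the imports above, this needs NicelyResynchronizingAjaxController
// and SilentCssErrorHandler (both in com.gargoylesoftware.htmlunit).
final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_24);
// Run AJAX calls synchronously where possible so the DOM is complete when it is read.
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
webClient.setCssErrorHandler(new SilentCssErrorHandler());
webClient.getOptions().setCssEnabled(true);
webClient.getOptions().setRedirectEnabled(true);
webClient.getOptions().setAppletEnabled(false);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getOptions().setPopupBlockerEnabled(true);
webClient.getOptions().setTimeout(10000);
webClient.getOptions().setThrowExceptionOnFailingStatusCode(true);
webClient.getOptions().setThrowExceptionOnScriptError(true);
webClient.getOptions().setPrintContentOnFailingStatusCode(true);
try {
    // URL is a placeholder for the address of the page to download.
    HtmlPage page = webClient.getPage(URL);
    // Wait up to 5 seconds for background JavaScript to finish.
    // This has to happen after getPage; before it there are no jobs to wait for.
    webClient.waitForBackgroundJavaScript(5000);
    System.out.println(page.asText());
} catch (Exception e) {
    e.printStackTrace();
} finally {
    webClient.closeAllWindows();
}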
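A fixed wait such as waitForBackgroundJavaScript(5000) can be too short for a slow page and needlessly long for a fast one. One possible alternative, sketched below with only the Java standard library, is to poll a condition until it holds or a timeout expires. The class name RenderWait and the method waitFor are hypothetical helpers, not part of HtmlUnit; with HtmlUnit the condition might be something like () -> !page.asXml().contains("{{"), on the assumption that unrendered AngularJS templates still show their placeholders.

```java
import java.util.function.BooleanSupplier;

public class RenderWait {

    /**
     * Polls the condition until it returns true or the timeout elapses.
     * Returns true if the condition held in time, false on timeout or interruption.
     * (Hypothetical helper for illustration; not an HtmlUnit API.)
     */
    public static boolean waitFor(BooleanSupplier condition, long timeoutMillis, long intervalMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            // Compare by subtraction so the check stays correct if nanoTime wraps around.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for a page that counts as "rendered" on the third check.
        int[] checks = {0};
        boolean rendered = waitFor(() -> ++checks[0] >= 3, 2000, 10);
        System.out.println("rendered=" + rendered + " after " + checks[0] + " checks");
    }
}
```

In the HtmlUnit snippet above, a call like this would go right after getPage, in place of or in addition to the fixed wait, with the timeout chosen to match how long the target page usually takes to render.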