Extracting links from all heading tags using jSoup

Question

I am trying to extract the links (the title text and its address) from all of the <h3> heading tags on a web page.

The code I tried is:

String u = "http://www.thehindu.com/business/";
Document docu = (Document) Jsoup.connect(u).get();

Elements lnk = docu.select("h3");
for (Element an : lnk) {
    String s = an.attr("abs:href");
    String name = an.text();
    System.out.println(s);
}

I am not getting any output. What is wrong?

Answer 1

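You selected the h3 elements and are trying to read their href attribute, but an h3 does not have one (there is no <h3 href="foobar">). What you want to select is the a element nested inside each h3, and read the href value from that.

So your code should look more like this: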

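String u = "http://www.thehindu.com/business/";
// get() already returns a Document, so no cast is needed
Document docu = Jsoup.connect(u).get();

// Select only anchors that have an href and sit inside an h3
Elements lnk = docu.select("h3 a[href]");
for (Element an : lnk) {
    String s = an.attr("abs:href"); // href resolved to an absolute URL
    String name = an.text();        // visible link text

    System.out.println(name);
    System.out.println(s);
    System.out.println("--------");
}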
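
To see the difference between the two selectors without hitting the network, here is a minimal sketch that parses an inline HTML string instead of fetching the page. The class name HeadingLinks, the extract helper, and the sample markup are made up for illustration; only Jsoup.parse, select, attr, and text are jsoup calls. It assumes jsoup is on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class HeadingLinks {

    // Hypothetical helper: maps link text -> absolute URL for every
    // <a href> nested inside an <h3>. Links with identical text overwrite
    // each other, which is acceptable for a sketch.
    static Map<String, String> extract(String html, String baseUri) {
        // The base URI lets jsoup resolve relative hrefs for "abs:href".
        Document doc = Jsoup.parse(html, baseUri);
        Map<String, String> links = new LinkedHashMap<>();
        for (Element a : doc.select("h3 a[href]")) {
            links.put(a.text(), a.attr("abs:href"));
        }
        return links;
    }

    public static void main(String[] args) {
        // Made-up markup for illustration; not the real page.
        String html = "<h3><a href=\"/business/markets/a.ece\">Markets</a></h3>"
                + "<h3>No link here</h3>"
                + "<p><a href=\"/other\">Not in a heading</a></p>";
        String base = "http://www.thehindu.com/business/";

        // An h3 itself carries no href, so this reads back an empty string.
        String onHeading = Jsoup.parse(html, base).select("h3").first().attr("abs:href");
        System.out.println("href on h3: '" + onHeading + "'");

        // Selecting the nested anchors yields the text and resolved address.
        extract(html, base).forEach((name, url) -> System.out.println(name + " -> " + url));
    }
}
```

In this sketch the anchor outside any h3 is skipped, and the h3 without a link contributes nothing, because the selector only matches a elements that have an href and an h3 ancestor.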

java

html-parsing

jsoup