Extracting text non-recursively with Jsoup
This is the code I am trying to run:
String html = "<a href=\"/name/zola-1\">ZOLA <span class=\"tiny\">(1)</span></a>";
Document doc = Jsoup.parse(html); // parse the HTML string
Element element = doc.getAllElements().first(); // take the first element in the document
System.out.println(element.text()); //prints "ZOLA (1)"
System.out.println(element.ownText()); // prints nothing
My goal is to extract only "ZOLA", without the text of the child nodes, but ownText() prints nothing.
What should I do?
You can use this:
String html = "<a href=\"/name/zola-1\">ZOLA <span class=\"tiny\">(1)</span></a>";
Document doc = Jsoup.parse(html);
Element elementA = doc.selectFirst("a");
System.out.println(elementA.ownText()); // ZOLA
The problem is that doc.getAllElements().first() returns the whole document:
<html>
<head></head>
<body>
<a href="/name/zola-1">ZOLA <span class="tiny">(1)</span></a>
</body>
</html>
and not, as you expected, just the link:
<a href="/name/zola-1">ZOLA <span class="tiny">(1)</span></a>
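To check this yourself, here is a minimal sketch, assuming Jsoup is on the classpath. The class name OwnTextDemo and the helper ownTextOfFirst are made up for illustration and are not part of Jsoup. It prints the node name of what getAllElements().first() returns, then reads the own text from the element actually selected with selectFirst, returning an empty string when nothing matches.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class OwnTextDemo {
    // Hypothetical helper: own text of the first element matching cssQuery,
    // or "" when no element matches (selectFirst returns null in that case).
    static String ownTextOfFirst(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        Element match = doc.selectFirst(cssQuery);
        return match == null ? "" : match.ownText();
    }

    public static void main(String[] args) {
        String html = "<a href=\"/name/zola-1\">ZOLA <span class=\"tiny\">(1)</span></a>";
        Document doc = Jsoup.parse(html);

        // The first entry of getAllElements() is the root, not the <a> element.
        Element first = doc.getAllElements().first();
        System.out.println(first.nodeName());
        System.out.println(first.ownText().isEmpty());

        // Selecting the <a> explicitly gives the text that belongs to it directly.
        System.out.println(ownTextOfFirst(html, "a")); // ZOLA
    }
}
```

The null check is only there because a selector may match nothing on a real page; with the sample HTML the link is always found.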
The following should work for you:
String html = "<a href=\"/name/zola-1\">ZOLA <span class=\"tiny\">(1)</span></a>";
Document doc = Jsoup.parse(html);
Elements links = doc.getElementsByTag("a");
System.out.println(links.get(0));
System.out.println(links.get(0).ownText());
Output:
<a href="/name/zola-1">ZOLA <span class="tiny">(1)</span></a>
ZOLA
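If the element has direct text on both sides of a child, ownText() joins those pieces into one string. When you need them separately, textNodes() may be a better fit. This is only a sketch: the class DirectTextNodes, the helper directTexts, and the sample paragraph are assumptions made for illustration, not something from the question.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

class DirectTextNodes {
    // Hypothetical helper: trimmed, non-empty text of each direct text node
    // of the first element matching cssQuery; child elements are skipped.
    static List<String> directTexts(String html, String cssQuery) {
        List<String> parts = new ArrayList<>();
        Element el = Jsoup.parse(html).selectFirst(cssQuery);
        if (el == null) {
            return parts; // nothing matched
        }
        for (TextNode node : el.textNodes()) {
            String text = node.text().trim();
            if (!text.isEmpty()) {
                parts.add(text);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // Sample markup with text before and after the child element.
        String html = "<p>Hello <b>there</b> now!</p>";
        for (String part : directTexts(html, "p")) {
            System.out.println(part);
        }
    }
}
```

For the link in the question there is only one direct text node, so ownText() is the simpler choice there.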