htmlUnit - How to get non-element content
I'm new to HtmlUnit. As the title says, I've run into some content that isn't wrapped in an element of its own. For example:
<div class="slide-title">
<h2> Lady at her dressing table in a garden</h2>
<p>
Chinese
<br>Southern Song dynasty
<br>mid-12th century
<br>
<a href="/collections/search?f[0]=field_artists%253Afield_artist%3A1411">Su Hanchen</a> (Chinese, active 1120s–1160s)
</p>
</div>
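There are three pieces of information, "Chinese", "Southern Song dynasty", and "mid-12th century". All of them sit directly inside the <p> tag, separated by <br> tags. How can I locate these three pieces and get their text content?
Thanks.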
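Use XPath, i.e. domNode.getFirstByXPath(path). The text() node test selects the bare text nodes that are children of <p>:
//div[@class='slide-title']/p/text()[1] = "Chinese"
//div[@class='slide-title']/p/text()[2] = "Southern Song dynasty"
...
Each result is a text node whose value still carries the surrounding whitespace and line breaks from the markup, so trim it before comparing or storing it.
PS: XPath is easy to experiment with in Chrome Developer Tools. In the console, use $x("//some-path").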
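To make the XPath expressions above concrete, here is a minimal sketch that runs them with the JDK's built-in javax.xml.xpath API, not with HtmlUnit itself, so it has no external dependencies. It assumes a well-formed variant of the snippet (each <br> written as <br/>); the class name SlideTitleText and the helper textNodes are hypothetical names made up for this illustration. With HtmlUnit, the same expression strings would presumably be passed to getFirstByXPath or getByXPath on the page or node, and the matched text nodes read and trimmed in the same way.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SlideTitleText {
    // Well-formed variant of the snippet from the question (<br> written as <br/>).
    // Assumption: a plain XML parser is used here, so the markup must be well-formed.
    static final String SNIPPET =
        "<div class=\"slide-title\">\n"
      + "<h2> Lady at her dressing table in a garden</h2>\n"
      + "<p>\nChinese\n<br/>Southern Song dynasty\n<br/>mid-12th century\n<br/>\n"
      + "<a href=\"/collections/search?f[0]=field_artists%253Afield_artist%3A1411\">"
      + "Su Hanchen</a> (Chinese, active 1120s\u20131160s)\n</p>\n"
      + "</div>";

    // Evaluates the expression and returns the trimmed, non-blank values
    // of every node it matches, in document order.
    static List<String> textNodes(String xml, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // Text nodes keep the line breaks from the markup, so trim them.
            String text = nodes.item(i).getNodeValue().trim();
            if (!text.isEmpty()) {
                result.add(text);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // All bare text nodes directly under <p>, one per line.
        for (String s : textNodes(SNIPPET, "//div[@class='slide-title']/p/text()")) {
            System.out.println(s);
        }
    }
}
```

Selecting text() without an index returns every bare text node under <p>, which also picks up the trailing "(Chinese, active 1120s–1160s)" after the link. If only the first three values are wanted, index them individually as text()[1] to text()[3], or take the first three entries of the list.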