Jsoup CSS selector for a text node xpath
The HTML code is posted at the end. I want to select the "OF" element.
Here is the CSS selector I use:
Elements position = doc.select("#content > table:nth-child(4) > tbody > tr > td:nth-child(1) > table > tbody > tr:nth-child(1) > td > div:nth-child(5) > strong:nth-child(4)");
for (Element p : position) {
    System.out.println(p);
}
This is the output:
p returns "<strong>Position:</strong>"
p.text() returns "Position:"
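In jsoup, select() only ever returns Element objects, so any selector, however specific, can land on the strong label but never on the bare text that follows it. One way to see this is to list the container's childNodes(): element children report their tag name, while text children report "#text". The sketch below uses a trimmed-down copy of the posted markup; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;

public class ChildNodeDump {

    // Hypothetical helper: returns nodeName() for every direct child,
    // e.g. "strong" for an element and "#text" for a text node.
    static List<String> childNodeNames(Element parent) {
        List<String> names = new ArrayList<>();
        for (Node child : parent.childNodes()) {
            names.add(child.nodeName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Trimmed-down version of the markup from the question
        Document doc = Jsoup.parse("<div><strong>Position:</strong> OF<br></div>");
        Element div = doc.select("div").first();
        System.out.println(childNodeNames(div)); // prints [strong, #text, br]
    }
}
```

Only the element entries (strong, br) are reachable with a CSS selector; the "#text" entry that holds " OF" is not.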
The XPath Chrome gives me when I highlight "OF":
//*[@id="content"]/table[1]/tbody/tr/td[1]/table/tbody/tr[1]/td/div[3]/text()[4]
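Chrome's text()[4] is an XPath step that picks the fourth text-node child, counting from 1. Jsoup's Element.textNodes() returns the direct text-node children as a 0-based list, so a rough translation of text()[n] is textNodes().get(n - 1). Whitespace-only text nodes (such as the newline right after an opening div tag) are counted too, and the browser's live DOM may not match the markup you actually fetch, so treat the Chrome index as a starting point rather than a guarantee. The helper below is a hypothetical sketch of that mapping.

```java
import java.util.List;
import java.util.Optional;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

public class XPathTextStep {

    // Hypothetical helper: approximates the XPath step text()[position]
    // (1-based) using jsoup's 0-based list of direct text-node children.
    static Optional<TextNode> textStep(Element parent, int position) {
        List<TextNode> texts = parent.textNodes();
        if (position < 1 || position > texts.size()) {
            return Optional.empty(); // no such text node
        }
        return Optional.of(texts.get(position - 1));
    }

    public static void main(String[] args) {
        Element div = Jsoup.parse("<div>a<b>x</b>b<i>y</i>c</div>").select("div").first();
        // Direct text children are "a", "b", "c"; text()[2] is "b"
        textStep(div, 2).ifPresent(t -> System.out.println(t.text())); // prints b
    }
}
```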
HTML code:
<div style="font-size: 10pt; padding-left:5px;">
<strong>Birthdate:</strong> 8/7/1991 (23 y, 6 m, 10 d)
<strong>Bats/Throws:</strong> R/R
<strong>Height/Weight:</strong> 6-2/230
<strong>Position:</strong> OF<br /><b>Drafted:</b> <a href="statss.aspx?playerid=10155&position=OF#draft" style="text-decoration:none;">2009 June Amateur Draft - Round: 1, Pick: 25, Overall: 25, Team: Los Angeles Angels</a><br />
<strong>Contract:</strong> <a href="statss.aspx?playerid=10155&position=OF#contract" style="text-decoration:none;">4.5M / 6 Years (2015 - 2020)</a>
</div>
If anyone is interested, the page is here:
http://www.fangraphs.com/statss.aspx?playerid=10155&position=OF
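You cannot write a CSS selector for a text node: "OF" is not an element but a bare text node inside the target div (Chrome's XPath addresses it as text()[4]).
So you need to fetch it programmatically, like this (requires jsoup >= 1.6.2):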
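import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.Elements;
// select the container div element
Elements position = doc.select("#content > table:nth-child(4) > tbody > tr > td:nth-child(1) > table > tbody > tr:nth-child(1) > td > div:nth-child(5)");
// extract the element from the list returned (null if nothing matched)
Element element = position.first();
// make sure the div exists and has at least five direct text nodes before indexing
if (element != null && element.textNodes().size() > 4) {
    TextNode ofNode = element.textNodes().get(4);
    String value = ofNode.text().trim(); // this will contain "OF"
}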
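Hard-coding the text-node index breaks as soon as the page gains or loses a line. A different approach, not part of the original answer, is to select the label element itself with jsoup's :containsOwn() pseudo-selector and then read the text node that immediately follows it via nextSibling(). This assumes the value always sits directly after a strong "Position:" label, as in the posted markup, and that the label contains no parentheses (which would confuse the selector argument); the class and method names are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class LabelValue {

    // Hypothetical helper: finds the first <strong> whose own text contains
    // the label and returns the trimmed text node right after it, or null.
    static String valueAfterLabel(Document doc, String label) {
        Element strong = doc.select("strong:containsOwn(" + label + ")").first();
        if (strong == null) {
            return null; // label not found
        }
        Node next = strong.nextSibling();
        if (next instanceof TextNode) {
            return ((TextNode) next).text().trim();
        }
        return null; // something other than text follows the label
    }

    public static void main(String[] args) {
        String html = "<div>\n"
                + "<strong>Height/Weight:</strong> 6-2/230\n"
                + "<strong>Position:</strong> OF<br /><b>Drafted:</b> 2009 June Amateur Draft\n"
                + "</div>";
        Document doc = Jsoup.parse(html);
        System.out.println(valueAfterLabel(doc, "Position:")); // prints OF
    }
}
```

The same helper works for the other label/value pairs in that div, for example "Height/Weight:".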