XPath - How to select related cousin data
<html>
<table border="1">
<tbody>
<tr>
<td>
<table border="1">
<tbody>
<tr>
<th>aaa</th>
<th>bbb</th>
<th>ccc</th>
<th>ddd</th>
<th>eee</th>
<th>fff</th>
</tr>
<tr>
<td>111</td>
<td>222</td>
<td>333</td>
<td>444</td>
<td>555</td>
<td>666</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</html>
How can I use XPath to select specific related cousin data? The desired output would be:
<th>aaa</th>
<th>ccc</th>
<th>fff</th>
<td>111</td>
<td>333</td>
<td>666</td>
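The most important aspect of the XPath is that I want to be able to include or exclude certain <th> tags together with their corresponding <td> tags.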
Based on the answers so far, the closest I have come is:
//th[not(contains(text(), "ddd"))] | //tr[2]/td[not(position()=4)]
Is there a way to avoid using position()=4 explicitly and refer to the corresponding th tag instead?
I'm not sure this is the best solution, but you can try the following (index-of requires XPath 2.0 or later):
//th[not(.="bbb") and not(.="ddd") and not(.="eee")] | //tr[2]/td[not(position()=index-of(//th, "bbb")) and not(position()=index-of(//th, "ddd")) and not(position()=index-of(//th, "eee"))]
or a shorter version:
//th[not(.=("bbb", "ddd", "eee"))]| //tr[2]/td[not(position()=(index-of(//th, "bbb"), index-of(//th, "ddd"),index-of(//th, "eee")))]
Output:
<th>aaa</th>
<th>ccc</th>
<th>fff</th>
<td>111</td>
<td>333</td>
<td>666</td>
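To make the role of index-of in these expressions more concrete, here is a minimal Python sketch of its semantics. The helper name `index_of` and the hard-coded lists are illustrative assumptions, not part of any XPath library. It mirrors two facts: index-of returns 1-based positions as a sequence, and `position() = (2, 4, 5)` is true when the position matches any item of that sequence.

```python
def index_of(seq, target):
    """Sketch of XPath 2.0 index-of: all 1-based positions where `target` occurs."""
    return [pos for pos, value in enumerate(seq, start=1) if value == target]


# Header and cell texts copied from the table in the question.
th_texts = ["aaa", "bbb", "ccc", "ddd", "eee", "fff"]
td_texts = ["111", "222", "333", "444", "555", "666"]

# Equivalent of (index-of(//th, "bbb"), index-of(//th, "ddd"), index-of(//th, "eee"))
excluded_positions = [
    pos for name in ("bbb", "ddd", "eee") for pos in index_of(th_texts, name)
]

# Equivalent of //tr[2]/td[not(position() = (...))]
kept_cells = [
    text
    for pos, text in enumerate(td_texts, start=1)
    if pos not in excluded_positions
]

print(excluded_positions)
print(kept_cells)
```

Because the result is a list of positions, a header text that appears more than once would exclude every matching column, which is also how the XPath version behaves.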
You can avoid a complex XPath expression and still get the desired output. Try Python + Selenium:
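from selenium.webdriver.common.by import By

# `driver` is assumed to be an existing WebDriver with the page already loaded.
# find_elements_by_xpath was removed in Selenium 4.3; find_elements(By.XPATH, ...) replaces it.
# Get list of th elements
th_elements = driver.find_elements(By.XPATH, '//th')
# Get list of td elements
td_elements = driver.find_elements(By.XPATH, '//tr[2]/td')
# Get indexes of required th elements - [0, 2, 5]
ok_index = [i for i, th in enumerate(th_elements) if th.text not in ('bbb', 'ddd', 'eee')]
for i in ok_index:
    print(th_elements[i].text)
for i in ok_index:
    print(td_elements[i].text)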
Output:
aaa
ccc
fff
111
333
666
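The same index-based idea can be tried without a browser. The sketch below is an assumption-laden variant, not the answer's method: it uses the standard library's xml.etree.ElementTree instead of Selenium, works on a trimmed copy of the inner table from the question, and relies on that markup being well-formed XML. The function name `select_columns` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the inner table from the question. It is well-formed,
# so an XML parser accepts it; real-world HTML often is not.
MARKUP = """
<table border="1">
  <tbody>
    <tr><th>aaa</th><th>bbb</th><th>ccc</th><th>ddd</th><th>eee</th><th>fff</th></tr>
    <tr><td>111</td><td>222</td><td>333</td><td>444</td><td>555</td><td>666</td></tr>
  </tbody>
</table>
"""


def select_columns(markup, excluded):
    """Return (header texts, cell texts) for columns whose header is not in `excluded`."""
    root = ET.fromstring(markup)
    rows = root.findall(".//tr")      # rows in document order
    headers = rows[0].findall("th")   # first row holds the headers
    cells = rows[1].findall("td")     # second row holds the data
    keep = [i for i, th in enumerate(headers) if th.text not in excluded]
    return [headers[i].text for i in keep], [cells[i].text for i in keep]


headers, cells = select_columns(MARKUP, {"bbb", "ddd", "eee"})
print(headers)
print(cells)
```

Pairing headers and cells by index assumes both rows have the same number of columns and no colspan attributes.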
If you need an XPath 1.0 solution:
//th[not(.="bbb") and not(.="ddd") and not(.="eee")] | //tr[2]/td[not(position()=count(//th[.="bbb"]/preceding-sibling::th)+1) and not(position()=count(//th[.="ddd"]/preceding-sibling::th)+1) and not(position()=count(//th[.="eee"]/preceding-sibling::th)+1)]
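An XPath 1.0 expression of this kind gets long quickly, because XPath 1.0 has no sequence literals and every excluded header needs its own `not(...)` clause. One option is to generate the string. The sketch below is only an illustration: `build_xpath10` is a hypothetical helper, it assumes header texts contain no double quotes, and it assumes the header row and the data row (`//tr[2]`) are laid out as in the question.

```python
def build_xpath10(excluded):
    """Build an XPath 1.0 string that skips the given headers and their columns.

    Uses only not(), position(), count() and the preceding-sibling axis,
    all of which are available in XPath 1.0.
    """
    if not excluded:
        raise ValueError("at least one header name is required")
    if any('"' in name for name in excluded):
        raise ValueError("header names containing double quotes are not supported")

    # One clause per excluded header text.
    th_tests = " and ".join(f'not(.="{name}")' for name in excluded)
    # Column position of a header = number of preceding th siblings + 1.
    td_tests = " and ".join(
        f'not(position()=count(//th[.="{name}"]/preceding-sibling::th)+1)'
        for name in excluded
    )
    return f"//th[{th_tests}] | //tr[2]/td[{td_tests}]"


print(build_xpath10(["bbb", "ddd", "eee"]))
```

The resulting string can be passed to any XPath 1.0 engine. Note that if an excluded name does not occur in the header row, `count(...)+1` evaluates to 1 and the first column is dropped, so the names should be checked against the real headers first.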
With XPath 3.0, you can structure it as:
let $th := //table/tbody/tr[1]/th,
    $filteredTh := $th[not(. = ("bbb", "ddd", "eee"))],
    $pos := $filteredTh!index-of($th, .)
return ($filteredTh, //table/tbody/tr[position() gt 1]/td[position() = $pos])