How to select a specific node with XPath using Python and lxml
Imagine the following HTML:
<div class="group">
<ul class="smallList">
<li><strong>Date</strong>
some Date
</li>
<li>
<strong>Author</strong>
some Name
</li>
<li>
<strong>Keywords</strong>
<a href="linka"
rel="nofollow">keyworda</a>,
<a href="linkb"
rel="nofollow">Keywordb</a>,
</li>
<li>
<strong>Print</strong>
<a class="icon print" rel="nofollow" href="javascript:window.print()">print page</a>
</li>
</ul>
</div>
<div class="group">
<ul class="smallList">
<li><a href="linkc">Linktext</a></li>
</ul>
</div>
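I am looking for keyworda and Keywordb, that is, only the words inside the list element that contains Keywords.
I can get all the nodes with
.//div[@class='group']/ul[@class='smallList']/li/a/node()
But how do I select only that specific one?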
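I assume you want to get the keyword entries using XPath. The contains function can help here. I'll use the parsel library, simply because it is easy to use IMO. The same thing can be done with lxml or other Python libraries.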
from parsel import Selector

data = "[your html above here]"
sel = Selector(text=data)

# The path looks for hyperlinks that satisfy two conditions:
# 1. href contains "link" AND
# 2. rel contains "nofollow".
# After that, it accesses the text of each matching node.
path = ".//a[contains(@href,'link') and contains(@rel,'nofollow')]/text()"

# Extract the text using getall():
print(sel.xpath(path).getall())
['keyworda', 'Keywordb']
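Since the question asks about lxml specifically, here is a minimal sketch of the same idea with lxml. It assumes a cleaned-up copy of the HTML from the question (attribute quotes and closing tags fixed). The names DATA, by_attrs and by_label are only illustrative. The second expression is an alternative that is not part of the answer above: instead of relying on the href and rel attributes, it anchors on the li whose strong label reads Keywords, which may be closer to "only the list element that contains Keywords".

```python
from lxml import html

# Cleaned-up copy of the HTML from the question (an assumption:
# the missing quotes and closing tags have been repaired).
DATA = """
<div class="group">
  <ul class="smallList">
    <li><strong>Date</strong> some Date</li>
    <li><strong>Author</strong> some Name</li>
    <li>
      <strong>Keywords</strong>
      <a href="linka" rel="nofollow">keyworda</a>,
      <a href="linkb" rel="nofollow">Keywordb</a>,
    </li>
    <li>
      <strong>Print</strong>
      <a class="icon print" rel="nofollow" href="javascript:window.print()">print page</a>
    </li>
  </ul>
</div>
<div class="group">
  <ul class="smallList">
    <li><a href="linkc">Linktext</a></li>
  </ul>
</div>
"""

tree = html.fromstring(DATA)

# Same expression as in the parsel example: filter links by attributes.
by_attrs = tree.xpath(
    ".//a[contains(@href,'link') and contains(@rel,'nofollow')]/text()"
)

# Alternative: pick the <li> whose <strong> label is "Keywords",
# then take the text of the links inside it.
by_label = tree.xpath(
    ".//ul[@class='smallList']/li[normalize-space(strong)='Keywords']/a/text()"
)

print(by_attrs)
print(by_label)
```

With lxml, xpath() on a text() expression returns a list of strings directly, so there is no getall() step. The label-based variant keeps working if the links later lose their rel attribute, but it breaks if the label text changes, so which one is more robust depends on the page.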