Select and modify XPath nodes that follow specific text
I use this code to get all the names:
def parse_authors(self, root):
    author_nodes = root.xpath('//a[@class="booklink"][contains(@href,"/author/")]/text()')
    if author_nodes:
        return [unicode(author) for author in author_nodes]
But if there are translators, I want to add "(translation)" next to their names:
Example: translator1(translation)
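You can use the "translation:" text node to tell authors and translators apart: authors are the preceding siblings of the "translation:" text node, and translators are its following siblings.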
Authors:
//text()[contains(., 'translation:')]/preceding-sibling::a[@class='booklink' and contains(@href, '/author/')]/text()
Translators:
//text()[contains(., 'translation:')]/following-sibling::a[@class='booklink' and contains(@href, '/author/')]/text()
Working example code:
from lxml.html import fromstring
data = """
<td>
<a class="booklink" href="/author/43710/Author 1">Author 1</a>
,
<a class="booklink" href="/author/46907/Author 2">Author 2</a>
<br>
translation:
<a class="booklink" href="/author/47669/translator 1">Translator 1</a>
,
<a class="booklink" href="/author/9382/translator 2">Translator 2</a>
</td>"""
root = fromstring(data)
authors = root.xpath("//text()[contains(., 'translation:')]/preceding-sibling::a[@class='booklink' and contains(@href, '/author/')]/text()")
translators = root.xpath("//text()[contains(., 'translation:')]/following-sibling::a[@class='booklink' and contains(@href, '/author/')]/text()")
print(authors)
print(translators)
Prints:
['Author 1', 'Author 2']
['Translator 1', 'Translator 2']
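To get the output the question actually asks for, the two queries can be combined into a single function that appends "(translation)" to each translator name. This is only a minimal sketch, assuming Python 3 and that the page contains at most one "translation:" text node. The constants LINK and MARKER, the sample markup, and the fallback for pages without translators are illustrative assumptions, not part of the original code.

```python
from lxml.html import fromstring

# Hypothetical helper constants; the XPath pieces come from the answer above.
LINK = "a[@class='booklink' and contains(@href, '/author/')]"
MARKER = "//text()[contains(., 'translation:')]"


def parse_authors(root):
    """Return author names followed by translator names tagged '(translation)'."""
    translators = root.xpath(MARKER + "/following-sibling::" + LINK + "/text()")
    if translators:
        # Authors are the links that precede the 'translation:' text node.
        authors = root.xpath(MARKER + "/preceding-sibling::" + LINK + "/text()")
    else:
        # Assumption: with no translators found, every matching link is an author.
        authors = root.xpath("//" + LINK + "/text()")
    tagged = [str(name) + "(translation)" for name in translators]
    return [str(name) for name in authors] + tagged


# Illustrative markup modelled on the sample data in the answer.
sample = """
<div>
<a class="booklink" href="/author/43710/Author 1">Author 1</a>
,
<a class="booklink" href="/author/46907/Author 2">Author 2</a>
<br>
translation:
<a class="booklink" href="/author/47669/translator 1">Translator 1</a>
,
<a class="booklink" href="/author/9382/translator 2">Translator 2</a>
</div>"""

result = parse_authors(fromstring(sample))
print(result)
```

The function always returns a list, so callers do not need to handle None when a page has no matching links. On Python 2, unicode would take the place of str, as in the question's code.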