Python Lxml find text after <strong></strong> tags

Question

Can anyone suggest a way to solve this simple problem?

<strong>text1</strong>: text2

I'm trying to scrape this piece of HTML, so I need to get text1 and text2 separately. How can I do that? I expected it to look something like this:

x = tree.xpath('//strong[text()="text1"]/text()')

But this code returns only "text1", and I need text2 as well.

Answer 1

You need to select the strong element itself and then use element.tail to get the text that follows it. Example:

In [12]: from lxml import html

In [13]: tree = html.fromstring("<strong>text1</strong>: text2 ")

In [14]: x = tree.xpath('//strong[text()="text1"]')

In [15]: for i in x:
   ....:     print(i.tail)
   ....:
: text2
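The tail still carries the leading colon and space. If only text2 is wanted, one option is to strip that separator off. This is a minimal sketch that assumes the label and value are always separated by a colon plus optional whitespace; that is an assumption about the page's markup, not something lxml enforces.

```python
from lxml import html

# Parse the fragment from the question (trailing space included).
tree = html.fromstring("<strong>text1</strong>: text2 ")

results = []
for strong in tree.xpath('//strong[text()="text1"]'):
    # .tail is the text right after </strong>, or None if there is none.
    tail = strong.tail or ""
    # Assumption: label and value are separated by ":" plus whitespace.
    value = tail.strip().lstrip(":").strip()
    results.append((strong.text, value))

print(results)  # [('text1', 'text2')]
```

Using `strong.tail or ""` avoids an AttributeError on elements that have no text after them.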

This also works with lxml.etree, not just lxml.html. Example:

In [16]: from lxml import etree

In [18]: tree = etree.fromstring("<elem><strong>text1</strong>: text2</elem>")

In [19]: x = tree.xpath('//strong[text()="text1"]')

In [20]: for i in x:
   ....:     print(i.tail)
   ....:
: text2
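If you would rather stay inside XPath, as in the original attempt, a following-sibling::text() step selects the text node right after the <strong> element. A sketch using the same etree document as above:

```python
from lxml import etree

tree = etree.fromstring("<elem><strong>text1</strong>: text2</elem>")

# Select the first text node that follows the matching <strong> element.
# When <strong> has tail text, this is the same string lxml exposes as .tail.
# Caveat: if <strong> has no tail, [1] picks a later text node instead.
tails = tree.xpath('//strong[text()="text1"]/following-sibling::text()[1]')

# XPath string results are a str subclass; convert to plain str.
tails = [str(t) for t in tails]
print(tails)  # [': text2']
```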

To get both pieces together, you can do this:

In [21]: x = tree.xpath('//strong[text()="text1"]')

In [23]: for i in x:
   ....:     print('text :',i.text)
   ....:     print('tail :',i.tail)
   ....:
text : text1
tail : : text2
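If a page has several <strong>label</strong>: value pairs, the same text/tail idea can build a dictionary. The markup below is a made-up example, not taken from the question, and the colon-stripping again assumes that separator format.

```python
from lxml import html

# Hypothetical markup with several label/value pairs.
fragment = (
    "<div>"
    "<strong>Name</strong>: Alice<br>"
    "<strong>Age</strong>: 30<br>"
    "<strong>City</strong>: Paris"
    "</div>"
)
tree = html.fromstring(fragment)

pairs = {}
for strong in tree.iter("strong"):
    # .text is the label inside <strong>...</strong>.
    label = (strong.text or "").strip()
    # .tail is the text between </strong> and the next tag (here <br>).
    value = (strong.tail or "").strip().lstrip(":").strip()
    pairs[label] = value

print(pairs)  # {'Name': 'Alice', 'Age': '30', 'City': 'Paris'}
```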


python

lxml