lxml xpath cannot handle <p> tag
How do I get the p tag's text "Blahblah" in this case?
When the p tag's text comes after a strong tag, lxml does not pick it up:
<p class="user_p"><strong>cc</strong>Blahblah</p>
====Code====
from lxml import html
content="""
<div>
<p class="user_p">Blahblah<strong>cc</strong></p>
<p class="user_p"><strong>cc</strong>Blahblah</p>
</div>
"""
tree = html.fromstring(content)
p = tree.xpath('//div/p')
print(p[0].text)
print(p[1].text)
====Output====
Blahblah
None
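The None comes from how lxml stores mixed content: an element's .text is only the text before its first child, and any text after a child's end tag is stored on that child's .tail. A minimal sketch that prints both for each paragraph (it assumes the same markup as above and reuses only standard lxml.html element properties):

```python
from lxml import html

content = """
<div>
<p class="user_p">Blahblah<strong>cc</strong></p>
<p class="user_p"><strong>cc</strong>Blahblah</p>
</div>
"""

tree = html.fromstring(content)

rows = []
for p in tree.xpath('//div/p'):
    strong = p.find('strong')
    # p.text: text before the first child
    # strong.text: text inside <strong>
    # strong.tail: text after </strong>, still inside <p>
    rows.append((p.text, strong.text, strong.tail))

for row in rows:
    print(row)
```

For the first paragraph "Blahblah" lands in p.text; for the second it lands in strong.tail, which is why p[1].text is None.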
In this HTML fragment,
<p class="user_p"><strong>cc</strong>Blahblah</p>
the text "Blahblah" is the value of the <strong> element's tail property.
Demo code:
from lxml import html
content = """
<div>
<p class="user_p"><strong>cc</strong>Blahblah</p>
</div>"""
tree = html.fromstring(content)
s = tree.xpath('//div/p/strong')
# the text after </strong> is stored on the strong element's tail
print(s[0].tail)
Output:
Blahblah
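If you don't know in advance which child the text follows, an alternative (not from the original answer) is to select the paragraph's own text nodes with the XPath text() step, which returns both .text and every child's .tail. A sketch assuming the same two-paragraph markup; text_content() is shown for contrast because it also pulls in descendant text such as "cc":

```python
from lxml import html

content = """
<div>
<p class="user_p">Blahblah<strong>cc</strong></p>
<p class="user_p"><strong>cc</strong>Blahblah</p>
</div>
"""

tree = html.fromstring(content)
paragraphs = tree.xpath('//div/p')

# text() selects only the p element's direct text nodes,
# whether they sit before or after the <strong> child
own_text = [''.join(p.xpath('text()')) for p in paragraphs]

# text_content() concatenates all descendant text, so "cc" is included
all_text = [p.text_content() for p in paragraphs]

print(own_text)
print(all_text)
```

Here own_text is ['Blahblah', 'Blahblah'] for both layouts, while all_text is ['Blahblahcc', 'ccBlahblah'].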