lxml xpath cannot handle <p> tag

How can I get the p tag's text "Blahblah" in this case:

When the p tag's text comes after the strong tag, lxml does not pick it up.

<p class="user_p"><strong>cc</strong>Blahblah</p>

====Code====

from lxml import html

content = """
    <div>
    <p class="user_p">Blahblah<strong>cc</strong></p>
    <p class="user_p"><strong>cc</strong>Blahblah</p>
    </div>
"""
# In Python 3 the literal is already a str, so no .decode('utf-8') is needed
tree = html.fromstring(content)

p = tree.xpath('//div/p')

# .text only holds the text before the first child element
print(p[0].text)
print(p[1].text)

====Output====

Blahblah
None

In this HTML fragment,

<p class="user_p"><strong>cc</strong>Blahblah</p>

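the text "Blahblah" is the value of the tail attribute of the <strong> element. In lxml, an element's text attribute holds only the text before its first child; text that follows a child's closing tag is stored on that child as tail.

Demo code: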

from lxml import html

content = """
    <div>
     <p class="user_p"><strong>cc</strong>Blahblah</p> 
    </div>"""

tree = html.fromstring(content)
s = tree.xpath('//div/p/strong')
# The text after </strong> is stored as the tail of the <strong> element
print(s[0].tail)

Output:

Blahblah
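
If you want the p tag's own text no matter where it sits relative to <strong>, one option is to select the text-node children of <p> with the XPath text() step, which returns both the leading text and any tails of child elements. A minimal sketch, assuming you want only the text directly inside <p> and not the text inside <strong>; own_text is a hypothetical helper name, not part of lxml:

```python
from lxml import html

content = """
    <div>
    <p class="user_p">Blahblah<strong>cc</strong></p>
    <p class="user_p"><strong>cc</strong>Blahblah</p>
    </div>
"""

tree = html.fromstring(content)


def own_text(p):
    # Hypothetical helper: join the text nodes that are direct children of <p>.
    # This covers p.text as well as the tail of each child element.
    return "".join(p.xpath("./text()")).strip()


for p in tree.xpath("//div/p"):
    print(own_text(p))
```

Both lines print Blahblah. If you also want the text inside the child elements, p.text_content() returns everything in document order, for example "ccBlahblah" for the second <p>.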