Extract text only from the parent tag with Requests-HTML
I want to extract only the text that belongs directly to the parent tag, using Requests-HTML.
Suppose we have HTML like this:
<td>
<a href="">There</a> <a href="">are</a> <a href="">some</a> <a href="">links.</a> The text that we are looking for.
</td>
Then
html.find('td', first=True).text
returns the text of the links as well:
>>> There are some links. The text that we are looking for.
You can use an XPath expression, which the library supports directly:
from requests_html import HTML
doc = """<td>
<a href="">There</a> <a href="">are</a> <a href="">some</a> <a href="">links.</a> The text that we are looking for.
</td>"""
html = HTML(html=doc)
# the list also contains the whitespace-only text nodes between the <a> tags
text_list = html.xpath('//td/text()')
# join the pieces and strip the surrounding whitespace
print(''.join(text_list).strip())  # The text that we are looking for.
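The expression //td/text() selects only the text nodes that are direct children of each td element. By contrast, //td//text() selects every text node under td, including the text inside the <a> tags.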
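To make the difference between direct and descendant text concrete, here is a minimal sketch that does not use Requests-HTML at all. It swaps in the standard library's xml.etree.ElementTree, which has no text() XPath step, so the direct text is assembled by hand from the element's text and the tail of each child. It assumes the fragment is well-formed XML (with a closing </td>), and the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

doc = """<td>
<a href="">There</a> <a href="">are</a> <a href="">some</a> <a href="">links.</a> The text that we are looking for.
</td>"""

# Assumption: the fragment is well-formed XML, so the stdlib parser accepts it.
td = ET.fromstring(doc)

# Direct text of <td>: the text before the first child, plus the "tail"
# that follows each child element. This corresponds to //td/text().
direct_parts = [td.text or ""] + [child.tail or "" for child in td]
direct_text = "".join(direct_parts).strip()

# All text under <td>, including the text inside the <a> tags.
# This corresponds to //td//text().
all_text = "".join(td.itertext()).strip()

print(direct_text)  # The text that we are looking for.
print(all_text)     # There are some links. The text that we are looking for.
```

The whitespace between the links shows up as separate pieces in both cases, which is why the pieces are joined before stripping.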