How to select spans under one div but not another via XPath?
Suppose I have this page:
<div class="top">
<span class="strings">asdf</span>
<span class="strings">qwer</span>
<span class="strings">zxcv</span>
</div>
<div id="content">
some other text
<span class="strings">1234</span>
<span class="strings">5678</span>
<span class="strings">1234</span>
</div>
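How can I get the script to grab only the span elements with class="strings" inside the div with id="content", and not the ones inside the div with class="top"? The result should be "1234", "5678", "1234".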
Here is my code so far:
from lxml import html
import requests
url = 'http://www.amazon.com/dp/B00SGGQRNO'
response = requests.get(url)
tree = html.fromstring(response.content)
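# Matches every span with class="strings" anywhere on the page,
# including the ones inside div class="top"
bullets = tree.xpath('//span[@class="strings"]/text()')
print('Bullets:', bullets)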
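To reproduce the problem without fetching the Amazon page, here is a minimal sketch that parses the sample HTML from the question as a string (SAMPLE_HTML is a name introduced here for illustration). Because the expression starts with //span, it searches the whole document and returns all six strings, the three from div class="top" as well as the three from div id="content":

```python
from lxml import html

# Hypothetical offline stand-in for the fetched page: the sample HTML from the question.
SAMPLE_HTML = """
<div class="top">
<span class="strings">asdf</span>
<span class="strings">qwer</span>
<span class="strings">zxcv</span>
</div>
<div id="content">
some other text
<span class="strings">1234</span>
<span class="strings">5678</span>
<span class="strings">1234</span>
</div>
"""

tree = html.fromstring(SAMPLE_HTML)

# "//span[...]" is not anchored to any particular div, so spans from both divs match.
bullets = tree.xpath('//span[@class="strings"]/text()')
print('Bullets:', bullets)
```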
To select the text of only those span elements (with @class="strings") that are children of the div element with @id="content", use this XPath expression:
//div[@id="content"]/span[@class="strings"]/text()
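A minimal offline check of this expression, assuming the sample HTML from the question is parsed from a string (SAMPLE_HTML and NESTED_HTML are illustrative names, not part of the real page). The / before span is the child axis, so only spans sitting directly inside div id="content" match. The second half is a hypothetical variation: if a span were wrapped in another element such as a p, the child axis would miss it and the descendant axis // would be needed:

```python
from lxml import html

# The sample page from the question, parsed from a string instead of fetched.
SAMPLE_HTML = """
<div class="top">
<span class="strings">asdf</span>
<span class="strings">qwer</span>
<span class="strings">zxcv</span>
</div>
<div id="content">
some other text
<span class="strings">1234</span>
<span class="strings">5678</span>
<span class="strings">1234</span>
</div>
"""

tree = html.fromstring(SAMPLE_HTML)

# div[@id="content"] anchors the search; "/span" then takes only its direct span children.
bullets = tree.xpath('//div[@id="content"]/span[@class="strings"]/text()')
print('Bullets:', bullets)

# Hypothetical variation: one span is nested inside a <p>, so it is not a direct child.
NESTED_HTML = """
<div id="content">
<p><span class="strings">9999</span></p>
<span class="strings">1234</span>
</div>
"""
nested = html.fromstring(NESTED_HTML)

# Child axis: only the span directly under the div.
children_only = nested.xpath('//div[@id="content"]/span[@class="strings"]/text()')
# Descendant axis: spans at any depth under the div, in document order.
all_descendants = nested.xpath('//div[@id="content"]//span[@class="strings"]/text()')
print(children_only, all_descendants)
```

On the live page, the expression would simply replace the one passed to tree.xpath(...) in the question's script; whether it matches there depends on the page's actual markup, which may not look like the simplified sample.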