How to get the data for each ad in this page?

Question

I am scraping this page to get the data for each ad: http://www.cars2buy.co.uk/business-car-leasing/Abarth/695C/?

Here is my code in the scrapy shell:

scrapy shell "http://www.cars2buy.co.uk/business-car-leasing/Abarth/695C/"
for content in response.xpath('//*[@class="pitem"]/div[1]/div[2]/div[1]'):
    print(content.xpath('//*[@class="detail"]/p/text()[2]').extract())

But it only extracts 48 on every iteration!! The output should be:

48 months

48 months

48 months

36 months

48 months

48 months

48 months

48 months

48 months

36 months

to match the ads on the page! Any suggestions?

Answer 1

Easy fix. Try adding a . at the start of the second xpath:

print(content.xpath('.//*[@class="detail"]/p/text()[2]').extract())

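Explanation:

An xpath that starts with / (including //) means 'start searching at the document root', while one that starts with . means 'start searching at the current position', much like absolute and relative paths in a file system.

So without the ., your xpath expression extracts every matching element anywhere on the page, and it does so on every iteration.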

Update/Addition

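This happens even when the xpath expression is applied to a sub-element (a 'selector' in scrapy terms), such as content in this example.

Scrapy keeps the whole HTML document internally, so an xpath that starts with / begins at the document root even when it is called on a nested selector. This is explained in detail here: https://doc.scrapy.org/en/latest/topics/selectors.html#working-with-relative-xpaths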

python

xpath

scrapy

scrapy-spider

scrapy-shell