Scrapy splash not returning results

Question

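I am learning scrapy (with splash) and building a spider to scrape results from JavaScript-enabled pages. My spider works and returns results for JS pages. However, it does not return the price from this link: https://www.zara.com/us/en/bejewelled-appliqu%C3%A9-dress-p07854034.html?v1=4818592&v2=733885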

xpath used: //*[contains(concat( " ", @class, " " ), concat( " ", "_product-price", " " ))]//span/text()

The XPath above returns results in the browser, but returns nothing when called through scrapy. This is my spider call:

yield scrapy.Request(url, callback=self.parse_page, dont_filter=True, meta={'splash': {'args': {'wait': 5,},'endpoint': 'render.html',}})

Can you help figure out why the price on the site is not being returned?

Thanks!

Answer 1

Use this for your XPath: //*[contains(concat( " ", @class, " " ), concat( " ", "_product-price", " " ))]//span/text() or, more simply, //*[contains(concat( " ", @class, " " ), " _product-price ")]//span/text()

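An XPath @class= predicate compares the whole attribute value, so it does not match an element that has multiple classes (classes are separated by spaces), as you have there. To get the element, you should use contains().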
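To illustrate the difference between an exact @class= comparison and the padded contains() idiom, here is a minimal sketch using only the Python standard library. The markup and the helper has_class are hypothetical, made up for illustration. ElementTree's XPath subset has no contains(), so the token test is written in plain Python to mirror what the XPath expression does.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the price container carries two classes, and a
# sibling has a class name that merely starts with the same text.
HTML = """
<div>
  <div class="price _product-price"><span>49.90 USD</span></div>
  <div class="_product-price-old"><span>99.90 USD</span></div>
</div>
"""

root = ET.fromstring(HTML)

# Exact comparison: the whole attribute value must equal the string,
# so an element with class="price _product-price" is not matched.
exact = root.findall(".//*[@class='_product-price']")


def has_class(element, name):
    """Mirror contains(concat(' ', @class, ' '), ' name ') in Python."""
    padded = " " + element.get("class", "") + " "
    return (" " + name + " ") in padded


# Token comparison: padding with spaces matches a whole class name only.
token = [el for el in root.iter() if has_class(el, "_product-price")]
prices = [span.text for el in token for span in el.iter("span")]

print(len(exact), len(token), prices)
```

The space padding is what keeps a class such as _product-price-old from matching, which a bare contains(@class, "_product-price") would not do.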

Answer 2

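The problem is that the price is simply not present in the HTML output rendered by Splash (the best way to check is to open the Splash console on port 8050 in a web browser, enter your URL, and look at the output it renders). Start from the Splash FAQ entry for when a page is not rendered correctly. You will find that in your case the solution is to disable Private mode, either through the --disable-private-mode startup option for Docker or by setting splash.private_mode_enabled = false in a Lua script. After disabling private mode, the page renders correctly.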
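As a rough sketch of the Lua-script route, the request meta from the question could be switched from the render.html endpoint to the execute endpoint, with a script that turns private mode off before loading the page. The names LUA_SOURCE and splash_meta are hypothetical helpers introduced here. The sketch assumes scrapy-splash passes the request URL to the script as args.url, and it has not been verified against the Zara page.

```python
# Lua script for Splash's "execute" endpoint: disable private mode,
# load the page, wait, and return the rendered HTML.
LUA_SOURCE = """
function main(splash, args)
  splash.private_mode_enabled = false
  assert(splash:go(args.url))
  assert(splash:wait(args.wait))
  return splash:html()
end
"""


def splash_meta(wait=5):
    """Build the meta dict for a Splash request (hypothetical helper)."""
    return {
        "splash": {
            "endpoint": "execute",
            "args": {"lua_source": LUA_SOURCE, "wait": wait},
        }
    }


# Inside the spider, this would replace the meta argument from the question:
# yield scrapy.Request(url, callback=self.parse_page, dont_filter=True,
#                      meta=splash_meta())
```

If Splash is started with --disable-private-mode instead, the original render.html request should not need any change.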

javascript

python

scrapy

scrapy-splash