Web scraping: Automating button click

Question

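I am trying to scrape data from a website using Scrapy, a Python framework. I can get the data from the website using spiders, but the problem occurs when I try to navigate through the website.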

According to this post, Scrapy does not handle JavaScript well.

Also, as stated in the accepted answer there, I cannot use mechanize or lxml. It suggests using a combination of Selenium and Scrapy.

Function of the button:

I am browsing through offers on a website. The function of the button is to show more offers, so clicking it calls a JavaScript function which loads the results.

I am also looking at CasperJS and PhantomJS. Will they work?

I just need to automate the clicking of a button. How do I go about this?

Answer 1

First of all, yes, you can use PhantomJS ghostdriver with Python. It is built into python-selenium:

pip install selenium

Demo:

>>> from selenium import webdriver
>>> driver = webdriver.PhantomJS()
>>> driver.get('
>>> driver.title
u'javascript - Web scraping: Automating button click - Stack Overflow'
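
Since the goal is to click a button, the demo can be extended along the following lines. This is only a minimal sketch under stated assumptions: `click_until_gone` and `scrape_offers` are hypothetical helper names, the URL and CSS selectors are placeholders the caller has to supply, and headless Firefox is swapped in for PhantomJS because recent Selenium releases no longer include the PhantomJS driver. The fixed `time.sleep` pause is a crude stand-in for a proper explicit wait.

```python
import time


def click_until_gone(driver, css_selector, max_clicks=20, pause=1.0):
    """Click the element matching css_selector until it is no longer shown.

    Returns the number of clicks performed. max_clicks guards against a
    button that never disappears.
    """
    clicks = 0
    while clicks < max_clicks:
        # "css selector" is the string value of selenium's By.CSS_SELECTOR
        buttons = driver.find_elements("css selector", css_selector)
        visible = [b for b in buttons if b.is_displayed()]
        if not visible:
            break
        visible[0].click()
        clicks += 1
        time.sleep(pause)  # give the page's JavaScript time to append results
    return clicks


def scrape_offers(url, button_selector, offer_selector):
    """Open url in headless Firefox, expand all offers, return their text.

    All three arguments are placeholders chosen by the caller.
    """
    # Imported lazily; requires the selenium package and a Firefox driver.
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        click_until_gone(driver, button_selector)
        offers = driver.find_elements("css selector", offer_selector)
        return [offer.text for offer in offers]
    finally:
        driver.quit()
```

Because `click_until_gone` only relies on `find_elements`, `is_displayed` and `click`, it should work with any Selenium driver, not just Firefox.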

There are also several other threads that provide examples of "scrapy+selenium" spiders:

selenium with scrapy for dynamic page
Scraping with Scrapy and Selenium
seleniumcrawler

There is also a scrapy-webdriver module which might help with this too.

Using scrapy with selenium adds a huge overhead and slows things down considerably, even with the headless PhantomJS browser.

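There is a good chance you can mimic the click on the "show more offers" button by simulating the underlying request that fetches the data you need. Use the browser developer tools to explore what kind of request is fired, and use scrapy.http.Request to simulate it inside the spider.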
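
A rough sketch of that request-simulation approach might look as follows. Everything site-specific here is an assumption made for illustration: the endpoint `https://example.com/offers/more`, the `page` and `size` query parameters, and the JSON shape `{"offers": [...]}` are hypothetical and must be replaced with whatever the network tab of the developer tools actually shows. The spider class is built inside `make_spider` only so that the URL and parsing helpers can be tried without Scrapy installed; `scrapy.Request` is the same class as `scrapy.http.Request`.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint and page size: replace with the request observed
# in the browser when "show more offers" is clicked.
OFFERS_ENDPOINT = "https://example.com/offers/more"
PAGE_SIZE = 20


def offers_url(page):
    """Build the URL the button's JavaScript is assumed to request."""
    return OFFERS_ENDPOINT + "?" + urlencode({"page": page, "size": PAGE_SIZE})


def extract_offers(body):
    """Return the offers from an assumed JSON body like {"offers": [...]}."""
    return json.loads(body).get("offers", [])


def make_spider():
    """Define the spider lazily so the helpers above work without Scrapy."""
    import scrapy

    class OffersSpider(scrapy.Spider):
        name = "offers"
        start_urls = [offers_url(1)]

        def parse(self, response, page=1):
            offers = extract_offers(response.text)
            for offer in offers:
                yield offer  # each offer is assumed to be a JSON object (dict)
            if offers:
                # An empty page is taken to mean there are no more offers.
                yield scrapy.Request(
                    offers_url(page + 1),
                    callback=self.parse,
                    cb_kwargs={"page": page + 1},
                )

    return OffersSpider
```

To run it, the class returned by `make_spider()` can be passed to the `crawl` method of `scrapy.crawler.CrawlerProcess`, followed by `start()`. If the real request turns out to be a POST or needs special headers, those would go into the `scrapy.Request` arguments instead.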

Tags: javascript, python, selenium, scrapy, web-scraping