How to deploy Selenium-driven spiders on the cloud

Question

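I use scrapyd to deploy and schedule my spiders on my local machine. The challenge I am facing now is deploying a spider that runs with a headless browser.

I get two errors in my scrapyd log files, both related to the webdriver not being found in the project directory: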

FileNotFoundError: [Errno 2] No such file or directory: './chromedriver'

selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH.

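Does the headless browser run on the cloud?
Is chromedriver removed when I deploy my project?
Is there a way to view my project files in scrapyd to confirm whether the file still exists in the project directory?

Below is a copy of my code.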

# I'm using SeleniumRequest for my requests, so this is the configuration in my settings file


chrome_path='./chromedriver'
SELENIUM_DRIVER_NAME = 'chrome' # Change to your browser name
SELENIUM_DRIVER_EXECUTABLE_PATH = chrome_path
SELENIUM_DRIVER_ARGUMENTS=['--headless']  # '--headless' if using chrome instead of firefox

FEED_EXPORT_ENCODING='utf-8'

Here is my spider code

import scrapy
from scrapy_selenium import SeleniumRequest
from scrapy.selector import Selector
import time


class CovidngSpider(scrapy.Spider):
    name = 'covidng'
    #allowed_domains = ['covid19.ncdc.gov.ng']
    #start_urls = ['https://covid19.ncdc.gov.ng/']

    def start_requests(self):
        yield SeleniumRequest(url='https://covid19.ncdc.gov.ng/', wait_time=3, screenshot=True, callback=self.parse)

    def parse(self, response):
        # scrapy-selenium passes the live webdriver through response.meta
        driver = response.meta['driver']
        page_html = driver.page_source
        new_resp = Selector(text=page_html)

        # One <tr> per state in the summary table
        databox = new_resp.xpath("//table[@id='custom3']/tbody/tr")

        for row in databox:
            state = row.xpath(".//td[1]/p/text()").get()
            total_cases = row.xpath(".//td[2]/p/text()").get()
            active_cases = row.xpath(".//td[3]/p/text()").get()
            discharged = row.xpath(".//td[4]/p/text()").get()
            death = row.xpath(".//td[5]/p/text()").get()

            yield {
                'State': state,
                'Total Cases': total_cases,
                'Active Cases': active_cases,
                'Discharged': discharged,
                'Death': death
            }

Answer 1

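First: check that chromedriver is actually installed. It is not part of Selenium, so it always has to be installed separately. (The same applies to geckodriver if you use Firefox.)

Second: use /full/path/to/chromedriver. The system may run your code from a different folder than you expect, in which case the relative path ./chromedriver points to a different location than the one you intended.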
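
One possible way to apply both points in the settings file is sketched below. It is only an illustration, not the one required approach: the helper name resolve_chromedriver and the environment variable CHROMEDRIVER_PATH are made up for this example and are not defined by scrapyd or scrapy-selenium. The idea is to take an explicit path if one is configured, otherwise look the driver up on PATH, and always hand Selenium an absolute path.

```python
import os
import shutil


def resolve_chromedriver(env_var="CHROMEDRIVER_PATH", name="chromedriver"):
    """Return an absolute path to the driver binary, or raise if none is found.

    CHROMEDRIVER_PATH is a hypothetical variable chosen for this sketch;
    neither scrapyd nor scrapy-selenium defines it.
    """
    # Prefer an explicit override, then fall back to a PATH lookup.
    candidate = os.environ.get(env_var) or shutil.which(name)
    if not candidate or not os.path.isfile(candidate):
        # Report the working directory: it shows where a relative path
        # such as "./chromedriver" would have been resolved.
        raise FileNotFoundError(
            f"{name} not found (cwd={os.getcwd()!r}, candidate={candidate!r})"
        )
    return os.path.abspath(candidate)
```

In settings.py the existing setting could then be assigned as SELENIUM_DRIVER_EXECUTABLE_PATH = resolve_chromedriver(). If the driver is missing, the error message includes the working directory of the process, which helps show where the relative path was being looked up.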
