How to set up a Panther crawler with Docker

Dockerfile:

# https://github.com/symfony/panther#docker-integration
FROM php:latest

RUN apt-get update && apt-get install -y libzip-dev zlib1g-dev chromium && docker-php-ext-install zip
ENV PANTHER_NO_SANDBOX=1

RUN curl -sS https://getcomposer.org/installer | php -- --install-dir=/usr/local/bin --filename=composer

docker-compose.yml:

version: "3"
services:
  crawler:
    build: .
    working_dir: /usr/src
    volumes:
      - .:/usr/src
    command: /bin/sh -c "/usr/local/bin/composer install && php index.php"

composer.json:

{
    "require": {
        "symfony/panther": "^0.6.0"
    }
}

index.php:

<?php

// https://github.com/symfony/panther#basic-usage
require __DIR__.'/vendor/autoload.php'; // Composer's autoloader

$client = \Symfony\Component\Panther\Client::createChromeClient();
$client->request('GET', 'https://api-platform.com'); // Yes, this website is 100% written in JavaScript
$client->clickLink('Support');

// Wait for an element to be rendered
$crawler = $client->waitFor('.support');

echo $crawler->filter('.support')->text();
$client->takeScreenshot('screen.png'); // Yeah, screenshot!

All files are in the same directory. I run docker-compose build && docker-compose up and get the following error:

crawler_1 | Fatal error: Uncaught RuntimeException: Could not start chrome (or it crashed) after 30 seconds. in /usr/src/vendor/symfony/panther/src/ProcessManager/WebServerReadinessProbeTrait.php:51

This is similar to https://github.com/symfony/panther/issues/200, but in my case I'm not using Panther for testing, only for crawling, and I really don't know how to fix it. I suspect the problem is in my Dockerfile or docker-compose.yml.

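I had the same error. My solution was to install unzip (in this setup, by adding it to the apt-get install line in the Dockerfile), as the README says: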

"Warning: On *nix systems, the unzip command must be installed or you will encounter an error similar to RuntimeException: sh: 1: exec: /app/vendor/symfony/panther/src/ProcessManager/../../chromedriver-bin/chromedriver_linux64: Permission denied (or chromedriver_linux64: not found). The underlying reason is that PHP's ZipArchive doesn't preserve UNIX executable permissions."
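To confirm that this is what happened in your container, you can check whether the bundled driver kept its executable bit. The following is a minimal shell sketch. It assumes the driver lives at vendor/symfony/panther/chromedriver-bin/chromedriver_linux64, which is the path from the quoted error message with src/ProcessManager/../../ resolved. check_chromedriver is a hypothetical helper, not part of Panther or Composer.

```shell
#!/bin/sh
# Minimal sketch: report whether Panther's bundled chromedriver kept its
# executable bit after "composer install".
# Assumption: the driver path below is taken from the README warning
# (src/ProcessManager/../../chromedriver-bin/chromedriver_linux64 resolved).
# check_chromedriver is a hypothetical helper name.
#
# Return status: 0 = executable, 1 = present but not executable, 2 = missing.
check_chromedriver() {
    vendor_dir="${1:-vendor}"
    driver="$vendor_dir/symfony/panther/chromedriver-bin/chromedriver_linux64"

    if [ ! -e "$driver" ]; then
        echo "missing: $driver" >&2
        return 2
    fi

    if [ -x "$driver" ]; then
        echo "ok: $driver is executable"
        return 0
    fi

    # The state the README warns about: PHP's ZipArchive dropped the
    # UNIX permission bits while unpacking the package.
    echo "not executable: $driver (reinstall symfony/panther with unzip available)" >&2
    return 1
}
```

Source the file inside the container (for example from a shell opened with docker-compose run crawler sh, where the working directory is /usr/src) and call check_chromedriver vendor. A status of 1 points to the missing-unzip problem described in the README.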

Finally, reinstall the Panther library so that it is unpacked again, this time with unzip.
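Rebuilding the image alone is not enough here. vendor/ sits in the .:/usr/src bind mount, so the copy unpacked without executable bits survives docker-compose build. The sketch below assumes that unzip has already been added to the Dockerfile's apt-get install line and the image rebuilt, and that it runs inside the crawler container. reinstall_panther is a hypothetical helper. It relies on Composer re-installing a package whose directory has gone missing. Deleting the whole vendor/ directory before composer install is the blunter alternative.

```shell
#!/bin/sh
# Minimal sketch of the reinstall step (reinstall_panther is hypothetical).
# Assumptions: run inside the crawler container, project root given as $1
# (defaults to the current directory, /usr/src in docker-compose.yml), and
# composer on PATH as installed by the Dockerfile.
reinstall_panther() {
    project_dir="${1:-.}"

    # On Linux, Composer unpacks zips with the unzip command when it exists
    # and otherwise falls back to PHP's ZipArchive, which loses the
    # executable bits. Refuse to continue without unzip.
    if ! command -v unzip >/dev/null 2>&1; then
        echo "unzip not found; add it to the Dockerfile and rebuild" >&2
        return 1
    fi

    # Remove the badly unpacked copy so Composer installs it again.
    rm -rf "$project_dir/vendor/symfony/panther" || return 1
    (cd "$project_dir" && composer install)
}
```

For example, docker-compose run crawler sh, then source the file and call reinstall_panther /usr/src. Composer also prints a warning when it falls back to the PHP zip extension, which is another sign that unzip was missing.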