How to set up the Panther crawler with Docker
Dockerfile:
# https://github.com/symfony/panther#docker-integration
FROM php:latest
RUN apt-get update && apt-get install -y libzip-dev zlib1g-dev chromium && docker-php-ext-install zip
ENV PANTHER_NO_SANDBOX 1
RUN curl -sS https://getcomposer.org/installer | php -- --install-dir=/usr/local/bin --filename=composer
docker-compose.yml:
version: "3"
services:
  crawler:
    build: .
    working_dir: /usr/src
    volumes:
      - .:/usr/src
    command: /bin/sh -c "/usr/local/bin/composer install && php index.php"
composer.json:
{
    "require": {
        "symfony/panther": "^0.6.0"
    }
}
index.php:
<?php
// https://github.com/symfony/panther#basic-usage
require __DIR__.'/vendor/autoload.php'; // Composer's autoloader
$client = \Symfony\Component\Panther\Client::createChromeClient();
$client->request('GET', 'https://api-platform.com'); // Yes, this website is 100% written in JavaScript
$client->clickLink('Support');
// Wait for an element to be rendered
$crawler = $client->waitFor('.support');
echo $crawler->filter('.support')->text();
$client->takeScreenshot('screen.png'); // Yeah, screenshot!
All files are in the same directory. I run docker-compose build && docker-compose up and get the following error:
crawler_1 | Fatal error: Uncaught RuntimeException: Could not start chrome (or it crashed) after 30 seconds. in /usr/src/vendor/symfony/panther/src/ProcessManager/WebServerReadinessProbeTrait.php:51
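This is similar to https://github.com/symfony/panther/issues/200, but in my case I am not using Panther for testing, only for scraping, and I really don't know how to solve it. I suspect the problem is an invalid Dockerfile or docker-compose file.
I had the same error. My solution was to install unzip, as the README says: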
"Warning: On *nix systems, the unzip command must be installed or you
will encounter an error similar to RuntimeException: sh: 1: exec:
/app/vendor/symfony/panther/src/ProcessManager/../../chromedriver-bin/chromedriver_linux64:
Permission denied (or chromedriver_linux64: not found). The underlying
reason is that PHP's ZipArchive doesn't preserve UNIX executable
permissions."
Finally, reinstall the Panther library.
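Applied to the Dockerfile from the question, the fix could look like the sketch below. It assumes the only change needed is adding the unzip package to the existing apt-get install line; everything else is copied from the question as-is.

```dockerfile
# https://github.com/symfony/panther#docker-integration
FROM php:latest
# Assumption: adding "unzip" is the only change. With the unzip command
# available, Composer can extract packages with it instead of PHP's
# ZipArchive, so the bundled chromedriver keeps its executable permission.
RUN apt-get update && apt-get install -y libzip-dev zlib1g-dev chromium unzip && docker-php-ext-install zip
ENV PANTHER_NO_SANDBOX 1
RUN curl -sS https://getcomposer.org/installer | php -- --install-dir=/usr/local/bin --filename=composer
```

Because docker-compose.yml mounts the project directory into the container, a vendor directory left over from an earlier run will most likely survive the rebuild. Deleting vendor/symfony/panther (or the whole vendor directory) before running docker-compose build && docker-compose up again should let composer install extract the package afresh.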