Pyppeteer: Browser closed unexpectedly in AWS Lambda
I ran into this error in AWS Lambda. It looks like the devtools websocket never starts. I'm not sure how to fix it. Any ideas? Thanks for your time.
The exception originates in get_ws_endpoint()
because the websocket response times out: https://github.com/pyppeteer/pyppeteer/blob/ad3a0a7da221a04425cbf0cc92e50e93883b077b/pyppeteer/launcher.py#L225
Lambda code:
import os
import json
import asyncio
import logging
import boto3
import pyppeteer
from pyppeteer import launch

logger = logging.getLogger()
logger.setLevel(logging.INFO)
pyppeteer.DEBUG = True  # print suppressed errors as error log


def lambda_handler(event, context):
    # Return main()'s result so the response dict reaches the caller
    return asyncio.get_event_loop().run_until_complete(main())


async def main():
    browser = await launch({
        'headless': True,
        'args': [
            '--no-sandbox'
        ]
    })
    page = await browser.newPage()
    await page.goto('http://example.com')
    await page.screenshot({'path': '/tmp/example.png'})
    await browser.close()
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
Exception:
Response:
{
"errorMessage": "Browser closed unexpectedly:\n",
"errorType": "BrowserError",
"stackTrace": [
" File \"/var/task/lambda_handler.py\", line 23, in lambda_handler\n asyncio.get_event_loop().run_until_complete(main())\n",
" File \"/var/lang/lib/python3.8/asyncio/base_events.py\", line 616, in run_until_complete\n return future.result()\n",
" File \"/var/task/lambda_handler.py\", line 72, in main\n browser = await launch({\n",
" File \"/opt/python/pyppeteer/launcher.py\", line 307, in launch\n return await Launcher(options, **kwargs).launch()\n",
" File \"/opt/python/pyppeteer/launcher.py\", line 168, in launch\n self.browserWSEndpoint = get_ws_endpoint(self.url)\n",
" File \"/opt/python/pyppeteer/launcher.py\", line 227, in get_ws_endpoint\n raise BrowserError('Browser closed unexpectedly:\n')\n"
]
}
Request ID:
"06be0620-8b5c-4600-a76e-bc785210244e"
Function Logs:
START RequestId: 06be0620-8b5c-4600-a76e-bc785210244e Version: $LATEST
---- files in /tmp ----
[W:pyppeteer.chromium_downloader] start chromium download.
Download may take a few minutes.
0%| | 0/108773488 [00:00<?, ?it/s]
11%|█▏ | 12267520/108773488 [00:00<00:00, 122665958.31it/s]
27%|██▋ | 29470720/108773488 [00:00<00:00, 134220418.14it/s]
42%|████▏ | 46172160/108773488 [00:00<00:00, 142570388.86it/s]
58%|█████▊ | 62607360/108773488 [00:00<00:00, 148471487.93it/s]
73%|███████▎ | 79626240/108773488 [00:00<00:00, 154371569.93it/s]
88%|████████▊ | 95754240/108773488 [00:00<00:00, 156353972.12it/s]
100%|██████████| 108773488/108773488 [00:00<00:00, 161750092.47it/s]
[W:pyppeteer.chromium_downloader]
chromium download done.
[W:pyppeteer.chromium_downloader] chromium extracted to: /tmp/local-chromium/588429
-----
/tmp/local-chromium/588429/chrome-linux/chrome
[ERROR] BrowserError: Browser closed unexpectedly:
Traceback (most recent call last):
File "/var/task/lambda_handler.py", line 23, in lambda_handler
asyncio.get_event_loop().run_until_complete(main())
File "/var/lang/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
return future.result()
File "/var/task/lambda_handler.py", line 72, in main
browser = await launch({
File "/opt/python/pyppeteer/launcher.py", line 307, in launch
return await Launcher(options, **kwargs).launch()
File "/opt/python/pyppeteer/launcher.py", line 168, in launch
self.browserWSEndpoint = get_ws_endpoint(self.url)
File "/opt/python/pyppeteer/launcher.py", line 227, in get_ws_endpoint
raise BrowserError('Browser closed unexpectedly:\n')
END RequestId: 06be0620-8b5c-4600-a76e-bc785210244e
REPORT RequestId: 06be0620-8b5c-4600-a76e-bc785210244e Duration: 33370.61 ms Billed Duration: 33400 ms Memory Size: 3008 MB Max Memory Used: 481 MB Init Duration: 445.58 ms
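Answering my own question.
After bundling the Chromium binary into a Lambda layer, I was finally able to run Pyppeteer (v0.2.2) with Python 3.6 and 3.7 (not 3.8).
So in summary, it only seems to work when launched with a user-supplied Chromium executable path rather than the automatically downloaded Chrome. It may be a race condition or something similar.
Get Chromium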
browser = await launch(
    headless=True,
    executablePath='/opt/python/headless-chromium',
    args=[
        '--no-sandbox',
        '--single-process',
        '--disable-dev-shm-usage',
        '--disable-gpu',
        '--no-zygote'
    ])
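A missing or non-executable binary in the layer would surface as the same opaque BrowserError, so it may help to validate the path before launching. The following is a minimal sketch, not part of pyppeteer: build_launch_options is a hypothetical helper, and its default path is simply the one used in the call above.

```python
import os


def build_launch_options(executable_path="/opt/python/headless-chromium"):
    """Return keyword arguments for pyppeteer.launch() after checking the binary.

    Hypothetical helper: it fails early with a clear message instead of
    letting Chrome fail to start and pyppeteer report
    "Browser closed unexpectedly".
    """
    if not os.path.isfile(executable_path):
        raise FileNotFoundError(f"Chromium binary not found: {executable_path}")
    if not os.access(executable_path, os.X_OK):
        raise PermissionError(f"Chromium binary is not executable: {executable_path}")
    return {
        "headless": True,
        "executablePath": executable_path,
        # Same flags as in the launch call above
        "args": [
            "--no-sandbox",
            "--single-process",
            "--disable-dev-shm-usage",
            "--disable-gpu",
            "--no-zygote",
        ],
    }
```

Inside the async function this would be used as browser = await launch(**build_launch_options()).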
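I think BrowserError: Browser closed unexpectedly
is simply the error you get whenever Chrome crashes for any reason. It would be nice if pyppeteer printed the underlying error, but it doesn't.
To track things down, it helps to extract the exact command that pyppeteer runs. You can do that like this: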
>>> from pyppeteer.launcher import Launcher
>>> ' '.join(Launcher().cmd)
/root/.local/share/pyppeteer/local-chromium/588429/chrome-linux/chrome --disable-background-networking --disable-background-timer-throttling --disable-breakpad --disable-browser-side-navigation --disable-client-side-phishing-detection --disable-default-apps --disable-dev-shm-usage --disable-extensions --disable-features=site-per-process --disable-hang-monitor --disable-popup-blocking --disable-prompt-on-repost --disable-sync --disable-translate --metrics-recording-only --no-first-run --safebrowsing-disable-auto-update --enable-automation --password-store=basic --use-mock-keychain --headless --hide-scrollbars --mute-audio about:blank --no-sandbox --remote-debugging-port=33423 --user-data-dir=/root/.local/share/pyppeteer/.dev_profile/tmp5cj60q6q
When I ran that command in my Docker image, I got the following error:
$ /root/.local/share/pyppeteer/local-chromium/588429/chrome-linux/chrome # ...
/root/.local/share/pyppeteer/local-chromium/588429/chrome-linux/chrome:
error while loading shared libraries:
libnss3.so: cannot open shared object file: No such file or directory
So I installed libnss3:
apt-get install -y libnss3
Then I ran the command again and got a different error:
$ /root/.local/share/pyppeteer/local-chromium/588429/chrome-linux/chrome # ...
[0609/190651.188666:ERROR:zygote_host_impl_linux.cc(89)] Running as root without --no-sandbox is not supported. See https://crbug.com/638180.
So I needed to change my launch call to:
browser = await launch(headless=True, args=['--no-sandbox'])
Now it works!
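The debugging steps in this answer (run the browser command by hand and read what it prints) can also be scripted, which is handy where no shell is available. This is a rough sketch under assumptions: run_and_capture is a hypothetical helper built only on the standard library, and the 15-second timeout is an arbitrary choice.

```python
import subprocess


def run_and_capture(cmd, timeout=15):
    """Run a command and return (exit_code, stderr_text).

    exit_code is None if the process was still running at the timeout,
    which for a browser usually means it started successfully.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.PIPE,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired as exc:
        # The child has been killed; keep whatever it wrote to stderr so far
        stderr = exc.stderr or b""
        return None, stderr.decode(errors="replace")
    return result.returncode, result.stderr.decode(errors="replace")
```

Passing the list from Launcher().cmd (the attribute shown in the session above) as cmd should return messages such as the libnss3 or --no-sandbox errors as text, so they can be logged.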
I had been trying to run pyppeteer in a Docker container and ran into the same issue.
Thanks to this comment, I finally managed to fix it: https://github.com/miyakogi/pyppeteer/issues/14#issuecomment-348825238
I installed Chrome manually via apt:
curl -sSL https://dl.google.com/linux/linux_signing_key.pub | apt-key add -
echo "deb [arch=amd64] https://dl.google.com/linux/chrome/deb/ stable main" > /etc/apt/sources.list.d/google-chrome.list
apt update -y && apt install -y google-chrome-stable
Then specify the path when launching the browser.
You also have to run headless and pass the "--no-sandbox" argument:
browser = await launch(executablePath='/usr/bin/google-chrome-stable', headless=True, args=['--no-sandbox'])
Hope this helps!
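If the install location differs between images, the path does not have to be hard-coded. Here is a small sketch that looks up a system-installed browser on PATH; find_browser and the candidate names are assumptions for illustration, not something pyppeteer provides.

```python
import shutil

# Binary names commonly used by Chrome and Chromium packages (assumed list)
CANDIDATES = ("google-chrome-stable", "google-chrome", "chromium-browser", "chromium")


def find_browser(candidates=CANDIDATES):
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None
```

The result can be passed as executablePath to launch; a return value of None means none of the candidates is installed.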
If anyone is running on Heroku and facing the same error:
Add the buildpack. Its URL is:
https://github.com/jontewks/puppeteer-heroku-buildpack
Make sure you are using --no-sandbox
mode:
launch({'args': ['--no-sandbox']})
Make sure all required dependencies are installed. You can run ldd /path/to/your/chrome | grep not
on a Linux machine to check which dependencies are missing.
In my case, I got:
libatk-bridge-2.0.so.0 => not found
libgtk-3.so.0 => not found
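Then install the dependencies (package names vary by distribution; on Debian and Ubuntu these two libraries ship in libatk-bridge2.0-0 and libgtk-3-0):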
sudo apt-get install at-spi2-atk gtk3
Now it works!
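The same ldd check can be done from Python, for example at container start-up. This is only a sketch: parse_missing_libs and missing_libs are hypothetical names, and the parsing assumes the "name => not found" line format shown above.

```python
import subprocess


def parse_missing_libs(ldd_output):
    """Return the library names that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            # Keep only the library name to the left of the arrow
            missing.append(line.split("=>")[0].strip())
    return missing


def missing_libs(binary_path):
    """Run ldd on binary_path and return its unresolved libraries."""
    result = subprocess.run(
        ["ldd", binary_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    return parse_missing_libs(result.stdout.decode(errors="replace"))
```

Calling missing_libs('/path/to/your/chrome') on the machine from this answer would be expected to list the two libraries above; an empty list means ldd resolved everything.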