Parallelism of Puppeteer with Express Router in Node.js: how to pass a page between routes while maintaining concurrency

app.post('/api/auth/check', async (req, res) => {
  try {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://www.google.com');
    res.json({ message: 'Success' });
  } catch (e) {
    console.log(e);
    res.status(500).json({ message: 'Error' });
  }
});

app.post('/api/auth/register', async (req, res) => {
  console.log('register');
  // Here I need to reuse the current user's session (page and browser) and then perform actions on the same page.
  await page.waitForTimeout(1000);
  await browser.close();
});

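Is it possible to somehow pass the page and browser from one route to another while keeping Puppeteer concurrent? If I store them in global variables, the page and browser are overwritten by the next request, and serving several users at once no longer works.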

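One approach is to create a closure that returns promises that resolve to the same page and browser instances. Since HTTP is stateless, I assume you have some session/authentication management system that associates a user's session with a Puppeteer browser instance.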
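The closure idea can be sketched on its own, without Puppeteer. In the minimal example below, `lazySession`, `fakeLaunch`, and `getSession` are hypothetical names; `fakeLaunch` is a stand-in for whatever would call `puppeteer.launch()` and `browser.newPage()` in a real app, so the snippet runs with no dependencies.

```javascript
// Returns a closure that starts the session on the first call and hands back
// the same promise on every later call, so all callers share one instance.
const lazySession = launch => {
  let sessionPromise = null;
  return () => {
    if (sessionPromise === null) {
      sessionPromise = launch();
    }
    return sessionPromise;
  };
};

// Stand-in for the real launch (assumption: a real app would call
// puppeteer.launch() and browser.newPage() here).
let launchCount = 0;
const fakeLaunch = async () => {
  launchCount += 1;
  return {browser: {id: launchCount}, page: {id: launchCount}};
};

const getSession = lazySession(fakeLaunch);
```

Every route that calls `await getSession()` receives the same `{browser, page}` object, and the launch function runs only once even if several requests arrive before the first launch has finished.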

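I've simplified your routes a bit and added a naive token management system to associate a user with a session, for the sake of a complete, runnable example, but I don't think you'll have trouble adapting it to your use case.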

const express = require("express");
const puppeteer = require("puppeteer");

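// Wrap async route handlers so that rejected promises are forwarded to Express's error-handling middleware.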
const asyncHandler = fn => (req, res, next) =>
  Promise.resolve(fn(req, res, next)).catch(next)
;
const startPuppeteerSession = async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  return {browser, page};
};
const sessions = {};

express()
  .use((req, res, next) => 
    req.query.token === undefined ? res.sendStatus(401) : next()
  )
  .get("/start", asyncHandler(async (req, res) => {
    sessions[req.query.token] = await startPuppeteerSession();
    res.sendStatus(200);
  }))
  .get("/navigate", asyncHandler(async (req, res) => {
    const page = await sessions[req.query.token].page;
    await page.goto(req.query.to || "http://www.example.com");
    res.sendStatus(200);
  }))
  .get("/content", asyncHandler(async (req, res) => {
    const page = await sessions[req.query.token].page;
    res.send(await page.content()); 
  }))
  .get("/kill", asyncHandler(async (req, res) => {
    const browser = await sessions[req.query.token].browser;
    await browser.close();
    delete sessions[req.query.token];
    res.sendStatus(200);
  }))
  .use((err, req, res, next) => res.sendStatus(500))
  .listen(8000, () => console.log("listening on port 8000"))
;

Sample usage from the client's perspective:

$ curl localhost:8000/start?token=1
OK
$ curl 'localhost:8000/navigate?to=
OK
$ curl localhost:8000/content?token=1 | grep 'apsenT'
        <a href="/users/15547056/apsent">apsenT</a><span class="d-none" itemprop="name">apsenT</span>
            <a href="/users/15547056/apsent">apsenT</a> is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                        <a href="/users/15547056/apsent">apsenT</a> is a new contributor. Be nice, and check out our <a href="/conduct">Code of Conduct</a>.
$ curl localhost:8000/kill?token=1
OK

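You can see that the client associated with token 1 has kept a single browser session across multiple routes. Other clients can start browser sessions and operate on them concurrently.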
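One caveat the example does not address: two requests carrying the same token could act on the same page at the same moment. If that matters for your use case, one possible way to handle it is a per-token promise chain. This is only a sketch; `queues` and `runExclusive` are hypothetical names, not part of Express or Puppeteer.

```javascript
// Per-token promise chains: each new task for a token starts only after the
// previous task for that token has settled. Different tokens do not block
// each other.
const queues = new Map();

const runExclusive = (token, task) => {
  const previous = queues.get(token) || Promise.resolve();
  // `previous` never rejects (see `tail` below), so the task always runs.
  const result = previous.then(() => task());
  // Store a tail that swallows errors so one failed task does not poison the
  // chain, and that removes the entry once nothing else is queued behind it.
  const tail = result.catch(() => {}).then(() => {
    if (queues.get(token) === tail) {
      queues.delete(token);
    }
  });
  queues.set(token, tail);
  // The caller still sees the task's own result or error.
  return result;
};
```

Inside a route handler this might be used as `await runExclusive(req.query.token, () => page.goto(url))`, so that navigation and content reads for one token run one after another while other tokens proceed independently.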

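To reiterate, this is only a proof of concept for sharing Puppeteer browser instances across routes. With the code above, a user could spam the start route and create browsers until the server crashes, so without real authentication, session management, and error handling, this is entirely unsuitable for production.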
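As a rough illustration of the kind of session management that is missing, here is a minimal sketch of a store that caps the number of live sessions and closes idle ones. Everything in it is an assumption made for illustration: `createSessionStore`, `maxSessions`, and `idleMs` are hypothetical names, and `launch` is injected so that a real app could pass a function wrapping `puppeteer.launch()` and `browser.newPage()`. It is not a substitute for real authentication.

```javascript
// Hypothetical session store: limits live sessions and evicts idle ones.
const createSessionStore = ({launch, maxSessions = 5, idleMs = 60000}) => {
  const sessions = new Map(); // token -> {promise, timer}

  // Remove a session and close its browser. Never rejects.
  const close = async token => {
    const entry = sessions.get(token);
    if (entry === undefined) return false;
    sessions.delete(token);
    clearTimeout(entry.timer);
    try {
      const {browser} = await entry.promise;
      await browser.close();
    } catch (err) {
      // Launch or close failed; the entry is already removed.
    }
    return true;
  };

  // Restart the idle countdown for a session.
  const resetTimer = (token, entry) => {
    clearTimeout(entry.timer);
    entry.timer = setTimeout(() => close(token), idleMs);
    // Do not keep the Node process alive just for the idle timer.
    entry.timer.unref();
  };

  const start = token => {
    if (sessions.has(token)) throw new Error("session already exists");
    if (sessions.size >= maxSessions) throw new Error("too many sessions");
    const entry = {promise: launch(), timer: null};
    sessions.set(token, entry);
    resetTimer(token, entry);
    // If the launch fails, drop the entry so the token can be reused.
    entry.promise.catch(() => {
      if (sessions.get(token) === entry) close(token);
    });
    return entry.promise;
  };

  const get = token => {
    const entry = sessions.get(token);
    if (entry === undefined) throw new Error("unknown session");
    resetTimer(token, entry);
    return entry.promise;
  };

  return {start, get, close, size: () => sessions.size};
};
```

In the routes, `start`, `get`, and `close` would take the place of direct reads and writes on a plain sessions object, and the thrown errors could be mapped to suitable status codes (for example 429 for "too many sessions" and 404 for "unknown session") in the error-handling middleware.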

Packages used: express ^4.17.1, puppeteer ^8.0.0.