Loading dynamic webpage with Puppeteer works on localhost but not Heroku

I have a Node.js application with Express, deployed on Heroku. The problem only affects dynamic webpages; loading static webpages works fine.

Loading dynamic webpages works on localhost, but on Heroku it fails with code=H12 desc="Request timeout" service=30000ms status=503.

Also, after running heroku restart or deploying, there always seems to be one status=200 request that loads only the static part of the dynamic webpage.

Screenshot of the logs here


I tried the following; when deployed on Heroku, each attempt led to the same result or to other unexpected ones (for example, Error R14 (Memory quota exceeded) or code=H13 desc="Connection closed without response"):


I observed:


Any ideas on how to get this working?

const express = require("express");
const puppeteer = require("puppeteer");

const app = express();
let port = process.env.PORT || 3000;
let browser; // shared across requests, launched once at startup

...

app.listen(port, async () => {
  browser = await puppeteer
    .launch({
      timeout: 0, // disable the browser launch timeout
      headless: true,
      args: [
        "--no-sandbox",
        "--disable-setuid-sandbox",
        "--single-process",
        "--no-zygote",
      ],
    });
});

...

app.get("/appropriate-route-name", async (req, res) => {
  let url = req.query.url;
  let page = await browser.newPage();

  try {
    await page.goto(url, {
      waitUntil: "networkidle2",
    });
    res.send({ data: await page.content() });
  } catch (exception) {
    res.send({ data: null });
  } finally {
    await browser.close(); // closes the shared browser, not just this page
  }
});
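
For context, the H12 in the question is Heroku's router giving up after 30 seconds without a response. One way to keep a slow page load from reaching that limit is to cap the work and answer early. The following is only a minimal sketch: `withTimeout` is a hypothetical helper name, not something from Express or Puppeteer, and the timeout value is whatever margin you choose.

```javascript
// Reject if `promise` has not settled within `ms` milliseconds.
// `withTimeout` is a hypothetical helper, not part of Express or Puppeteer.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`timed out after ${ms} ms`)),
      ms
    );
  });
  // Whichever settles first wins; the timer is cleared either way.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Inside the route's try block this could be used as `await withTimeout(page.goto(url, { waitUntil: "networkidle2" }), 25000)`, where 25000 is an assumed safety margin under the 30-second limit. The existing catch branch would then send `{ data: null }` instead of leaving the request hanging.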

I was able to get it working by using the user-agents package. Dynamic pages now load properly on Heroku; requests no longer time out every time.

const express = require("express");
const puppeteer = require("puppeteer");
const UserAgent = require("user-agents");

const app = express();
let port = process.env.PORT || 3000;

...

app.get("/route-name", async (req, res) => {
  let url = req.query.url;
  // A fresh browser is launched for every request and closed in `finally`.
  let browser = await puppeteer.launch({
    args: ["--no-sandbox"],
  });
  let page = await browser.newPage();

  try {
    // added this: send a realistic user-agent string
    await page.setUserAgent(new UserAgent().toString());
    await page.goto(url, {
      timeout: 30000,
      waitUntil: "networkidle2", // or "networkidle0", depending on what you need
    });
    res.send({ data: await page.content() });

  } catch (e) {
    res.send({ data: null });

  } finally {
    await browser.close();
  }
});
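
The code above launches a new browser for every request, which is simple but may be heavy on a small dyno (the question mentions R14 memory errors). A possible variation, sketched here under assumptions, reuses one browser and closes only the page. `makeRenderHandler` is a hypothetical name, and `browser` is assumed to be an already-launched Puppeteer browser (or anything with a compatible `newPage()`).

```javascript
// Build an Express-style handler around an already-launched browser.
// `makeRenderHandler` is a hypothetical helper name for illustration.
function makeRenderHandler(browser, userAgentString) {
  return async function renderHandler(req, res) {
    let page;
    try {
      // Opened inside `try` so that a failure here still sends a response.
      page = await browser.newPage();
      await page.setUserAgent(userAgentString);
      await page.goto(req.query.url, {
        timeout: 30000,
        waitUntil: "networkidle2",
      });
      res.send({ data: await page.content() });
    } catch (e) {
      res.send({ data: null });
    } finally {
      // Close only this request's page; the shared browser stays open.
      if (page) await page.close();
    }
  };
}
```

It could be wired up as `app.get("/route-name", makeRenderHandler(browser, new UserAgent().toString()))` once `puppeteer.launch()` has resolved. Whether this helps on Heroku is untested here; it mainly avoids paying the browser startup cost on each request.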