Can't Scrape Input Value from dolartoday.com with Puppeteer

I want to scrape the value of the #result element with the following code:

    const puppeteer = require('puppeteer');

    (async () => {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      await page.goto('https://dolartoday.com');
      await console.log(page.evaluate(() => document.getElementById('result')));

      await browser.close();
    })();

But it still logs the following error:

(node:74908) UnhandledPromiseRejectionWarning: Error: Navigation Timeout Exceeded: 30000ms exceeded
at Promise.then (/Volumes/DATOS/Dropbox/workspaces/dolar-today/server/node_modules/puppeteer/lib/NavigatorWatcher.js:71:21)
at <anonymous>
(node:74908) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:74908) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

Any idea how to solve this?

First, you are using the await operator on console.log() (a synchronous function) rather than on page.evaluate() (an asynchronous function).
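To make the difference concrete, here is a minimal sketch that runs without a browser. fakeEvaluate is a hypothetical stand-in for page.evaluate(); the only property it shares with the real method is that it returns a promise.

```javascript
// fakeEvaluate is a hypothetical stand-in for page.evaluate(): it returns a promise.
function fakeEvaluate(fn) {
  return new Promise(resolve => setTimeout(() => resolve(fn()), 10));
}

async function demo() {
  // Without await, the caller holds an unresolved promise, not the value.
  // This is what console.log() received in the original code.
  const pending = fakeEvaluate(() => '42');
  const isPromise = pending instanceof Promise;

  // Awaiting the asynchronous call itself yields the resolved value.
  const value = await fakeEvaluate(() => '42');

  return { isPromise, value };
}
```

Putting await in front of console.log() only awaits its return value (undefined), so it does not help.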

You are also trying to return a page DOM element to the Node.js environment, which will not work because page.evaluate() expects a serializable return value.

If you want to return the value of the #result element on the page, you should rewrite your logic as follows:

console.log(await page.evaluate(() => document.getElementById('result').value));
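If you need more than one field, the same idea extends to returning a plain object. The sketch below is one possible shape, not the only one; readResult is a hypothetical helper name, and the null check is an added safeguard for the case where the element is missing.

```javascript
// Intended to run in the page context when passed to page.evaluate().
// It copies plain fields out of the element instead of returning the element itself.
function readResult() {
  const el = document.getElementById('result');
  if (!el) {
    return null; // element not found: return a serializable "nothing"
  }
  return { id: el.id, value: el.value };
}
```

You would call it as `const data = await page.evaluate(readResult);`. The function must not rely on variables from the surrounding Node.js scope, since it is executed inside the page.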

Additionally, the navigation exceeded 30000 ms (the default maximum). You can extend the maximum navigation time with the timeout option of page.goto():

await page.goto('https://dolartoday.com', {
  timeout: 60000,
});
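If a single longer timeout is still not enough on a slow connection, one option is to retry the navigation with progressively longer timeouts. This is only a sketch: gotoWithRetry and its default timeout list are assumptions for illustration, and it retries on any error, not only on timeouts.

```javascript
// Tries page.goto() with each timeout in turn and returns the first successful result.
// `page` is any object with a Puppeteer-style goto(url, options) method.
async function gotoWithRetry(page, url, timeouts = [30000, 60000]) {
  let lastError = new Error('no timeouts were provided');
  for (const timeout of timeouts) {
    try {
      return await page.goto(url, { timeout });
    } catch (err) {
      lastError = err; // remember the failure and try the next, longer timeout
    }
  }
  throw lastError; // every attempt failed
}
```

Usage would be `await gotoWithRetry(page, 'https://dolartoday.com');` in place of the plain page.goto() call.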

You can also use page.setRequestInterception() and page.on('request') to refuse loading unnecessary resources on the page. This will make the page load faster:

await page.setRequestInterception(true);

page.on('request', request => {
  if (['image', 'stylesheet', 'font'].indexOf(request.resourceType()) !== -1) {
    request.abort();
  } else {
    request.continue();
  }
});
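If you want to reuse or adjust the filter, the handler can be pulled out into a small factory. The names BLOCKED_TYPES and makeRequestFilter are hypothetical; the logic is the same as above, just parameterized.

```javascript
// Resource types to skip; this list mirrors the one used above and is easy to extend.
const BLOCKED_TYPES = new Set(['image', 'stylesheet', 'font']);

// Returns a handler suitable for page.on('request', ...).
function makeRequestFilter(blockedTypes = BLOCKED_TYPES) {
  return request => {
    if (blockedTypes.has(request.resourceType())) {
      return request.abort(); // drop resources we do not need
    }
    return request.continue(); // let everything else through
  };
}
```

After `await page.setRequestInterception(true);` you would register it with `page.on('request', makeRequestFilter());`.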

Your final program should look like this:

'use strict';

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.setRequestInterception(true);

  page.on('request', request => {
    if (['image', 'stylesheet', 'font'].indexOf(request.resourceType()) !== -1) {
      request.abort();
    } else {
      request.continue();
    }
  });

  await page.goto('https://dolartoday.com', {
    timeout: 60000,
  });

  console.log(await page.evaluate(() => document.getElementById('result').value));

  await browser.close();
})();
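The UnhandledPromiseRejectionWarning in the question appears because nothing catches the rejected promise. One way to handle that, sketched below under the assumption that you pass in your own launch function, is to close the browser in a finally block and attach a .catch() at the call site. scrapeResult is a hypothetical helper name.

```javascript
// `launch` is any function returning a Puppeteer-style browser, for example
// () => puppeteer.launch(). Passing it in keeps this sketch free of a hard dependency.
async function scrapeResult(launch, url) {
  const browser = await launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { timeout: 60000 });
    return await page.evaluate(() => document.getElementById('result').value);
  } finally {
    // Runs on success and on failure, so the browser process is not left behind.
    await browser.close();
  }
}
```

A call could then look like `scrapeResult(() => puppeteer.launch(), 'https://dolartoday.com').then(console.log).catch(console.error);`, so a navigation timeout is reported instead of surfacing as an unhandled rejection.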