Can't Scrape Input Value from dolartoday.com with Puppeteer
I want to scrape the value of the element #result like this:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://dolartoday.com');
  await console.log(page.evaluate(() => document.getElementById('result')));
  await browser.close();
})();
But it keeps logging the following error:
(node:74908) UnhandledPromiseRejectionWarning: Error: Navigation Timeout Exceeded: 30000ms exceeded
at Promise.then (/Volumes/DATOS/Dropbox/workspaces/dolar-today/server/node_modules/puppeteer/lib/NavigatorWatcher.js:71:21)
at <anonymous>
(node:74908) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:74908) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
Any idea how to solve this?
First, you are using the await operator on console.log() (a synchronous function) rather than on page.evaluate() (an asynchronous function), so the pending Promise is logged instead of its result.
You are also trying to return a page DOM element to the Node.js environment, which will not work because page.evaluate() expects a serializable return value.
If you want to return the value of the #result element on the page, rewrite your logic as follows:
console.log(await page.evaluate(() => document.getElementById('result').value));
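One caveat with the line above: if the page has no #result element, document.getElementById('result') returns null and reading .value throws inside the page. A minimal sketch of a more defensive variant follows. The name readInputValue is hypothetical (it is not part of Puppeteer); the sketch relies only on page.evaluate() accepting a function plus arguments.

```javascript
// Hypothetical helper (not part of Puppeteer). When passed to page.evaluate(),
// it runs inside the page, where `document` is the page's DOM.
function readInputValue(id) {
  const element = document.getElementById(id);
  // Return only a plain string (or null); a DOM element itself cannot be
  // serialized back to the Node.js side.
  return element ? element.value : null;
}

// Intended use inside the async function from the question:
//   const value = await page.evaluate(readInputValue, 'result');
//   console.log(value);
```

A null result then signals "element not found" without throwing, which is easier to tell apart from a navigation failure.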
In addition, navigation took longer than 30000 ms (the default maximum). You can use the timeout option of page.goto() to extend the maximum navigation time:
await page.goto('https://dolartoday.com', {
  timeout: 60000,
});
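If a single longer timeout is still not enough on a slow connection, one option is to retry with progressively longer timeouts. The sketch below is only an illustration under stated assumptions: gotoWithRetry and its default timeout list are hypothetical, and it retries on any error thrown by page.goto(), not only on timeouts.

```javascript
// Hypothetical helper: try page.goto() with each timeout in turn and
// rethrow the last error if every attempt fails.
async function gotoWithRetry(page, url, timeouts = [30000, 60000]) {
  let lastError = new Error('no timeouts given');
  for (const timeout of timeouts) {
    try {
      // Resolves with whatever page.goto() resolves with on success.
      return await page.goto(url, { timeout });
    } catch (error) {
      lastError = error; // e.g. "Navigation Timeout Exceeded"
    }
  }
  throw lastError;
}

// Intended use inside the async function from the question:
//   await gotoWithRetry(page, 'https://dolartoday.com');
```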
You can also use page.setRequestInterception() and page.on('request') to refuse loading unnecessary resources on the page. This will make the page load faster:
await page.setRequestInterception(true);
page.on('request', request => {
  if (['image', 'stylesheet', 'font'].indexOf(request.resourceType()) !== -1) {
    request.abort();
  } else {
    request.continue();
  }
});
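The same filter can be split into a small predicate and a handler, which makes the blocked list easy to change and to check without launching a browser. This is a sketch, not the only way to do it; BLOCKED_TYPES, shouldBlock, and handleRequest are hypothetical names, while resourceType(), abort(), and continue() are the request methods already used above.

```javascript
// Resource types to skip; the list mirrors the one used above.
const BLOCKED_TYPES = new Set(['image', 'stylesheet', 'font']);

// Hypothetical predicate: true when a resource type should not be loaded.
function shouldBlock(resourceType) {
  return BLOCKED_TYPES.has(resourceType);
}

// Hypothetical handler: abort blocked requests and let the rest through.
function handleRequest(request) {
  if (shouldBlock(request.resourceType())) {
    request.abort();
  } else {
    request.continue();
  }
}

// Intended use, after await page.setRequestInterception(true):
//   page.on('request', handleRequest);
```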
Your final program should look like this:
'use strict';
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Skip images, stylesheets, and fonts so the page loads faster.
  await page.setRequestInterception(true);
  page.on('request', request => {
    if (['image', 'stylesheet', 'font'].indexOf(request.resourceType()) !== -1) {
      request.abort();
    } else {
      request.continue();
    }
  });
  // Allow up to 60 seconds for navigation instead of the default 30.
  await page.goto('https://dolartoday.com', {
    timeout: 60000,
  });
  // Read the input's value inside the page and log the resulting string.
  console.log(await page.evaluate(() => document.getElementById('result').value));
  await browser.close();
})();
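The warnings in the question also mention an unhandled promise rejection: when page.goto() throws, nothing catches the error and browser.close() is never reached. A minimal sketch of one way to handle that is shown below. withBrowser is a hypothetical helper, and launch is assumed to be any function that returns a promise for a browser-like object, for example () => puppeteer.launch().

```javascript
// Hypothetical helper: run `task` with a browser and always close the
// browser afterwards, whether `task` succeeds or throws.
async function withBrowser(launch, task) {
  const browser = await launch();
  try {
    return await task(browser);
  } finally {
    await browser.close(); // runs on success and on error
  }
}

// Intended use:
//   withBrowser(() => puppeteer.launch(), async browser => {
//     const page = await browser.newPage();
//     /* ...scraping steps from the program above... */
//   }).catch(error => {
//     console.error(error);   // the rejection is handled here
//     process.exitCode = 1;
//   });
```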