puppeteer wait for page/DOM updates - respond to new items that are added after initial loading

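I want to use Puppeteer to respond to page updates. The page shows items, and when I leave the page open, new items appear over time, e.g. a new item is added every 10 seconds.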

I can use the following to wait for an item on the initial load of the page:

// page.waitFor is deprecated; waitForSelector is the selector-specific call
await page.waitForSelector(".item");
console.log("the initial items have been loaded");

How can I wait for or catch future items? I would like to achieve something like this (pseudocode):

await page.goto('http://mysite');
await page.waitForSelector(".item");
// check items (= these initial items)

// event when receiving new items:
// check item(s) (= the additional [or all] items)

You can use exposeFunction to expose a local function:

await page.exposeFunction('getItem', function(a) {
    console.log(a);
});

Then you can use page.evaluate to create an observer and listen for new nodes created inside a parent node.

This example scrapes the Python chat on Stack Overflow and prints new items as they are created in that chat (it's just an idea, not a finished piece of work).

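const puppeteer = require("puppeteer");

const baseurl = "https://chat.stackoverflow.com/rooms/6/python";

(async () => {
  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage();
  await page.goto(baseurl);

  // Expose a Node-side callback that page code can call as window.getItem().
  await page.exposeFunction("getItem", function(a) {
    console.log(a);
  });

  // Runs in the browser: report the first node added by each mutation under #chat.
  await page.evaluate(() => {
    const observer = new MutationObserver((mutations) => {
      for (const mutation of mutations) {
        if (mutation.addedNodes.length) {
          getItem(mutation.addedNodes[0].innerText);
        }
      }
    });
    observer.observe(document.getElementById("chat"), {
      attributes: false, childList: true, subtree: true,
    });
  });
  // The browser is intentionally left open so the observer keeps reporting.
})().catch(err => console.error(err));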
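A mutation record's addedNodes list can hold several nodes at once, and some of them may be text or comment nodes with no innerText. As a rough sketch (the name collectAddedText is hypothetical, not part of Puppeteer or the DOM API), the extraction step could be factored into a small function that walks every added node and keeps only non-empty element text:

```javascript
// Hypothetical helper: turn a batch of MutationRecords into the text of
// every added element node.
function collectAddedText(mutations) {
  const texts = [];
  for (const mutation of mutations) {
    for (const node of mutation.addedNodes) {
      // nodeType 1 is an element; skip text and comment nodes.
      if (node.nodeType !== 1) continue;
      const text = (node.innerText || node.textContent || "").trim();
      if (text) texts.push(text);
    }
  }
  return texts;
}
```

Because page.evaluate serializes its callback, a helper like this would have to be defined inside that callback (or otherwise injected into the page) before the observer can call it; each returned string could then be passed to getItem.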

As an alternative to the approach above, which injects a MutationObserver using evaluate and forwards the data to an exposed Node function, Puppeteer offers a higher-level function called page.waitForFunction that blocks on an arbitrary predicate and uses either a MutationObserver or requestAnimationFrame to determine when to re-evaluate the predicate.

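Calling page.waitForFunction in a loop might add overhead, since each new call involves registering a fresh observer or RAF. You'd have to profile for your use case; this isn't something I'd worry about prematurely, though.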

That said, the RAF option may provide tighter latency than MO, at the cost of some extra CPU cycles spent polling constantly.

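Here's a minimal example. The first snippet is the script behind a demo page that offers a periodically updating feed; the Puppeteer script after it embeds a similar page with setContent and logs each new item as it appears: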

const wait = ms => new Promise(r => setTimeout(r, ms));
const r = (lo, hi) => ~~(Math.random() * (hi - lo) + lo);

const randomString = n =>
  [...Array(n)].map(() => String.fromCharCode(r(97, 123))).join("")
;

(async () => {
  for (let i = 0; i < 500; i++) {
    const el = document.createElement("div");
    document.body.appendChild(el);
    el.innerText = randomString(r(5, 15));
    await wait(r(1000, 5000));
  }
})();

const puppeteer = require("puppeteer");

const html = `
<html><body><div class="container"></div><script>
const wait = ms => new Promise(r => setTimeout(r, ms));
const r = (lo, hi) => ~~(Math.random() * (hi - lo) + lo);
const randomString = n =>
  [...Array(n)].map(() => String.fromCharCode(r(97, 123))).join("")
;
(async () => {
  for (;;) {
    const el = document.createElement("div");
    document.querySelector(".container").appendChild(el);
    el.innerText = randomString(r(5, 15));
    await wait(r(1000, 5000));
  }
})();
</script></body></html>
`;
let browser;
(async () => {
  browser = await puppeteer.launch({headless: false});
  const [page] = await browser.pages();
  await page.setContent(html);
  
  for (;;) {
    await page.waitForFunction((el, oldLength) =>
      el.children.length > oldLength,                           // predicate
      {polling: "mutation" /* or: "raf" */, timeout: 0},        // wFF options
      await page.$(".container"),                               // elem to watch
      await page.$eval(".container", el => el.children.length), // oldLength
    );
    const selMostRecent = ".container div:last-child";
    console.log(await page.$eval(selMostRecent, el => el.textContent));
  }
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close())
;
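The loop above logs only the last child after each wake-up, so if two items happen to arrive between iterations, the earlier one is not printed. A minimal sketch of one way to report every new item, assuming items are only ever appended (never removed or reordered), is to remember how many items were already seen and slice off the rest. The name takeNewItems is hypothetical:

```javascript
// Hypothetical helper: given how many items were already handled and the
// current list of item texts, return the unseen tail and the updated count.
// ASSUMPTION: the feed is append-only.
function takeNewItems(seenCount, currentTexts) {
  const fresh = currentTexts.slice(seenCount);
  return { fresh, seenCount: currentTexts.length };
}
```

Inside the loop, the current texts could be read with page.$$eval(".container div", els => els.map(el => el.textContent)) and passed to takeNewItems along with the running count, logging each entry of fresh.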

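See also:

  • A related answer that shows a generic waitForTextChange helper function that wraps page.waitForFunction.
  • A related answer that aptly suggests the alternative approach of intercepting the API responses that populate the feed, when possible.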
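The response-interception idea could look roughly like the sketch below. It relies on Puppeteer's page.on("response", ...) event and the response methods url(), ok() and json(). The function name watchFeedResponses, the "/api/items" URL fragment, and the assumption that the body is a JSON array are all hypothetical and would need to match the real site:

```javascript
// Hypothetical helper: call onItems with the parsed body of every matching
// response. ASSUMPTION: the feed endpoint's URL contains urlFragment and
// returns a JSON array of items.
function watchFeedResponses(page, urlFragment, onItems) {
  const handler = async (response) => {
    if (!response.url().includes(urlFragment) || !response.ok()) return;
    try {
      const items = await response.json();
      if (Array.isArray(items)) onItems(items);
    } catch (err) {
      // Body was not JSON or was unavailable; ignore this response.
    }
  };
  page.on("response", handler);
  // Return a function that stops listening.
  return () => page.off("response", handler);
}
```

It would be registered before page.goto so that early responses are not missed, for example watchFeedResponses(page, "/api/items", items => console.log(items)).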