Open multiple links in new tabs and switch focus in a loop with Puppeteer?

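I have multiple links on a single page that I would like to visit, either sequentially or all at once. What I want to do is open each link in its own new tab and save every one of those pages as a PDF. How can I achieve this with Puppeteer?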

I can get all the links from the DOM via their href attributes, but I can't figure out how to open them in new tabs, access them, and then close them.
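For reference, the link-collection step mentioned above could look roughly like the sketch below. The helper names `collectLinks` and `uniqueHttpLinks` are hypothetical, and the sketch assumes `page` is a Puppeteer Page that has already navigated to the page containing the links.

```javascript
// Collect the absolute URLs of all links on an already loaded page.
// `page` is assumed to be a Puppeteer Page; `collectLinks` is a hypothetical helper name.
async function collectLinks(page) {
  // $$eval runs the callback in the browser context on all matching elements.
  // Reading the `href` property (not the attribute) yields absolute URLs.
  const hrefs = await page.$$eval('a[href]', (anchors) => anchors.map((a) => a.href));
  return uniqueHttpLinks(hrefs);
}

// Keep only http(s) URLs and drop duplicates, preserving first-seen order.
function uniqueHttpLinks(hrefs) {
  const seen = new Set();
  for (const href of hrefs) {
    if (/^https?:\/\//i.test(href)) seen.add(href);
  }
  return [...seen];
}
```

Filtering to http(s) is a design choice for this sketch: entries such as `mailto:` or `javascript:` links cannot be opened in a tab and saved as a PDF.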

To switch focus to a tab (activate it), you only need to call page.bringToFront():

// Assumes `browser` was created earlier with `await puppeteer.launch()`.
const page1 = await browser.newPage();
await page1.goto('https://www.google.com');

const page2 = await browser.newPage();
await page2.goto('https://www.bing.com');

// browser.pages() typically also counts the blank tab opened at launch.
const pageList = await browser.pages();
console.log("NUMBER TABS:", pageList.length);

// Switch tabs here
await page1.bringToFront();
// Do something... save as PDF
await page2.bringToFront();
// Do something... save as PDF

I suspect you have an array of pages, so you may need to adapt the code above to fit that case.
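One way to adapt it to an array of pages is sketched below. The function name `saveTabsAsPdf` and the `tab-<n>.pdf` file naming are hypothetical choices, and `pages` is assumed to be an array of Puppeteer Page objects (for example the result of `await browser.pages()`, which may include the initial blank tab).

```javascript
// Bring each open tab to the front in turn and save it as a PDF.
// `pages` is assumed to be an array of Puppeteer Page objects.
// `saveTabsAsPdf` and the `tab-<n>.pdf` naming are hypothetical choices for this sketch.
async function saveTabsAsPdf(pages) {
  const saved = [];
  for (const [index, page] of pages.entries()) {
    await page.bringToFront(); // switch focus to this tab
    const path = `tab-${index}.pdf`;
    await page.pdf({ path, format: 'Letter' });
    saved.push(path);
  }
  return saved; // list of file paths that were written
}
```

The tabs are handled strictly one after another here, so only one tab has focus at any time.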

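As for generating a single PDF from multiple tabs, I'm fairly sure that is not possible directly. I suspect there is a Node library that can merge several PDF files into one.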

pdf-merge might be what you are looking for.

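You can open a new page for each URL in a loop. Because the example uses map with an async callback, all pages load concurrently: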

const puppeteer = require('puppeteer');

(async () => {
  try {
    const browser = await puppeteer.launch();
    const urls = [
      'https://www.google.com',
      'https://www.duckduckgo.com',
      'https://www.bing.com',
    ];
    const pdfs = urls.map(async (url, i) => {
      const page = await browser.newPage();

      console.log(`loading page: ${url}`);
      await page.goto(url, {
        waitUntil: 'networkidle0',
        timeout: 120000,
      });

      console.log(`saving as pdf: ${url}`);
      await page.pdf({
        path: `${i}.pdf`,
        format: 'Letter',
        printBackground: true,
      });

      console.log(`closing page: ${url}`);
      await page.close();
    });

    // Wait for every page to finish so that failures reach the catch block below.
    await Promise.all(pdfs);
    await browser.close();
  } catch (error) {
    console.log(error);
  }
})();

You can also use a for loop, which visits the pages one at a time.

const puppeteer = require('puppeteer');

(async () => {
  const movieURLs = [
    'https://www.imdb.com/title/tt0234215',
    'https://www.imdb.com/title/tt0411008',
  ];
  // Launch the browser once and reuse it for every URL.
  const browser = await puppeteer.launch();
  try {
    for (const url of movieURLs) {
      const page = await browser.newPage();
      await page.goto(url, { waitUntil: 'networkidle2' });
      // The selector depends on the site's current markup and may need updating.
      const movieData = await page.evaluate(() => {
        const movieTitle = document.querySelector('div[class="TitleBlock"] > h1').innerText;
        return { movieTitle };
      });
      console.log(movieData);
      await page.close();
    }
  } finally {
    await browser.close();
  }
})();
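If there are many links, opening every tab at once may use a lot of memory, while a plain for loop may be slow. A possible middle ground, sketched here under assumptions, is to process the URLs in batches. The helper names `saveUrlAsPdf` and `saveAllInBatches`, the default batch size of 3, and the `<index>.pdf` file naming are all hypothetical; `browser` is assumed to be a Browser obtained from `puppeteer.launch()`.

```javascript
// Open one tab for a URL, save it as a PDF, and close the tab again.
// `browser` is assumed to be a Puppeteer Browser instance.
async function saveUrlAsPdf(browser, url, path) {
  const page = await browser.newPage();
  try {
    await page.goto(url, { waitUntil: 'networkidle0' });
    await page.pdf({ path, format: 'Letter', printBackground: true });
  } finally {
    await page.close(); // always release the tab, even if loading failed
  }
}

// Process the URLs in batches so that at most `batchSize` tabs are open at once.
// Hypothetical helper: batchSize = 1 is fully sequential,
// batchSize = urls.length loads everything concurrently.
async function saveAllInBatches(browser, urls, batchSize = 3) {
  const paths = [];
  for (let start = 0; start < urls.length; start += batchSize) {
    const batch = urls.slice(start, start + batchSize);
    const batchPaths = await Promise.all(
      batch.map(async (url, offset) => {
        const path = `${start + offset}.pdf`;
        await saveUrlAsPdf(browser, url, path);
        return path;
      })
    );
    paths.push(...batchPaths);
  }
  return paths; // file paths in the same order as `urls`
}
```

A caller would launch the browser with `puppeteer.launch()`, call `saveAllInBatches(browser, urls)`, and close the browser afterwards. Note that one failing URL rejects the whole batch in this sketch; catching errors per URL would be a reasonable refinement.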