Puppeteer: Need help extracting the text from h2 and span

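Absolute JS beginner here. I need help extracting the text from a DOM like the one shown below. The extraction itself can be done with querySelectorAll() or getElementsByTagName(). What I'm looking for, though, is to build an object in which each h2 element's text is a key and the texts of the spans that follow it are its value. I don't know how to achieve this. Any suggestions would be helpful.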

<div class ="product-list">
  <div class="row  column">
    <div class="column medium-9 large-10">
         <h2 class="product-name">Products List 1</h2>
    </div>
  </div>
  <div class="row">
    <span>First Product</span>
  </div>
  <div class="row">
   <span> Second Product</span>
  </div>
  .
  .
  .
  <div class="row">
    <span>
    Nth Product
    </span>
  </div>
  <div class="row  column">
    <div class="column medium-9 large-10">
         <h2 class="product-name">Products List 2</h2>
    </div>
  </div>
  <div class="row">
    <span>Third Product</span>
  </div>
  <div class="row">
   <span> Fourth Product</span>
  </div>
  .
  .
  .
  <div class="row">
    <span>
    Nth Product
    </span>
  </div>
</div>

From this DOM, I need to store the data as:

{
  "Products List 1": ["First Product", "Second Product", ..., "Nth Product"],
  "Products List 2": ["Third Product", "Fourth Product", ..., "Nth Product"]
}

JS:

const products = await page.evaluate(() => {
  const productsArray = [];

  // Section headings (collected but not yet used).
  const pdName1 = document.querySelectorAll('div.column > h2.product-name');

  // Product names under every heading, flattened into one list.
  const pdName2 = document.querySelectorAll('div.row > span');
  pdName2.forEach(query => {
    productsArray.push(query.innerText);
  });

  return productsArray;
});
 

You can try something like this:

import puppeteer from 'puppeteer';

const browser = await puppeteer.launch();

const html = `
  <!doctype html>
  <html>
    <head><meta charset='UTF-8'><title>Test</title></head>
    <body>
      <div class ="product-list">
        <div class="row  column">
          <div class="column medium-9 large-10">
               <h2 class="product-name">Products List 1</h2>
          </div>
        </div>
        <div class="row"><span>First Product</span></div>
        <div class="row"><span> Second Product</span></div>
        <div class="row"><span>Nth Product</span></div>
        <div class="row  column">
          <div class="column medium-9 large-10">
               <h2 class="product-name">Products List 2</h2>
          </div>
        </div>
        <div class="row"><span>Third Product</span></div>
        <div class="row"><span> Fourth Product</span></div>
        <div class="row"><span>Nth Product</span></div>
      </div>
    </body>
  </html>`;

try {
  const [page] = await browser.pages();

  await page.goto(`data:text/html,${html}`);

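  const data = await page.evaluate(() => {
    // querySelectorAll returns matches in document order, so each span
    // follows the h2 it belongs to.
    const elements = document.querySelectorAll('h2, div.row span');
    const list = {};
    let currentKey = null;

    for (const element of elements) {
      if (element.tagName === 'H2') {
        // A heading starts a new group.
        currentKey = element.innerText;
        list[currentKey] = [];
      } else if (currentKey !== null) {
        // A span is added to the most recent heading's group;
        // spans before the first h2 are skipped.
        list[currentKey].push(element.innerText);
      }
    }

    return list;
  });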
  console.log(data);
} catch (err) { console.error(err); } finally { await browser.close(); }
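
The grouping step in the answer above does not depend on the browser, so it can be pulled out and tried on plain data. The following is only a sketch under some assumptions: `groupByHeading` and the `{ tag, text }` record shape are hypothetical names made up for illustration, not part of Puppeteer, and the sample array is a shortened stand-in for the markup in the question. It also assumes the records arrive in document order, which is what querySelectorAll gives you.

```javascript
// Hypothetical helper (not a Puppeteer API): groups a flat, document-ordered
// list of { tag, text } records under the most recent heading.
function groupByHeading(items, headingTag = 'H2') {
  const groups = {};
  let currentKey = null;

  for (const { tag, text } of items) {
    const value = text.trim(); // drop whitespace left over from the markup
    if (tag === headingTag) {
      currentKey = value;
      // Keep existing entries if the same heading text appears twice.
      groups[currentKey] = groups[currentKey] ?? [];
    } else if (currentKey !== null) {
      groups[currentKey].push(value);
    }
    // Records that appear before the first heading are skipped.
  }

  return groups;
}

// Shortened sample data mirroring the question's markup (assumed input).
const sample = [
  { tag: 'H2', text: 'Products List 1' },
  { tag: 'SPAN', text: 'First Product' },
  { tag: 'SPAN', text: ' Second Product' },
  { tag: 'H2', text: 'Products List 2' },
  { tag: 'SPAN', text: 'Third Product' },
  { tag: 'SPAN', text: '\n    Fourth Product\n    ' },
];

console.log(groupByHeading(sample));
```

To feed it from the page, one option would be to return `Array.from(document.querySelectorAll('h2, div.row span'), el => ({ tag: el.tagName, text: el.textContent }))` from page.evaluate() and call the helper on the result in Node. Keeping the grouping outside the browser context makes it easier to test without launching Chromium.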