Cheerio Incomplete HTML

I am trying to scrape a website using cheerio:

const rp = require('request');
const cheerio = require('cheerio');

rp('https://www.fideyo.com/list', (error, response, html) => {
    if (!error && response.statusCode === 200) {
        const $ = cheerio.load(html);

        console.log($.html());
    }
});

But it returns an incomplete HTML body, like this:

<body>

<div id="app"></div>

<script type="text/javascript" src="https://cdn.fideyo.com/static/main.js?v=13"></script>

<!-- Google Tag Manager (noscript) -->
<noscript>
    <iframe src="https://www.googletagmanager.com/ns.html?id=GTM-KBGVCP3"
            height="0" width="0" style="display:none;visibility:hidden">
    </iframe>
</noscript>
<!-- End Google Tag Manager (noscript) -->

</body></html>

When I load the site in Chrome, there is content in the app section.

How can I access the content in the app section?

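If I understand correctly, the content of the app section is probably created dynamically by JavaScript. cheerio does not run JavaScript; it only parses the HTML that the server actually sends.

You need something like https://github.com/puppeteer/puppeteer/: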
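To confirm that this is what is happening, you can check whether the mount element is empty in the raw response. Below is a minimal sketch in plain Node.js with no dependencies. The helper name isEmptyMountPoint and the two sample strings are made up for illustration, and the regular expression only handles the simple `<div id="...">` shape shown in the question; it is not a general HTML parser.

```javascript
// Hypothetical helper (not part of cheerio or request): reports whether a
// <div> with the given id is present in the raw HTML but has no content.
function isEmptyMountPoint(html, id) {
  // Escape regex metacharacters so the id is matched literally.
  const safeId = id.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Opening <div ... id="..."> followed only by whitespace and </div>.
  const pattern = new RegExp(
    '<div\\b[^>]*\\bid=["\']' + safeId + '["\'][^>]*>\\s*</div>',
    'i'
  );
  return pattern.test(html);
}

// Sample inputs (assumed, shortened from the output in the question).
const staticHtml =
  '<body><div id="app"></div><script src="main.js"></script></body>';
const renderedHtml =
  '<body><div id="app"><div id="page-wrapper"><a href="/x">x</a></div></div></body>';

console.log(isEmptyMountPoint(staticHtml, 'app'));   // true
console.log(isEmptyMountPoint(renderedHtml, 'app')); // false
```

If the check returns true for the HTML you downloaded, the content is most likely filled in by the script afterwards, so a parser alone will not see it.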

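// Top-level await requires an ES module (.mjs file or "type": "module").
import puppeteer from 'puppeteer';

// Launch a headless browser that can execute the page's JavaScript.
const browser = await puppeteer.launch();

try {
  // Reuse the tab that the browser opens on launch.
  const [page] = await browser.pages();
  await page.goto('https://www.fideyo.com/list');

  // Wait until the script has rendered links inside the app container.
  await page.waitForSelector('div#app div#page-wrapper a');

  // Serialize the rendered DOM, including the dynamically created content.
  const html = await page.content();
  console.log(html);
} catch (err) {
  console.error(err);
} finally {
  // Close the browser even if navigation or waiting failed.
  await browser.close();
}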

The documentation is here: https://github.com/puppeteer/puppeteer/blob/main/docs/api.md