Cheerio returns incomplete HTML
I am trying to scrape a website using cheerio:
const rp = require('request');
const cheerio = require('cheerio');

rp('https://www.fideyo.com/list', (error, response, html) => {
  if (!error && response.statusCode == 200) {
    const $ = cheerio.load(html);
    console.log($.html());
  }
});
But it returns an incomplete HTML body, like this:
<body>
<div id="app"></div>
<script type="text/javascript" src="https://cdn.fideyo.com/static/main.js?v=13"></script>
<!-- Google Tag Manager (noscript) -->
<noscript>
<iframe src="https://www.googletagmanager.com/ns.html?id=GTM-KBGVCP3"
height="0" width="0" style="display:none;visibility:hidden">
</iframe>
</noscript>
<!-- End Google Tag Manager (noscript) -->
</body></html>
When I load the site in Chrome, there is content in the app section.
How can I access the content in the app section?
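If I understand correctly, the content of the app section is probably created dynamically by JavaScript, and cheerio does not run JavaScript; it only parses the hard-coded HTML that the server sends.
You need something like https://github.com/puppeteer/puppeteer/: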
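One way to check this diagnosis before switching tools is to look at whether the mount element is empty in the raw response. The sketch below is only a crude heuristic: hasEmptyMountPoint is a hypothetical helper (not part of cheerio or puppeteer), the sample strings are made up for illustration, and it assumes the mount point is a div with a simple id such as app.

```javascript
// Hypothetical helper: returns true when the raw HTML contains a <div> with
// the given id and nothing but whitespace inside it. Regex matching is a rough
// heuristic, not a real HTML parser, and assumes the id has no regex
// metacharacters.
function hasEmptyMountPoint(html, id = 'app') {
  const pattern = new RegExp(
    `<div\\b[^>]*\\bid=["']${id}["'][^>]*>\\s*</div>`,
    'i'
  );
  return pattern.test(html);
}

// Illustrative inputs: the first mimics what the server sends, the second
// mimics the DOM after the page's script has run in a browser.
const rawHtml =
  '<body><div id="app"></div><script src="main.js"></script></body>';
const renderedHtml =
  '<body><div id="app"><div id="page-wrapper"><a href="/x">x</a></div></div></body>';

console.log(hasEmptyMountPoint(rawHtml));      // true
console.log(hasEmptyMountPoint(renderedHtml)); // false
```

If the check is true for the HTML that request returns, the content is most likely filled in client-side, and a parser alone will not see it.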
import puppeteer from 'puppeteer';

const browser = await puppeteer.launch();

try {
  const [page] = await browser.pages();

  await page.goto('https://www.fideyo.com/list');
  // Wait until the client-side script has rendered links inside the app section.
  await page.waitForSelector('div#app div#page-wrapper a');

  const html = await page.content();
  console.log(html);
} catch (err) {
  console.error(err);
} finally {
  await browser.close();
}
The documentation is here: https://github.com/puppeteer/puppeteer/blob/main/docs/api.md