Scrape nested span tag on loop with Puppeteer
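I have a nested HTML structure from which I am trying to scrape the text and the link, but for some strange reason it is not working.
I marked the lines I want to scrape with an emoji. One is the link, the other is the text.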
<div class="q-box" role="list" style="box-sizing: border-box;">
<div>
<a class="q-box qu-display--block qu-cursor--pointer qu-hover--textDecoration--none Link___StyledBox-t2xg9c-0 KlcoI" target="_blank" href="https://www.quora.com/Can-Facebook-see-who-viewed-your-profile" style="box-sizing: border-box; border-radius: inherit;">
<div class="q-box qu-hover--textDecoration--underline qu-tapHighlight--none qu-display--flex qu-alignItems--center" style="box-sizing: border-box; position: relative;">
<div class="q-flex qu-alignItems--center qu-py--tiny qu-flex--auto qu-overflow--hidden" style="box-sizing: border-box; display: flex;">
<div class="q-box qu-flex--auto qu-overflow--hidden" style="box-sizing: border-box;">
<div class="q-text qu-color--gray_dark" style="box-sizing: border-box;">
<div class="q-box qu-py--tiny" style="box-sizing: border-box;">
<span class="q-text qu-color--blue_dark" style="box-sizing: border-box;">
<div class="q-flex qu-flexDirection--row" style="box-sizing: border-box; display: flex;">
<div class="q-inline qu-flexWrap--wrap" style="box-sizing: border-box; display: inline; max-width: 100%;">
<div class="q-text qu-truncateLines--2 puppeteer_test_question_title" style="box-sizing: border-box;">
<span class="q-box qu-userSelect--text" style="box-sizing: border-box;">
<span style="background: none;">Can Facebook see who viewed your profile?</span></span></div>
</div>
</div>
</span>
</div>
</div>
</div>
</div>
</div>
</a>
</div>
<div>
<a class="q-box qu-display--block qu-cursor--pointer qu-hover--textDecoration--none Link___StyledBox-t2xg9c-0 KlcoI" target="_blank" href="https://onlinesocialmediasolution.quora.com/How-to-view-a-private-Facebook-profile" style="box-sizing: border-box; border-radius: inherit;">
<div class="q-box qu-hover--textDecoration--underline qu-tapHighlight--none qu-display--flex qu-alignItems--center" style="box-sizing: border-box; position: relative;">
<div class="q-flex qu-alignItems--center qu-py--tiny qu-flex--auto qu-overflow--hidden" style="box-sizing: border-box; display: flex;">
<div class="q-box qu-flex--auto qu-overflow--hidden" style="box-sizing: border-box;">
<div class="q-text qu-color--gray_dark" style="box-sizing: border-box;">
<div class="q-box qu-py--tiny" style="box-sizing: border-box;">
<span class="q-text qu-color--blue_dark" style="box-sizing: border-box;">
<div class="q-flex qu-flexDirection--row" style="box-sizing: border-box; display: flex;">
<div class="q-inline qu-flexWrap--wrap" style="box-sizing: border-box; display: inline; max-width: 100%;">
<div class="q-text qu-truncateLines--2 puppeteer_test_question_title" style="box-sizing: border-box;">
<span class="q-box qu-userSelect--text" style="box-sizing: border-box;">
<span style="background: none;">How do you view a private Facebook profile?</span></span></div>
</div>
</div>
</span>
</div>
</div>
</div>
</div>
</div>
</a>
</div>
<div>
<a class="q-box qu-display--block qu-cursor--pointer qu-hover--textDecoration--none Link___StyledBox-t2xg9c-0 KlcoI" target="_blank" href="https://www.quora.com/How-can-you-tell-if-non-friends-have-viewed-your-Facebook-profile" style="box-sizing: border-box; border-radius: inherit;">
<div class="q-box qu-hover--textDecoration--underline qu-tapHighlight--none qu-display--flex qu-alignItems--center" style="box-sizing: border-box; position: relative;">
<div class="q-flex qu-alignItems--center qu-py--tiny qu-flex--auto qu-overflow--hidden" style="box-sizing: border-box; display: flex;">
<div class="q-box qu-flex--auto qu-overflow--hidden" style="box-sizing: border-box;">
<div class="q-text qu-color--gray_dark" style="box-sizing: border-box;">
<div class="q-box qu-py--tiny" style="box-sizing: border-box;">
<span class="q-text qu-color--blue_dark" style="box-sizing: border-box;">
<div class="q-flex qu-flexDirection--row" style="box-sizing: border-box; display: flex;">
<div class="q-inline qu-flexWrap--wrap" style="box-sizing: border-box; display: inline; max-width: 100%;">
<div class="q-text qu-truncateLines--2 puppeteer_test_question_title" style="box-sizing: border-box;">
<span class="q-box qu-userSelect--text" style="box-sizing: border-box;">
<span style="background: none;">How can you tell if non-friends have viewed your Facebook profile?</span></span></div>
</div>
</div>
</span>
</div>
</div>
</div>
</div>
</div>
</a>
</div>
<div>
<a class="q-box qu-display--block qu-cursor--pointer qu-hover--textDecoration--none Link___StyledBox-t2xg9c-0 KlcoI" target="_blank" href="https://www.quora.com/Is-there-a-way-to-see-your-own-Facebook-profile-from-the-view-of-a-non-friend" style="box-sizing: border-box; border-radius: inherit;">
<div class="q-box qu-hover--textDecoration--underline qu-tapHighlight--none qu-display--flex qu-alignItems--center" style="box-sizing: border-box; position: relative;">
<div class="q-flex qu-alignItems--center qu-py--tiny qu-flex--auto qu-overflow--hidden" style="box-sizing: border-box; display: flex;">
<div class="q-box qu-flex--auto qu-overflow--hidden" style="box-sizing: border-box;">
<div class="q-text qu-color--gray_dark" style="box-sizing: border-box;">
<div class="q-box qu-py--tiny" style="box-sizing: border-box;">
<span class="q-text qu-color--blue_dark" style="box-sizing: border-box;">
<div class="q-flex qu-flexDirection--row" style="box-sizing: border-box; display: flex;">
<div class="q-inline qu-flexWrap--wrap" style="box-sizing: border-box; display: inline; max-width: 100%;">
<div class="q-text qu-truncateLines--2 puppeteer_test_question_title" style="box-sizing: border-box;">
<span class="q-box qu-userSelect--text" style="box-sizing: border-box;">
<span style="background: none;">Is there a way to see your own Facebook profile from the view of a non-friend?</span></span></div>
</div>
</div>
</span>
</div>
</div>
</div>
</div>
</div>
</a>
</div>
</div>
Here is the code from my Index.js file so far. It is meant to loop through all the lines marked with the emoji, but that does not work either.
const browser = await puppeteer.launch({
  headless: false,
});
const page = await browser.newPage();
await page.goto(url, { waitUntil: 'networkidle2' });
try {
  // loop through the selector and get the data
  await page.waitForSelector(
    '#root > div.q-box > div > div > div:nth-child(4) > div > div > div:nth-child(2) > div > div'
  );
  const related = page.$eval(
    '#root > div.q-box > div > div > div:nth-child(4) > div > div > div:nth-child(2) > div > div > div.q-box.qu-mb--large > div > div:nth-child(2)',
    (el) => el.innerText
  );
  res.send(related);
} catch (err) {
  // res.send(err, 500);
  console.log(err);
}
await browser.close();
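Based on the Quora URL you provided in the comments, I looked up the CSS class of the container box: .q-sticky. It makes the inner elements (the links and the link texts) easier to find.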
Using the child combinator (>) and the universal selector (*), you can write patterns that match the desired elements:
- all links inside the box:
'.q-sticky * > a'
- all link texts inside the box:
'.q-sticky * > .q-box.qu-userSelect--text'
Note: your initial code has an async issue: const related = page.$eval(...
You should await page.$eval to avoid errors (Puppeteer's methods mostly return promises, which can be handled by awaiting them).
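To illustrate the async issue, here is a minimal sketch that runs without a browser. The function fakeEval is a hypothetical stand-in for page.$eval, not part of Puppeteer; the only assumption is that, like page.$eval, it returns a promise.

```javascript
// Hypothetical stand-in for page.$eval: it returns a promise, as Puppeteer's method does.
function fakeEval() {
  return new Promise((resolve) => {
    setTimeout(() => resolve('Related questions'), 10);
  });
}

async function main() {
  // Without await you get the Promise object itself, not the text.
  const withoutAwait = fakeEval();
  // With await you get the resolved value.
  const withAwait = await fakeEval();
  return {
    withoutAwaitIsPromise: withoutAwait instanceof Promise,
    withAwait,
  };
}

const resultPromise = main();
resultPromise.then((result) => console.log(result));
```

In the original code, res.send(related) therefore receives a pending promise rather than the inner text of the element.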
Instead of page.$eval you can use page.$$eval (the "querySelectorAll" version), which passes an array of all elements matching the selector to your callback.
Finally, you can combine the two arrays as needed (I used an Array.map one-liner below):
await page.waitForSelector('.q-sticky * > a');
const relatedLinks = await page.$$eval('.q-sticky * > a', elems => elems.map((el) => el.href));
const relatedTitles = await page.$$eval('.q-sticky * > .q-box.qu-userSelect--text', elems => elems.map((el) => el.innerText));
const related = relatedLinks.map((linkel, i) => { return { link: linkel, title: relatedTitles[i] }});
console.log(related);
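Combining two arrays by index assumes both selectors return the same number of matches in the same order. A possible variant, sketched here under that caveat, reads the link and its title from each anchor in a single pass. The name extractRelated and the fakeAnchors objects are hypothetical; the fakes only mimic the two DOM members the function uses so that the sketch runs without a browser.

```javascript
// Page function intended for: await page.$$eval('.q-sticky * > a', extractRelated)
// It must be self-contained, because Puppeteer serializes it and runs it in the page.
function extractRelated(anchors) {
  return anchors.map((a) => {
    // Look for the title span inside this particular anchor.
    const titleEl = a.querySelector('.q-box.qu-userSelect--text');
    return {
      link: a.href,
      title: titleEl ? titleEl.innerText.trim() : null,
    };
  });
}

// Hypothetical stand-ins for DOM anchors, exposing only href and querySelector.
const fakeAnchors = [
  {
    href: 'https://www.quora.com/Can-Facebook-see-who-viewed-your-profile',
    querySelector: () => ({ innerText: ' Can Facebook see who viewed your profile? ' }),
  },
  {
    href: 'https://example.com/no-title',
    querySelector: () => null, // an anchor without a title span
  },
];

const related = extractRelated(fakeAnchors);
console.log(related);
```

With this shape, an anchor that has no matching title span yields title: null instead of shifting every later title by one position.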