jQuery Only Returning First .attr('href');
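I am trying to scrape a webpage using Node and cheerio. Everything returns as I expect except the href.
I have successfully returned the values of the 'headers' (.find('h3').text()) and the 'descriptions' (.find('a').text()), but for the 'links' (.find('a').attr('href');) only the first one is returned. This confuses me because the 'descriptions' text comes from the same anchors.
I found that if I remove .attr('href'); and just return .find('a'), the link (href) is displayed as expected. I could manipulate the returned value to make it work if need be, but I would rather do this properly.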
Script:
const cheerio = require("cheerio");
const axios = require("axios");

// Fetch the locally served sample page and parse it with cheerio.
axios.get("http://localhost:8000/sample_page_2.html").then(urlResponse => {
  const $ = cheerio.load(urlResponse.data);

  // Each div.tos-post-type holds one heading and its group of links.
  $('div.tos-post-type').each((i, element) => {
    const header = $(element)
      .find('h3')
      .text()
      .trim();
    console.log('------------------------------------------------------------------------------------');
    console.log('HEADER: ' + header);

    const link = $(element)
      .find('a')
      .attr('href');
    console.log('\nLINK(s): \n' + link);

    const description = $(element)
      .find('a')
      .text();
    console.log('\nDESCRIPTION(s): \n' + description + '\n');
    console.log('------------------------------------------------------------------------------------');
  });
});
Here is a snippet of the webpage I am trying to scrape:
<div class="container tos-archive">
  <div class="row justify-content-center">
    <div class="col-lg-10">
      <div class="row">
        <div class="col-lg-6">
          <div class="tos-post-type" style="background-image: url(https://www.example_domain.com/wp-content/hero-pattern.png)">
            <div class="icon" style="background-image: url( https://www.example_domain.com/wp-content/legal.svg )"></div>
            <h3>
              Legal </h3>
            <a href="https://www.example_domain.com/legal/terms-conditions/">
              Terms & Conditions </a>
            <a href="https://www.example_domain.com/legal/service-providers/">
              Service Providers </a>
          </div>
        </div>
        <div class="col-lg-6">
          <div class="tos-post-type" style="background-image: url(https://www.example_domain.com/wp-content/hero-pattern.png)">
            <div class="icon" style="background-image: url( https://www.example_domain.com/wp-content/policy.svg )"></div>
            <h3>
              Policies </h3>
            <a target="" href="https://www.example_domain.com/privacy-policy/">
              Privacy Policy </a>
            <a target="" href="https://store.example_domain.com/EXHM/store?Action=DisplayEXCookiesPolicyPage">
              Cookie Policy </a>
          </div>
        </div>
        <div class="col-lg-6">
          <div class="tos-post-type" style="background-image: url(https://www.example_domain.com/wp-content/hero-pattern.png)">
            <div class="icon" style="background-image: url( https://www.example_domain.com/wp-content/clip-dark.svg )"></div>
            <h3>
              <a href="https://www.example_domain.com/compliance/">
                Compliance </a>
            </h3>
            <a href="https://www.example_domain.com/compliance/ccpa/">
              California Consumer Privacy Act (CCPA) </a>
            <a href="https://www.example_domain.com/compliance/disaster-recovery/">
              Disaster Recovery </a>
            <a href="https://www.example_domain.com/compliance/gdpr/">
              GDPR </a>
            <a href="https://www.example_domain.com/compliance/pci-dss/">
              PCI DSS </a>
            <a href="https://www.example_domain.com/compliance/privacymark/">
              PrivacyMark </a>
            <a class="tos-view-all" href="https://www.example_domain.com/compliance/">
              View All </a>
          </div>
        </div>
        <div class="col-lg-6">
          <div class="tos-post-type" style="background-image: url(https://www.example_domain.com/wp-content/hero-pattern.png)">
            <div class="icon" style="background-image: url( https://www.example_domain.com/wp-content/mouse.svg )"></div>
            <h3>
              Other </h3>
            <a href="https://www.example_domain.com/legal-other/eu-standard-solutions/">
              EU Standard Solutions </a>
            <a href="https://www.example_domain.com/legal-other/eu-standard-service-providers/">
              EU Standard Service Providers </a>
            <a href="https://www.example_domain.com/legal-other/data-exhibit/">
              Data Exhibit </a>
            <a href="https://www.example_domain.com/legal-other/data-standards/">
              Data Standards </a>
            <a href="https://www.example_domain.com/legal-other/payment-addenda/">
              Payment Addenda </a>
          </div>
        </div>
      </div>
    </div>
  </div>
</div>
Here is a snippet of the actual results:
------------------------------------------------------------------------------------
------------------------------------------------------------------------------------
HEADER: Policies
LINK(s):
https://www.example_domain.com/privacy-policy/
DESCRIPTION(s):
Privacy Policy
Cookie Policy
------------------------------------------------------------------------------------
------------------------------------------------------------------------------------
HEADER: Compliance
LINK(s):
https://www.example_domain.com/compliance/
DESCRIPTION(s):
Compliance
California Consumer Privacy Act (CCPA)
Disaster Recovery
GDPR
PCI DSS
PrivacyMark
View All
------------------------------------------------------------------------------------
------------------------------------------------------------------------------------
This is what I was expecting (multiple links):
------------------------------------------------------------------------------------
------------------------------------------------------------------------------------
HEADER: Policies
LINK(s):
https://www.example_domain.com/privacy-policy/
https://store.example_domain.com/EXHM/store?Action=DisplayEXCookiesPolicyPage
DESCRIPTION(s):
Privacy Policy
Cookie Policy
------------------------------------------------------------------------------------
------------------------------------------------------------------------------------
HEADER: Compliance
LINK(s):
https://www.example_domain.com/compliance/
https://www.example_domain.com/compliance/ccpa/
https://www.example_domain.com/compliance/disaster-recovery/
https://www.example_domain.com/compliance/gdpr/
https://www.example_domain.com/compliance/pci-dss/
https://www.example_domain.com/compliance/privacymark/
https://www.example_domain.com/compliance/
DESCRIPTION(s):
Compliance
California Consumer Privacy Act (CCPA)
Disaster Recovery
GDPR
PCI DSS
PrivacyMark
View All
------------------------------------------------------------------------------------
------------------------------------------------------------------------------------
Any idea what I'm doing wrong?
Thanks!
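As in jQuery, .attr('href') used as a getter returns the attribute of the first matched element only, whereas .text() returns the combined text of all matched elements. That is why all of the descriptions come back but only one link does. Use map to read the attribute from every matched element: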
$(element).find('a').get().map(a => $(a).attr('href'))