jQuery Only Returning First .attr('href');

I'm trying to scrape a web page using Node and cheerio. Everything is returned as I expect except the href.

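I can successfully return the values for the 'headers' with .find('h3').text() and the 'descriptions' with .find('a').text(), but for the 'links' with .find('a').attr('href'); only the first one is returned. This confuses me because the 'descriptions' text is inside the same anchors.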

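I found that if I remove .attr('href'); and just return .find('a'), then the link text (href) is displayed as expected. I could modify the returned value and make it work if needed, but I'd prefer to do this correctly.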

Script:

const cheerio = require("cheerio");
const axios = require("axios");

axios.get("http://localhost:8000/sample_page_2.html").then(urlResponse => {
    const $ = cheerio.load(urlResponse.data);

    $('div.tos-post-type').each((i, element) => {

        const header = $(element)
            .find('h3')
            .text()
            .trim();
        console.log('------------------------------------------------------------------------------------');
        console.log('HEADER: ' + header);

        const link = $(element)
            .find('a')
            .attr('href');

        console.log('\nLINK(s): \n' + link);

        const description = $(element)
            .find('a')
            .text();

        console.log('\nDESCRIPTION(s): \n' + description + '\n');
        console.log('------------------------------------------------------------------------------------');
    });
});

Here is a snippet of the web page I'm trying to scrape:

<div class="container tos-archive">
    <div class="row justify-content-center">
        <div class="col-lg-10">
            <div class="row">
                <div class="col-lg-6">
                    <div class="tos-post-type" style="background-image: url(https://www.example_domain.com/wp-content/hero-pattern.png)">
                        <div class="icon" style="background-image: url( https://www.example_domain.com/wp-content/legal.svg )"></div>
                        <h3>
                            Legal </h3>
                        <a href="https://www.example_domain.com/legal/terms-conditions/">
                            Terms &amp; Conditions </a>
                        <a href="https://www.example_domain.com/legal/service-providers/">
                            Service Providers </a>
                    </div>
                </div>
                <div class="col-lg-6">
                    <div class="tos-post-type" style="background-image: url(https://www.example_domain.com/wp-content/hero-pattern.png)">
                        <div class="icon" style="background-image: url( https://www.example_domain.com/wp-content/policy.svg )"></div>
                        <h3>
                            Policies </h3>
                        <a target="" href="https://www.example_domain.com/privacy-policy/">
                            Privacy Policy </a>
                        <a target="" href="https://store.example_domain.com/EXHM/store?Action=DisplayEXCookiesPolicyPage">
                            Cookie Policy </a>
                    </div>
                </div>
                <div class="col-lg-6">
                    <div class="tos-post-type" style="background-image: url(https://www.example_domain.com/wp-content/hero-pattern.png)">
                        <div class="icon" style="background-image: url( https://www.example_domain.com/wp-content/clip-dark.svg )"></div>
                        <h3>
                            <a href="https://www.example_domain.com/compliance/">
                                Compliance </a>
                        </h3>
                        <a href="https://www.example_domain.com/compliance/ccpa/">
                            California Consumer Privacy Act (CCPA) </a>
                        <a href="https://www.example_domain.com/compliance/disaster-recovery/">
                            Disaster Recovery </a>
                        <a href="https://www.example_domain.com/compliance/gdpr/">
                            GDPR </a>
                        <a href="https://www.example_domain.com/compliance/pci-dss/">
                            PCI DSS </a>
                        <a href="https://www.example_domain.com/compliance/privacymark/">
                            PrivacyMark </a>
                        <a class="tos-view-all" href="https://www.example_domain.com/compliance/">
                            View All </a>
                    </div>
                </div>
                <div class="col-lg-6">
                    <div class="tos-post-type" style="background-image: url(https://www.example_domain.com/wp-content/hero-pattern.png)">
                        <div class="icon" style="background-image: url( https://www.example_domain.com/wp-content/mouse.svg )"></div>
                        <h3>
                            Other </h3>
                        <a href="https://www.example_domain.com/legal-other/eu-standard-solutions/">
                            EU Standard Solutions </a>
                        <a href="https://www.example_domain.com/legal-other/eu-standard-service-providers/">
                            EU Standard Service Providers </a>
                        <a href="https://www.example_domain.com/legal-other/data-exhibit/">
                            Data Exhibit </a>
                        <a href="https://www.example_domain.com/legal-other/data-standards/">
                            Data Standards </a>
                        <a href="https://www.example_domain.com/legal-other/payment-addenda/">
                            Payment Addenda </a>
                    </div>
                </div>
            </div>
        </div>
    </div>
</div>

Here is a snippet of the actual results:

------------------------------------------------------------------------------------
------------------------------------------------------------------------------------
HEADER: Policies

LINK(s):
https://www.example_domain.com/privacy-policy/

DESCRIPTION(s):

                            Privacy Policy
                            Cookie Policy

------------------------------------------------------------------------------------
------------------------------------------------------------------------------------
HEADER: Compliance

LINK(s):
https://www.example_domain.com/compliance/

DESCRIPTION(s):

                                Compliance
                            California Consumer Privacy Act (CCPA)
                            Disaster Recovery
                            GDPR
                            PCI DSS
                            PrivacyMark
                            View All

------------------------------------------------------------------------------------
------------------------------------------------------------------------------------

Here is what I was expecting (multiple links):

------------------------------------------------------------------------------------
------------------------------------------------------------------------------------
HEADER: Policies

LINK(s):
https://www.example_domain.com/privacy-policy/
https://store.example_domain.com/EXHM/store?Action=DisplayEXCookiesPolicyPage

DESCRIPTION(s):

                            Privacy Policy
                            Cookie Policy

------------------------------------------------------------------------------------
------------------------------------------------------------------------------------
HEADER: Compliance

LINK(s):
https://www.example_domain.com/compliance/
https://www.example_domain.com/compliance/ccpa/
https://www.example_domain.com/compliance/disaster-recovery/
https://www.example_domain.com/compliance/gdpr/
https://www.example_domain.com/compliance/pci-dss/
https://www.example_domain.com/compliance/privacymark/
https://www.example_domain.com/compliance/

DESCRIPTION(s):

                                Compliance
                            California Consumer Privacy Act (CCPA)
                            Disaster Recovery
                            GDPR
                            PCI DSS
                            PrivacyMark
                            View All

------------------------------------------------------------------------------------
------------------------------------------------------------------------------------

Any idea what I'm doing wrong?

Thanks!

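.attr() reads the attribute from only the first matched element, whereas .text() concatenates the text of every matched element. Use map to get the attribute from each one: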

$(element).find('a').get().map(a => $(a).attr('href'))