XPath - select first occurrence of node with specific type

I'm trying to select every first occurrence of a specific node type in the following structure:

<div class="jobs-list">
    <div class="job-listing">
        <h3>Title1</h3>
        <span class="organization">
            <a href="https://www.domain1.org/" target="_blank">Org1</a>
        </span>
        <span class="location">Loc1</span>
        <div class="description">
            desc1
            <a href="https://www.domain1-1.org/" target="_blank">https://www.domain1-1.org/</a>
            <span class="list-date">Posted on: 01/19/2022</span>
        </div>
    </div>
    <div class="job-listing">
        <h3>Title2</h3>
        <span class="organization">
            <a href="https://www.domain2.org/" target="_blank">Org2</a>
        </span>
        <span class="location">Loc2</span>
        <div class="description">
            desc2
            <a href="https://www.domain2.org/" target="_blank">https://www.domain2.org/</a>
            <span class="list-date">Posted on: 01/18/2022</span>
        </div>
    </div>
    <div class="job-listing">
        <h3>Title3</h3>
        <span class="organization">
            <a href="https://www.domain3.org/" target="_blank">Org3</a>
        </span>
        <span class="location">Loc3</span>
        <div class="description">
            desc3            
            <a href="mailto:user@domain3.org">user@domain3.org</a>
            <span class="list-date">Posted on: 01/19/2022</span>
        </div>
    </div>
    <div class="job-listing">
        <h3>TItle4</h3>
        <span class="organization">Org4</span>
        <span class="location">Loc4</span>
        <div class="description">
            desc4
            <a href="mailto:user@domain4.org">user@domain4.org</a>
            <a href="https://www.domain4.org/" target="_blank">https://www.domain4.org/</a>
            <a href="https://www.domain4-1.org/" target="_blank">https://www.domain4-1.org/</a>
            <span class="list-date">Posted on: 01/06/2022</span>
        </div>
    </div>
</div>

Specifically, I need the following result:

https://www.domain1.org/
https://www.domain2.org/
https://www.domain3.org/
https://www.domain4.org/

It should be the first a/@href under each div[@class='job-listing'], but I'm not sure how to express that. Some caveats:

Thanks!

//div[@class='job-listing']/descendant::a[1] gives you the first a descendant of each div. If you want to add a check, use e.g. //div[@class='job-listing']/descendant::a[starts-with(@href, 'http')][1].

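If you need the href attribute nodes, use //div[@class='job-listing']/descendant::a[starts-with(@href, 'http')][1]/@href. Note that some default serializations in XSLT or XQuery don't let you serialize a sequence of standalone attribute nodes, but in XPath 2 or 3 you can of course use e.g. //div[@class='job-listing']/descendant::a[starts-with(@href, 'http')][1]/@href/string() instead to get a sequence of attribute values.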
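To illustrate why the answer uses the descendant axis rather than //a[1], here is a minimal sketch using Python's standard xml.etree.ElementTree. ElementTree implements only a small XPath subset (no descendant:: axis, no starts-with()), so the descendant-axis selections are emulated with Element.iter(); this is an emulation of the logic, not an XPath engine. The markup is a trimmed copy of the question's (listings 1 and 4 only), and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's markup (listings 1 and 4 only).
XML = """<div class="jobs-list">
  <div class="job-listing">
    <h3>Title1</h3>
    <span class="organization">
      <a href="https://www.domain1.org/" target="_blank">Org1</a>
    </span>
    <span class="location">Loc1</span>
    <div class="description">
      desc1
      <a href="https://www.domain1-1.org/" target="_blank">https://www.domain1-1.org/</a>
      <span class="list-date">Posted on: 01/19/2022</span>
    </div>
  </div>
  <div class="job-listing">
    <h3>TItle4</h3>
    <span class="organization">Org4</span>
    <span class="location">Loc4</span>
    <div class="description">
      desc4
      <a href="mailto:user@domain4.org">user@domain4.org</a>
      <a href="https://www.domain4.org/" target="_blank">https://www.domain4.org/</a>
      <a href="https://www.domain4-1.org/" target="_blank">https://www.domain4-1.org/</a>
      <span class="list-date">Posted on: 01/06/2022</span>
    </div>
  </div>
</div>"""

root = ET.fromstring(XML)
listings = root.findall("div[@class='job-listing']")

# ElementTree's [1] means "first <a> among its siblings", like XPath //a[1]
# (child::a[1] under every parent), so listing 1 yields two links here.
sibling_first = [a.get("href") for d in listings for a in d.findall(".//a[1]")]

# Emulates descendant::a[1]: the first <a> in document order per listing
# (Element.iter walks the subtree depth-first, i.e. in document order).
# Assumes every listing contains at least one <a>.
descendant_first = [next(d.iter("a")).get("href") for d in listings]

# Emulates descendant::a[starts-with(@href, 'http')][1]/@href/string().
first_http = [
    next(a.get("href") for a in d.iter("a") if a.get("href", "").startswith("http"))
    for d in listings
]

print(sibling_first)
print(descendant_first)
print(first_http)
```

Only first_http matches the desired output for these two listings: the unfiltered descendant::a[1] picks the mailto: link in listing 4, and the sibling-position form returns two links for listing 1. With a full XPath 1.0 engine (for example lxml's xpath() method) you would evaluate the answer's expression directly instead.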

I'd suggest a more class-based selector:

//span[@class="organization"]//a/@href 
| 
//div[@class="description"][not(preceding-sibling::span/a)]
//a[contains(@href,"http")][1]/@href

This selects the links under organization (call them A) and, for listings where no A is found, the first http link under description.
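The union expression can be traced the same way. The following is a hypothetical sketch (the function name class_based_links is illustrative) using Python's xml.etree.ElementTree, which has no | union operator and no preceding-sibling axis, so each branch is emulated per listing and the results are concatenated. Because organization links come before description links inside each listing, the list comes out in document order, as the union would return it. The markup is a trimmed copy of listings 3 and 4 from the question.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's markup (listings 3 and 4 only).
XML = """<div class="jobs-list">
  <div class="job-listing">
    <h3>Title3</h3>
    <span class="organization">
      <a href="https://www.domain3.org/" target="_blank">Org3</a>
    </span>
    <span class="location">Loc3</span>
    <div class="description">
      desc3
      <a href="mailto:user@domain3.org">user@domain3.org</a>
      <span class="list-date">Posted on: 01/19/2022</span>
    </div>
  </div>
  <div class="job-listing">
    <h3>TItle4</h3>
    <span class="organization">Org4</span>
    <span class="location">Loc4</span>
    <div class="description">
      desc4
      <a href="mailto:user@domain4.org">user@domain4.org</a>
      <a href="https://www.domain4.org/" target="_blank">https://www.domain4.org/</a>
      <a href="https://www.domain4-1.org/" target="_blank">https://www.domain4-1.org/</a>
      <span class="list-date">Posted on: 01/06/2022</span>
    </div>
  </div>
</div>"""


def class_based_links(root):
    """Hypothetical helper emulating the two branches of the union per listing."""
    hrefs = []
    for listing in root.findall("div[@class='job-listing']"):
        # Left branch: //span[@class="organization"]//a/@href
        hrefs += [a.get("href") for a in listing.findall("span[@class='organization']//a")]
        # Right branch is guarded by not(preceding-sibling::span/a); approximated
        # as "no <span> child of the listing has an <a> child", which is
        # equivalent when, as in the sample, all spans precede the description.
        if listing.find("span/a") is None:
            for desc in listing.findall("div[@class='description']"):
                # //a[contains(@href,"http")][1]: first matching <a> per parent;
                # in the sample every link is a direct child of the description.
                http = [a.get("href") for a in desc.findall("a") if "http" in a.get("href", "")]
                hrefs += http[:1]
    return hrefs


links = class_based_links(ET.fromstring(XML))
print(links)
```

Listing 3 contributes its organization link (its mailto: description link is never considered), while listing 4, whose organization has no link, falls through to the first http link in its description.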
