XPath for selecting XML elements between two milestone/empty elements

Question

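In the XML file below, I encode the text structure as div elements, and the layout of the book that contains the text (two columns) with empty pb (page beginning) and cb (column beginning) elements.

XML/TEI input: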

<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" schematypens="http://relaxng.org/ns/structure/1.0"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
<teiHeader>
    <fileDesc>
        <titleStmt>
            <title type="main" xml:lang="en">Testfile</title>
        </titleStmt>
        <publicationStmt>
            <p>Test</p>
        </publicationStmt>
        <sourceDesc>
            <p>Testfile</p></sourceDesc>
    </fileDesc>
</teiHeader>
    
    
    <text>
        <body>
            <pb n="1r"/><fw type="header">Some header</fw>
            <cb n="a"/>
            <lb/><div n="1"><p>Line 1.1
                <lb/>Line 1.2
                <lb/>Line 1.3
                <lb/>Line 1.4
            </p></div>
            <cb n="b"/>
            <lb/><div n="2"><p>Line 2.1
                <lb/>Line 2.2
                <lb/>Line 2.3
                <lb/>Line 2.4
                <pb n="1v"/><fw type="header">Some header</fw>
                <cb n="a"/>
                <lb/>Line 1.1
                <lb/>Line 1.2
                <lb/>Line 1.3
                <lb/>Line 1.4
            </p></div>
            <cb n="b"/>
            <lb/><div n="2"><p>Line 1.1
                <lb/>Line 1.2
                <lb/>Line 1.3
                <lb/>Line 1.4
            </p></div>
        </body>
    </text>
</TEI>

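What I want

Now I want to traverse the tree with lxml.etree and XPath and select all lb elements of one column, for instance all lb elements between <pb n="1r"/><fw type="header">Some header</fw><cb n="a"/> ... and the first <cb n="b"/> element after it.

What I tried

I used the following XPath expression for this: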

//lb[preceding::pb[@n="1r"] and following::cb[@n="b"]]

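However, it selects not only the expected elements but also every other lb element that is followed by a <cb n="b"/> element.

I also tried restricting the match to the first occurrence of <cb n="b"/>, but that did not change the result: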

//lb[preceding::pb[@n="1r"] and following::cb[@n="b"][1]]
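
The behaviour of the two expressions above can be reproduced with a short sketch. It assumes an abbreviated copy of the sample input (same milestone layout, teiHeader omitted) and adds the tei prefix that lxml needs; the names SAMPLE, NS, first_try and second_try are illustrative only. Inside a predicate, following::cb[@n="b"][1] is only tested for being non-empty, so the [1] cannot narrow anything, and preceding::pb[@n="1r"] holds for every lb after that page break, including those on later pages.

```python
from lxml import etree

# Abbreviated copy of the sample input: same milestones, teiHeader omitted.
SAMPLE = """
<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<pb n="1r"/><fw type="header">Some header</fw>
<cb n="a"/>
<lb/><div n="1"><p>Line 1.1<lb/>Line 1.2<lb/>Line 1.3<lb/>Line 1.4</p></div>
<cb n="b"/>
<lb/><div n="2"><p>Line 2.1<lb/>Line 2.2<lb/>Line 2.3<lb/>Line 2.4
<pb n="1v"/><fw type="header">Some header</fw>
<cb n="a"/>
<lb/>Line 1.1<lb/>Line 1.2<lb/>Line 1.3<lb/>Line 1.4</p></div>
<cb n="b"/>
<lb/><div n="2"><p>Line 1.1<lb/>Line 1.2<lb/>Line 1.3<lb/>Line 1.4</p></div>
</body></text></TEI>
"""
NS = {"tei": "http://www.tei-c.org/ns/1.0"}
root = etree.fromstring(SAMPLE.strip())

# First attempt: any lb that has the pb somewhere before it
# and some <cb n="b"/> somewhere after it.
first_try = root.xpath(
    '//tei:lb[preceding::tei:pb[@n="1r"] and following::tei:cb[@n="b"]]',
    namespaces=NS,
)

# Second attempt: [1] keeps the first following cb, but the predicate
# still only asks whether that node-set is non-empty.
second_try = root.xpath(
    '//tei:lb[preceding::tei:pb[@n="1r"] and following::tei:cb[@n="b"][1]]',
    namespaces=NS,
)

print(len(first_try), len(second_try))  # 12 12, although column 1r/a has 4
```

Both attempts return the twelve lb elements that precede the last <cb n="b"/>, which matches the over-selection described in the question.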

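I have already tried the answers to some similar questions, e.g. "XPath select all elements between two specific elements", but the suggested answers do not work when the correct pb has to be selected by its @n attribute.

Can anyone point me in the right direction? How do I select only the lb elements of a given column?

Edit: Note: in etree, the tei namespace must be added to the XPath expression for the accepted answer to work: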

root.xpath('.//tei:lb[preceding::tei:pb[@n="2r"] and count(preceding::tei:cb[@n="b"]) = 0]', namespaces = {'tei':'http://www.tei-c.org/ns/1.0'})
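
As a minimal sketch of why the prefix is needed: the sample declares a default namespace on the TEI element, so an unprefixed name test such as //lb matches nothing in XPath 1.0. The snippet below uses a shortened, hypothetical fragment with the same milestone layout; SAMPLE, NS and the variable names are assumptions for illustration. Since the sample has no pb with n="2r", the sketch uses n="1r" instead.

```python
from lxml import etree

# Shortened fragment in the TEI namespace (first page of the sample only).
SAMPLE = """
<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<pb n="1r"/><fw type="header">Some header</fw>
<cb n="a"/>
<lb/><div n="1"><p>Line 1.1<lb/>Line 1.2<lb/>Line 1.3<lb/>Line 1.4</p></div>
<cb n="b"/>
<lb/><div n="2"><p>Line 2.1<lb/>Line 2.2<lb/>Line 2.3<lb/>Line 2.4</p></div>
</body></text></TEI>
"""
NS = {"tei": "http://www.tei-c.org/ns/1.0"}
root = etree.fromstring(SAMPLE.strip())

# Unprefixed names refer to "no namespace", so nothing matches.
without_prefix = root.xpath("//lb")

# With the prefix bound to the TEI namespace URI, all lb elements match.
with_prefix = root.xpath("//tei:lb", namespaces=NS)

# The expression from the edit, with n="1r" because the sample has no "2r".
first_column = root.xpath(
    './/tei:lb[preceding::tei:pb[@n="1r"] and count(preceding::tei:cb[@n="b"]) = 0]',
    namespaces=NS,
)

print(len(without_prefix), len(with_prefix), len(first_column))  # 0 8 4
```

The prefix itself is arbitrary; only the URI it is mapped to in the namespaces dictionary has to match the document.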

Answer 1

Could you try the following XPath expression:

//lb[preceding::pb[@n="1r"] and count(preceding::cb[@n='b']) = 0]

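The predicate count(preceding::cb[@n='b']) = 0 should exclude every lb element that comes after a <cb n="b"/> element, so only the lb elements of the first column remain.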
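
The count-based predicate works for the first column of the document, because no <cb n="b"/> precedes it. For later columns, one possible generalisation (not part of the answer, and only a sketch) is to test the nearest preceding milestones: on a reverse axis such as preceding::, the positional predicate [1] picks the closest node in document order. The helper lbs_in_column, the names SAMPLE and NS, and the use of lxml XPath variables ($page, $column) are assumptions made for illustration; the sample is an abbreviated copy of the input.

```python
from lxml import etree

# Abbreviated copy of the sample input: same milestones, teiHeader omitted.
SAMPLE = """
<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<pb n="1r"/><fw type="header">Some header</fw>
<cb n="a"/>
<lb/><div n="1"><p>Line 1.1<lb/>Line 1.2<lb/>Line 1.3<lb/>Line 1.4</p></div>
<cb n="b"/>
<lb/><div n="2"><p>Line 2.1<lb/>Line 2.2<lb/>Line 2.3<lb/>Line 2.4
<pb n="1v"/><fw type="header">Some header</fw>
<cb n="a"/>
<lb/>Line 1.1<lb/>Line 1.2<lb/>Line 1.3<lb/>Line 1.4</p></div>
<cb n="b"/>
<lb/><div n="2"><p>Line 1.1<lb/>Line 1.2<lb/>Line 1.3<lb/>Line 1.4</p></div>
</body></text></TEI>
"""
NS = {"tei": "http://www.tei-c.org/ns/1.0"}
root = etree.fromstring(SAMPLE.strip())


def lbs_in_column(tree, page, column):
    """Hypothetical helper: lb elements whose nearest preceding pb and cb
    carry the given @n values."""
    expr = (
        "//tei:lb[preceding::tei:pb[1][@n=$page]"
        " and preceding::tei:cb[1][@n=$column]]"
    )
    # lxml passes extra keyword arguments to the expression as XPath variables.
    return tree.xpath(expr, namespaces=NS, page=page, column=column)


# The expression from the answer, with the tei prefix added.
answer = root.xpath(
    '//tei:lb[preceding::tei:pb[@n="1r"] and count(preceding::tei:cb[@n="b"]) = 0]',
    namespaces=NS,
)
print(len(answer))  # 4

# Each of the four columns in the sample holds four lb elements.
for page, column in [("1r", "a"), ("1r", "b"), ("1v", "a"), ("1v", "b")]:
    print(page, column, len(lbs_in_column(root, page, column)))
```

For column a of page 1r both approaches select the same four elements; the nearest-milestone variant additionally keeps working for 1r/b, 1v/a and 1v/b, as long as every column is introduced by its own pb and cb milestones.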


python

xml

xpath

lxml

tei