XPath to get the text of all nodes except the one with a specific tag

Question

I have this kind of HTML structure:

<table id="proposal-details" class="details">

                        <tbody><tr>
                            <th>
                                Application type:
                            </th>
                            <td>
                                P
                            </td>
                        </tr>
                        <tr>
                            <th>
                                Proposed development
                            </th>
                            <td>
                                Prune 1 x Eucalyptus
                            </td>
                        </tr>
                        <tr>
                            <th>
                                Date received:
                            </th>
                            <td>
                                06 Feb 2015
                            </td>
                        </tr>
                        <tr>
                            <th>
                                Registration date:
                                <br>
                                (Statutory start date)
                            </th>
                            <td>
                                06 Feb 2015
                            </td>
                        </tr>

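I have an XPath that captures the text of every th. It works fine until the last th, whose text is Registration date:. I don't want the text that follows the br to be selected.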

I tried to work around this, but there is a problem with this XPath:

len(response.xpath("//table//tr//th[not(.//br)]/text()").extract())

The whole th tag is ignored. Any suggestions?

This is the output I get:

[u' Application type: ',
 u' Proposed development ',
 u' Date received: ']

What I actually need is Registration date: in the list as well, without (Statutory start date).

Answer 1

As I understand your question, you want the text of all th elements but without any text that follows a <br>. If that is the case, the following XPath

//table//tr//th/text()[not(preceding-sibling::br)]

yields this result when applied to your input:

Application type:
Proposed development
Date received:
Registration date:

The XPath you used excludes every th that contains a br element, so the whole last th is dropped:

th[not(.//br)]

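whereas th/text()[not(preceding-sibling::br)] retrieves all text nodes of each th that do not have a br element as a preceding sibling. The text before the br is kept, and only the text after it is filtered out.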
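
To see the difference between the two expressions side by side, here is a minimal sketch. It uses lxml rather than a Scrapy response (an assumption made so the example runs on its own), and the shortened SNIPPET and the clean helper are made up for illustration. In Scrapy the same XPath strings can be passed to response.xpath(...).

```python
from lxml import html

# Shortened version of the table from the question (two rows only).
SNIPPET = """
<table id="proposal-details" class="details">
  <tbody>
    <tr>
      <th> Application type: </th>
      <td> P </td>
    </tr>
    <tr>
      <th> Registration date: <br> (Statutory start date) </th>
      <td> 06 Feb 2015 </td>
    </tr>
  </tbody>
</table>
"""

tree = html.fromstring(SNIPPET)


def clean(nodes):
    """Strip surrounding whitespace and drop empty strings (illustrative helper)."""
    return [n.strip() for n in nodes if n.strip()]


# Filters on th: any th containing a br is skipped entirely.
whole_th_skipped = clean(tree.xpath("//table//tr//th[not(.//br)]/text()"))

# Filters on the text nodes: only text that comes after a br is skipped.
text_before_br = clean(tree.xpath("//table//tr//th/text()[not(preceding-sibling::br)]"))

print(whole_th_skipped)  # ['Application type:']
print(text_before_br)    # ['Application type:', 'Registration date:']
```

The stripping step is optional. It is there only because the text nodes in the original markup carry leading and trailing whitespace.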

python

xpath

scrapy