Extracting the first table's first row

Question

I am trying to extract the first table row (tr) of the first table (table) element in a parsed XML document.

I thought the following would do the trick:

//table[1]//tr[1]//text()

However, it returns too many nodes. For example, for this page I would expect it to return:

Wikimedia Commons has media related to 
Public transport schedules

But the text of the following node, which clearly does not belong to the first row, is also returned:

<div style="font-size:110%"><a href="/wiki/Public_transport" title="Public transport">Public transport</a></div>

(Only the text is returned, but I have pasted the full node here so that it is easier to find.)

Answer 1

You need to extract the text from the td elements rather than from the tr.

Try this:

//table[1]//tr[1]//td//text()

Answer 2

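This is a subtlety of how // is defined: //table[1] does not mean "the first table", but "every table that is the first table element in its respective parent". The same applies to the tr step: you will get the first row in thead and the first row in tbody.

If you want the first row of the first table in the whole document, you need to use parentheses: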

(//table//tr)[1]

This says "find all rows in all tables, then from that list select just the first element in document order".
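
To see the difference concretely, here is a minimal sketch in Python. It assumes a small made-up document (the ids such as t1-head and t2-row-1 are hypothetical) and uses the standard library's xml.etree.ElementTree, which implements only a subset of XPath: it has no parenthesized expressions and no text(), so the (...)[1] step is emulated by taking the first item of the result list. Its [1] predicate is evaluated per parent, which is the behavior described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: two tables in different parents,
# the first one with both a thead and a tbody.
DOC = """
<html><body>
  <div>
    <table id="t1">
      <thead><tr id="t1-head"><th>h</th></tr></thead>
      <tbody>
        <tr id="t1-body-1"><td>a</td></tr>
        <tr id="t1-body-2"><td>b</td></tr>
      </tbody>
    </table>
  </div>
  <div>
    <table id="t2">
      <tr id="t2-row-1"><td>c</td></tr>
    </table>
  </div>
</body></html>
""".strip()

root = ET.fromstring(DOC)

# Counterpart of //table[1]//tr[1]: each [1] is applied per parent, so this
# matches every table that is first in its parent, and inside those tables
# every tr that is the first tr of its own parent.
per_parent = [tr.get("id") for tr in root.findall(".//table[1]//tr[1]")]
print(per_parent)  # ['t1-head', 't1-body-1', 't2-row-1']

# Emulation of (//table//tr)[1]: collect all rows of all tables first,
# then keep only the first one in document order.
first_row = root.findall(".//table//tr")[0]
print(first_row.get("id"))             # t1-head
print("".join(first_row.itertext()))   # h
```

With a full XPath 1.0 engine you would pass (//table//tr)[1] directly instead of indexing the list; the list indexing here is only a stand-in for the parentheses.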

xml

xpath

xml-parsing