How to interactively use tDOM?

Question

I feel like I'm missing something subtle here.

I have a $doc, and I can see that $doc asText does contain the content of the page to be parsed. It comes from dom parse -html5 $body.

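From here I want to explore the DOM interactively, for example to get a list of anchors. It looks like $doc selectNodes {//a} should work*, but it returns nothing. I also tried selectNodes with /head, /body, /html ... nothing at all! I can see that there are childNodes, so the structure seems to be intact.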

What is a better way to explore these nodes so that I can figure out what is wrong?
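
One way to see what is wrong is to walk the tree and print every element's name next to its namespace URI: if every element reports a non-empty URI, unprefixed XPath name tests such as //a cannot match. The helper dumpTree below is hypothetical (not part of tDOM), and the sketch assumes a small hand-written XHTML string parsed with the plain XML parser, which puts the elements into the XHTML namespace much as -html5 does, so it also runs on tDOM builds without HTML5 support.

```tcl
package require tdom

# Hypothetical helper (not part of tDOM): collect one line per element,
# indented by depth, showing the element name and its namespace URI.
# An empty "ns=" means the element is not in any namespace.
proc dumpTree {node {depth 0}} {
    set lines {}
    if {[$node nodeType] eq "ELEMENT_NODE"} {
        lappend lines "[string repeat {  } $depth][$node nodeName] ns=[$node namespaceURI]"
        foreach child [$node childNodes] {
            # Text and other non-element children contribute nothing.
            lappend lines {*}[dumpTree $child [expr {$depth + 1}]]
        }
    }
    return $lines
}

# Stand-in document (assumption): the XML parser puts these elements into
# the XHTML namespace, mimicking what "dom parse -html5" produces.
set doc [dom parse {<html xmlns="http://www.w3.org/1999/xhtml"><body><a href="x.html">x</a></body></html>}]
set root [$doc documentElement]

puts [join [dumpTree $root] \n]

# Other quick probes on a single node:
puts [$root nodeName]        ;# html
puts [$root namespaceURI]    ;# http://www.w3.org/1999/xhtml
```

Every line of the dump ends in ns=http://www.w3.org/1999/xhtml, which is the hint that the XPath queries need a namespace prefix (or that the namespaces should be dropped at parse time).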

* https://wiki.tcl-lang.org/page/XPath - this is what I'm trying to follow.

Answer 1

Running $doc asXML shows that the html element has been parsed into a namespace:

<html xmlns="http://www.w3.org/1999/xhtml">

You have to use this namespace to find the elements:

$doc selectNodes -namespaces {ns http://www.w3.org/1999/xhtml} //ns:a

If you are going to run several queries, it is easier to set the namespace once:

$doc selectNodesNamespaces {ns http://www.w3.org/1999/xhtml}
$doc selectNodes //ns:a
$doc selectNodes /ns:html

And so on.
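
A self-contained sketch of the difference between the two queries, assuming a small hand-written XHTML string stands in for the real page (parsed with the plain XML parser so it runs without HTML5 support). The prefix ns is arbitrary; only the URI has to match the one shown by asXML.

```tcl
package require tdom

set xhtml {<html xmlns="http://www.w3.org/1999/xhtml"><body>
  <a href="one.html">one</a> <a href="two.html">two</a>
</body></html>}
set doc [dom parse $xhtml]

# Without a prefix, an XPath name test only matches elements that are in
# no namespace, so this finds nothing here.
set plain [$doc selectNodes //a]

# Map the prefix "ns" to the XHTML namespace for this one query.
set anchors [$doc selectNodes -namespaces {ns http://www.w3.org/1999/xhtml} //ns:a]

# Or register the mapping once on the document for all later queries.
$doc selectNodesNamespaces {ns http://www.w3.org/1999/xhtml}
set hrefs {}
foreach a [$doc selectNodes //ns:a] {
    lappend hrefs [$a getAttribute href]
}
puts $hrefs   ;# one.html two.html
```

The href attributes themselves carry no prefix, so getAttribute needs no namespace handling.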

Answer 2

This time you can make your life simpler: you seem to be working with HTML (not XML or XHTML), since you pass -html5 to dom parse and you are selecting HTML elements (anchors).

For HTML, namespaces carry no meaning so far, so you can ignore them: pass the -ignorexmlns flag to dom parse.

% package req tdom
0.9.2
% set someHTML {<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    <title>Title of the document</title></head><body>
    <svg width="100" height="100">
      <circle cx="50" cy="50" r="40" stroke="green" stroke-width="4" fill="yellow" />
    </svg>
  </body>
</html>}
% set doc [dom parse -html5 -ignorexmlns $someHTML]

That way you can run your XPath queries and expressions without namespace awareness:

$doc selectNodes {//svg}

Note that this combination is what the tDOM documentation suggests:

Since this probably isn't wanted by a lot of users and adds only burden for no good in a lot of use cases -html5 can be combined with -ignorexmlns, in which case all nodes and attributes in the DOM tree are not in an XML namespace.
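
Applied to the original goal (a list of anchors), a minimal sketch might look like the following. It assumes a tDOM build with HTML5 support (the -html5 option is only available when tDOM was compiled with it) and a version that accepts -ignorexmlns (the session above uses 0.9.2); the proc name anchorHrefs is hypothetical.

```tcl
package require tdom

# Hypothetical helper: parse an HTML string without namespaces and
# return the href of every anchor that has one.
proc anchorHrefs {html} {
    set doc [dom parse -html5 -ignorexmlns $html]
    set hrefs {}
    # No prefix needed: with -ignorexmlns no element is in a namespace.
    foreach a [$doc selectNodes {//a[@href]}] {
        lappend hrefs [$a getAttribute href]
    }
    $doc delete   ;# free the DOM tree once the values are copied out
    return $hrefs
}

set page {<!DOCTYPE html>
<html><head><title>t</title></head>
<body><a href="one.html">one</a><a name="x">no href</a><a href="two.html">two</a></body></html>}

# -html5 only exists in tDOM builds compiled with HTML5 support,
# so report the error instead of aborting if it is missing.
if {[catch {anchorHrefs $page} result]} {
    puts "HTML5 parsing unavailable: $result"
} else {
    puts $result
}
```

The [@href] predicate skips anchors used only as link targets, such as the one with just a name attribute.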

