How to get only TEXT_NODE with kuchiki

Question

I have this example HTML that I want to parse with kuchiki:

<a href="https://example.com"><em>@</em>Bananowy</a>

I only want Bananowy, without the @.

A similar question for JavaScript: How to get the text node of an element?

Answer 1

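First, let's look at how the parser turns

    <a href="https://example.com"><em>@</em>Bananowy</a>

into a tree: the <a> element has two children, an <em> element that holds the text node "@", and the text node "Bananowy".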

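Now, if you try the obvious thing and call anchor.text_contents(), you get the text of every text node that is a descendant of the anchor tag (<a>), here "@Bananowy". That is how text_contents is defined to behave: it concatenates all text nodes in the subtree, much like textContent in the DOM.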
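To make that behavior concrete, here is a minimal sketch that uses only the standard library. The `Node` enum and the `text_contents` and `sample_anchor` functions are hypothetical stand-ins written for this illustration; they are not kuchiki's real types, they only mimic the "concatenate every descendant text node" rule.

```rust
// Toy stand-in for a parsed DOM tree; NOT kuchiki's real types.
enum Node {
    Element { children: Vec<Node> },
    Text(&'static str),
}

// Concatenate every text node in the subtree, in document order.
fn text_contents(node: &Node) -> String {
    match node {
        Node::Text(s) => s.to_string(),
        Node::Element { children } => children.iter().map(text_contents).collect(),
    }
}

// Build the tree for <a><em>@</em>Bananowy</a>.
fn sample_anchor() -> Node {
    Node::Element {
        children: vec![
            Node::Element { children: vec![Node::Text("@")] }, // <em>@</em>
            Node::Text("Bananowy"),
        ],
    }
}

fn main() {
    // The "@" inside <em> is a descendant, so it is included.
    println!("{}", text_contents(&sample_anchor())); // prints "@Bananowy"
}
```

The recursion is the whole point: any rule that walks descendants will pick up the "@", so the fix has to look at direct children only.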

However, you only want "Bananowy". There are several ways to get it:

extern crate kuchiki;

use kuchiki::traits::*;

fn main() {
    let html = r"<a href='https://example.com'><em>@</em>Bananowy</a>";

    let document = kuchiki::parse_html().one(html);

    let selector = "a";
    let anchor = document.select_first(selector).unwrap();
    // Quick and dirty hack
    let last_child = anchor.as_node().last_child().unwrap();
    println!("{:?}", last_child.into_text_ref().unwrap());

    // Iterating solution
    for child in anchor.as_node().children() {
        // `as_text()` returns `None` for anything that is not a text node
        if let Some(text) = child.as_text() {
            println!("{:?}", text);
        }
    }

    // Iterating solution - Using `text_nodes()` iterators
    anchor.as_node().children().text_nodes().for_each(|e| {
        println!("{:?}", e);
    });

    // text1 and text2 show two ways to get an owned `String`
    let text1 = match anchor.as_node().children().text_nodes().last() {
        Some(x) => x.as_node().text_contents(),
        None => String::from(""),
    };

    let text2 = match anchor.as_node().children().text_nodes().last() {
        Some(x) => x.borrow().clone(),
        None => String::from(""),
    };

    println!("{} {}", text1, text2);
}

The first approach is the fragile, hackish one. You simply have to know that "Bananowy" is the last_child of your anchor tag and fetch it accordingly: anchor.as_node().last_child().unwrap().into_text_ref().unwrap().

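The second solution iterates over the children of the anchor tag (i.e. [Tag(em), TextNode("Bananowy")]) and selects only the text nodes. It does this with the as_text() method, which returns None for every Node that is not a TextNode. This is much less fragile than the first solution, which breaks if, for example, you have <a><em>@</em>Banan<i>!</i>owy</a>: there the last child is only "owy".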
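The difference between the two approaches on that second input can be sketched without kuchiki. Everything below (`Node`, `as_text`, `last_child_text`, `direct_text`, `sample_children`) is a hypothetical model made up for this illustration; only the idea of an `as_text()` that returns `None` for non-text nodes is borrowed from the real API.

```rust
// Toy stand-in for a parsed DOM tree; NOT kuchiki's real types.
enum Node {
    #[allow(dead_code)] // element contents are not inspected in this sketch
    Element(Vec<Node>),
    Text(&'static str),
}

impl Node {
    // Same idea as kuchiki's `as_text()`: Some for text nodes, None otherwise.
    fn as_text(&self) -> Option<&'static str> {
        match self {
            Node::Text(s) => Some(*s),
            Node::Element(_) => None,
        }
    }
}

// Approach 1: assume the wanted text is the last child.
fn last_child_text(children: &[Node]) -> Option<&'static str> {
    children.last().and_then(Node::as_text)
}

// Approach 2: keep every direct child that is a text node.
fn direct_text(children: &[Node]) -> String {
    children.iter().filter_map(Node::as_text).collect()
}

// Children of <a> in <a><em>@</em>Banan<i>!</i>owy</a>.
fn sample_children() -> Vec<Node> {
    vec![
        Node::Element(vec![Node::Text("@")]), // <em>@</em>
        Node::Text("Banan"),
        Node::Element(vec![Node::Text("!")]), // <i>!</i>
        Node::Text("owy"),
    ]
}

fn main() {
    let children = sample_children();
    println!("{:?}", last_child_text(&children)); // Some("owy")
    println!("{}", direct_text(&children)); // Bananowy
}
```

Note that the last-child approach would also return `None` (and the original `unwrap()` would panic) whenever the last child happens to be an element rather than text.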

EDIT:

Preferred solution

After looking around, I found a better solution to your problem. It is called the TextNodes iterator.

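With that in mind, just write anchor.as_node().children().text_nodes().<<ITERATOR CODE GOES HERE>>; and then map or manipulate the entries as you see fit.

Why is this solution better? It is more concise, and it uses the good old Iterator, so it is very similar to the JS answer you linked above.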
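If it helps to see why an iterator-based design chains so nicely, here is a rough sketch of how a `text_nodes()` adapter can be added to any iterator of nodes through an extension trait. This is only an illustration under my own assumptions: `Node`, `as_text`, the `TextNodes` trait, `own_text` and `sample_children` are hypothetical names, and this is not how kuchiki actually implements its iterator.

```rust
use std::iter::FilterMap;

// Toy stand-in for DOM nodes; NOT kuchiki's real types.
enum Node {
    Element,
    Text(&'static str),
}

// Some for text nodes, None for everything else.
fn as_text(node: &Node) -> Option<&'static str> {
    match node {
        Node::Text(s) => Some(*s),
        Node::Element => None,
    }
}

// Hypothetical extension trait: adds `text_nodes()` to any iterator over `&Node`.
trait TextNodes<'a>: Iterator<Item = &'a Node> + Sized {
    fn text_nodes(self) -> FilterMap<Self, fn(&'a Node) -> Option<&'static str>> {
        let keep_text: fn(&'a Node) -> Option<&'static str> = as_text;
        self.filter_map(keep_text)
    }
}

// Blanket impl so every suitable iterator gets the method.
impl<'a, I: Iterator<Item = &'a Node>> TextNodes<'a> for I {}

// The "<<ITERATOR CODE GOES HERE>>" part: ordinary adapters after `text_nodes()`.
fn own_text(children: &[Node]) -> Vec<&'static str> {
    children
        .iter()
        .text_nodes()
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .collect()
}

// Direct children with two elements and a whitespace-only text node.
fn sample_children() -> Vec<Node> {
    vec![
        Node::Element,
        Node::Text("Banan"),
        Node::Element,
        Node::Text("owy"),
        Node::Text("\n"),
    ]
}

fn main() {
    println!("{:?}", own_text(&sample_children())); // ["Banan", "owy"]
}
```

Because the adapter returns a plain `Iterator`, everything after it (`map`, `filter`, `collect`, `last`, and so on) comes for free from the standard library.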
