Joining HTML elements with JSoup

Question

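Is there any way in JSoup to join two or more elements in memory, that is, in the Document tree, without generating a raw HTML string?

For example, the following HTML div element with some nested tags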

<div>This is text with <custom>a custom nested tag</custom> and some <other>text within a tag</other>, all of which should become part of the top-level </div>.

would be transformed into

<div>This is text with a custom nested tag and some text within a tag, all of which should become part of the top-level </div>.

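Essentially, the nested tags in the example above have been removed but their content remains, as if a string replace() operation had been run on the raw HTML before it was parsed into a Document object by JSoup.

The whole operation could be coded like this: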

public static void splice(Document document, List<String> tags) {
  for (String tag : tags) {
    // Find the tag node (Element) in the tree
    // Remove the tag node and join its content with its parent
  }
}

Answer 1

Jsoup's unwrap() function is exactly what you are looking for. It removes the element but keeps its children.
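
As a minimal sketch of how the splice method from the question might be filled in with unwrap(): the class name TagSplicer and the main method are only illustrative, and the code assumes jsoup is on the classpath and Java 9+ for List.of. It uses getElementsByTag to find the matching elements and Elements.unwrap() to remove them while moving their children up into the parent.

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class TagSplicer { // hypothetical class name, for illustration only

    // Removes every element with one of the given tag names,
    // leaving its child nodes in place under the former parent.
    public static void splice(Document document, List<String> tags) {
        for (String tag : tags) {
            // Elements.unwrap() unwraps each matched element in the tree
            document.getElementsByTag(tag).unwrap();
        }
    }

    public static void main(String[] args) {
        String html = "<div>This is text with <custom>a custom nested tag</custom>"
                + " and some <other>text within a tag</other>,"
                + " all of which should become part of the top-level </div>.";
        Document document = Jsoup.parseBodyFragment(html);

        splice(document, List.of("custom", "other"));

        // Turn off pretty printing so the serialized markup stays on one line
        document.outputSettings().prettyPrint(false);
        System.out.println(document.body().html());
    }
}
```

The tree is modified in place, so no raw HTML string is generated or re-parsed. As far as I know, jsoup does not merge the resulting adjacent text nodes into one; they stay separate nodes in the tree but read as continuous text when the document is serialized or when text() is called.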

