Scala: Removing nodes from XML at different levels
My XML looks like this (it is a NodeSeq):
<first>...</first>
<second>...</second>
<third>
  <foo>
    <keepattr> ... </keepattr>
    <otherattr1> ... </otherattr1>
  </foo>
  <otherattr2> ... </otherattr2>
</third>
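I need to keep <first>, remove <second> and everything inside it, and keep only <keepattr> inside <third>, while preserving the structure of the data (that is, keeping the foo tag).
How can I do this in Scala?
I tried the following, but I'm stuck one level down: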
val removeJunk = new RewriteRule {
  override def transform(node: Node): NodeSeq = node match {
    case e: Elem => e.label match {
      case "second" => NodeSeq.Empty
      case "third" => //?
    }
    case o => o
  }
}
I may also want to go several levels further down in the schema.
Edit: I want to keep the data without damaging the data model.
<third>
  <foo>
    <keepattr> ... </keepattr>
    <otherattr1> ... </otherattr1>
  </foo>
  <otherattr2> ... </otherattr2>
</third>
should become
<third>
  <foo>
    <keepattr> ... </keepattr>
  </foo>
</third>
You can use a combination of filterNot and a RewriteRule. This may be inefficient, since the \\ operator searches all descendants at every step, but I can't think of any other solution right now:
import scala.xml._
import scala.xml.transform.RewriteRule

val input: NodeBuffer = <first>foo</first>
<second>remove me</second>
<third>
  <foo>
    <keepattr>meh</keepattr>
    <otherattr1>bar</otherattr1>
  </foo>
  <otherattr2>quux</otherattr2>
</third>

val extractKeepAttr = new RewriteRule {
  override def transform(node: Node): NodeSeq = node match {
    case e: Elem => e.label match {
      case "keepattr" => e
      // \\ matches the node itself and all descendants; \ would only look at direct children
      case _ if (e \\ "keepattr").nonEmpty =>
        e.copy(child = e.child.filter(c => (c \\ "keepattr").nonEmpty).flatMap(c => transform(c)))
      case _ => e
    }
    case other => other // non-element nodes pass through unchanged
  }
}

// returns <first>foo</first>, <third><foo><keepattr>meh</keepattr></foo></third>
val updatedXml = input.filterNot(_.label == "second").map(extractKeepAttr)
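If the repeated descendant search is a concern, one possible alternative is to prune bottom-up in a single pass, so each node is visited once. The following is only a sketch, not part of the answer above: `PruneSketch` and `prune` are hypothetical names, and the input is written as an explicit `Seq[Node]` for clarity. Note one behavioural difference: an element passed to `prune` that contains no `keepattr` at all is dropped rather than returned unchanged.

```scala
import scala.xml._

object PruneSketch {
  // Hypothetical helper: keep `node` only if it is, or contains, an element labelled `keep`.
  def prune(node: Node, keep: String): Option[Node] = node match {
    case e: Elem if e.label == keep => Some(e)
    case e: Elem =>
      // Prune the children first, then keep this element only if something survived.
      val kept = e.child.flatMap(c => prune(c, keep))
      if (kept.isEmpty) None else Some(e.copy(child = kept))
    case _ => None // text and other non-element nodes outside a kept element are dropped
  }

  val input: Seq[Node] = Seq(
    <first>foo</first>,
    <second>remove me</second>,
    <third>
      <foo>
        <keepattr>meh</keepattr>
        <otherattr1>bar</otherattr1>
      </foo>
      <otherattr2>quux</otherattr2>
    </third>
  )

  // Top-level policy: drop <second>, prune <third>, leave everything else alone.
  val result: Seq[Node] = input.flatMap {
    case e: Elem if e.label == "second" => None
    case e: Elem if e.label == "third"  => prune(e, "keepattr")
    case other                          => Some(other)
  }
}
```

Because `prune` takes the label as a parameter, the same function can be reused however many levels down the element to keep happens to sit.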
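Edit: answer updated.
I would like to point out another approach, which removes a lot of the complexity but is not as pretty: if you know the structure in advance, extract all the information you need from the XML, store it in vals, and then rebuild the XML by hand.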
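A minimal sketch of that extract-and-rebuild idea might look like the following. It assumes the fixed first / third / foo / keepattr layout from the question; `RebuildSketch` and the val names are made up for illustration.

```scala
import scala.xml._

object RebuildSketch {
  val input: Seq[Node] = Seq(
    <first>foo</first>,
    <second>remove me</second>,
    <third>
      <foo>
        <keepattr>meh</keepattr>
        <otherattr1>bar</otherattr1>
      </foo>
      <otherattr2>quux</otherattr2>
    </third>
  )

  // Extract the pieces worth keeping. `\` selects direct children by label.
  val first: Seq[Node] = input.filter(_.label == "first")
  val keepAttrs: Seq[Node] =
    input.filter(_.label == "third").flatMap(_ \ "foo" \ "keepattr")

  // Rebuild the known structure by hand around the extracted nodes.
  val rebuilt: Seq[Node] = first :+ <third><foo>{keepAttrs}</foo></third>
}
```

The trade-off is that the target structure is hard-coded in the XML literal, so this only suits documents whose shape is known and stable.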