Scala: convert XML to a key-value map
The problem is as follows: imagine an XML document without any particular schema:
<persons>
  <total>2</total>
  <someguy>
    <firstname>john</firstname>
    <name>Snow</name>
  </someguy>
  <otherperson>
    <sex>female</sex>
  </otherperson>
</persons>
For processing, I want to have this in a key-value map:
"Persons/total" -> 2
"Persons/someguy/firstname" -> john
"Persons/someguy/name" -> Snow
"Persons/otherperson/sex" -> female
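Preferably I would have a nice recursive function that traverses the XML depth-first and simply stacks all the labels until it finds a value, then returns that value together with the stack of labels. Unfortunately I am struggling to connect the return type with the input type, because I return a sequence of my input. Let me show you what I have so far. Clearly the foreach is a problem because it returns Unit, but map would not work either because it returns a Seq.
def dfs(n: NodeSeq, keyStack: String, map: Map[String,String])
  : (NodeSeq, String, Map[String,String]) = {
  n.foreach(x => {
      if (x.child.isEmpty) {
        dfs(x.child, keyStack, map + (keyStack + x.label + " " -> x.text))
      }
      else {
        dfs(x.child, keyStack + x.label + "/", map)
      }
    }
  )
}
Thanks a lot for your help!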
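After some playing around, this is the most elegant way I could come up with. What I don't like is:
- It goes depth-first for every child, so you need to flatten the result afterwards. This is also why I lose the root node label.
- It drags a lot of XML along the way, so it might be too memory-intensive?
Please improve it if you have ideas!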
import scala.xml._

val xml = "<persons><total>2</total><someguy><firstname>john</firstname><name>Snow</name></someguy><otherperson><sex>female</sex></otherperson></persons>"
val result: Elem = scala.xml.XML.loadString(xml)

// Walks the tree and emits one (node, path, map) tuple per visited node.
def linearize(node: Node, stack: String, map: Map[String,String])
  : List[(Node, String, Map[String,String])] = {
  (node, stack, map) :: node.child.flatMap {
    case e: Elem => {
      // Exactly one descendant is taken to mean the element holds only a text node (a leaf).
      if (e.descendant.size == 1) linearize(e, stack, map ++ Map(stack + "/" + e.label -> e.text))
      else linearize(e, stack + "/" + e.label, map)
    }
    case _ => Nil
  }.toList
}

// Flatten the per-node maps into a single map.
linearize(result, "", Map[String,String]()).flatMap(_._3).toMap
We still need to flatten the map afterwards, but at least the recursive part is fairly short. The code above should work in your Scala worksheet.
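The two concerns listed above (the extra flattening step and the missing root label) could perhaps be avoided by having the recursion return the flat map directly. The following is only a minimal sketch, not the answer's method: the name flattenXml is hypothetical, it assumes the scala-xml module is on the classpath, and it treats any element without child elements as a leaf. Keys use the labels exactly as they appear in the document, so the root comes out as "persons" rather than "Persons".

```scala
import scala.xml._

// Hypothetical helper (not part of the answer above): builds the flat map directly,
// so no intermediate list of tuples has to be flattened afterwards.
def flattenXml(node: Node, path: String = ""): Map[String, String] = {
  // Extend the path with this node's label; the root label is kept as the first segment.
  val here = if (path.isEmpty) node.label else s"$path/${node.label}"
  val childElems = node.child.collect { case e: Elem => e }
  if (childElems.isEmpty) Map(here -> node.text)      // leaf: emit path -> text
  else childElems.flatMap(flattenXml(_, here)).toMap  // inner node: merge the children's maps
}

val doc = XML.loadString(
  "<persons><total>2</total><someguy><firstname>john</firstname><name>Snow</name></someguy>" +
  "<otherperson><sex>female</sex></otherperson></persons>")
val flat = flattenXml(doc)
// flat("persons/someguy/name") is "Snow"
```

Note that with a plain Map[String, String], a path that occurs more than once keeps only the last value seen.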
One case to consider is when elements have a prefix:
val xml = <a>
            <b>
              <c>1</c>
              <d>2</d>
              <e>
                <z:f>3</z:f>
              </e>
            </b>
          </a>
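There are other scenarios to consider (including entities, comments, and declarations), but this is a good starting point:
import scala.xml._

def nodeToMap(xml: Elem): Map[String, String] = {
  def nodeToMapWithPrefix(prefix: String, xml: Node): Map[String, String] = {
    val pathAndText = for {
      child <- xml.child
    } yield {
      child match {
        case e: Elem if e.prefix == null =>
          nodeToMapWithPrefix(s"$prefix/${e.label}", e)
        case e: Elem =>
          // Keep the namespace prefix in the key, e.g. "z:f".
          nodeToMapWithPrefix(s"$prefix/${e.prefix}:${e.label}", e)
        case t: Text => Map(prefix -> t.text)
        case er: EntityRef => Map(prefix -> er.text)
        // Comments, processing instructions, etc. are skipped instead of raising a MatchError.
        case _ => Map.empty[String, String]
      }
    }
    pathAndText.foldLeft(Map.empty[String, String]){_ ++ _}
  }
  nodeToMapWithPrefix(xml.label, xml)
}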
Another case to consider is when text is not in a leaf element:
val xml = <a>
            <b>text
              <c>1</c>
              <d>2</d>
            </b>
          </a>
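To illustrate the mixed-content case, here is one possible way to record text that sits directly under a non-leaf element. It is a sketch under stated assumptions rather than a definitive solution: textByPath is a hypothetical helper, it requires the scala-xml module, it trims whitespace and joins several text fragments of the same element with a single space, and it ignores namespace prefixes.

```scala
import scala.xml._

// Hypothetical helper: records the trimmed text found directly under each element,
// including elements that also contain child elements (mixed content).
def textByPath(node: Node, path: String): Map[String, String] = {
  // Text that belongs to this element itself, not to its child elements.
  val ownText = node.child
    .collect { case t: Text => t.text.trim }
    .filter(_.nonEmpty)
    .mkString(" ")
  val own = if (ownText.isEmpty) Map.empty[String, String] else Map(path -> ownText)
  // Recurse into child elements and merge their entries into the result.
  node.child.collect { case e: Elem => e }.foldLeft(own) { (acc, e) =>
    acc ++ textByPath(e, s"$path/${e.label}")
  }
}

val mixed = XML.loadString("<a><b>text<c>1</c><d>2</d></b></a>")
val byPath = textByPath(mixed, mixed.label)
// byPath("a/b") is "text", byPath("a/b/c") is "1", byPath("a/b/d") is "2"
```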
Inspired by Sparky's answer, but applicable to more general cases:
import scala.xml._

val emptyMap = Map.empty[String,List[String]]

def xml2map(xml: String): Map[String,List[String]] = add2map(XML.loadString(xml), "", emptyMap)

private def add2map(node: Node, xPath: String, oldMap: Map[String,List[String]]): Map[String,List[String]] = {
  val elems = node.child.filter(_.isInstanceOf[Elem])
  val xCurr = xPath + "/" + node.label
  // Child elements without element children are leaves; the rest are recursed into.
  val currElems = elems.filter(_.child.count(_.isInstanceOf[Elem]) == 0)
  val nextElems = elems.diff(currElems)
  // Append each leaf's text to the list stored under its path, so repeated tags accumulate.
  val currMap = currElems.foldLeft(oldMap)((map, elem) => map + {
    val key = xCurr + "/" + elem.label
    val oldValue = map.getOrElse(key, List.empty[String])
    val newValue = oldValue ::: List(elem.text)
    key -> newValue
  })
  nextElems.foldLeft(currMap)((map, elem) => map ++ add2map(elem, xCurr, emptyMap))
}
For XML like this:
<persons>
  <total>2</total>
  <someguy>
    <firstname>john</firstname>
    <name>Snow</name>
    <alive>in 1st season</alive>
    <alive>in 2nd season</alive>
    <alive>...</alive>
    <alive>even in last season</alive>
    <alive>how long more?</alive>
  </someguy>
  <otherperson>
    <sex>female</sex>
  </otherperson>
</persons>
it produces the following Map[String,List[String]] (shown after .toString()):
Map(
/persons/total -> List(2),
/persons/someguy/firstname -> List(john),
/persons/someguy/alive -> List(in 1st season, in 2nd season, ..., even in last season, how long more?),
/persons/otherperson/sex -> List(female),
/persons/someguy/name -> List(Snow)
)