Extracting Values from a Subtree

Question

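I'm parsing an XML file with HXT and trying to break some of the node extraction into modular pieces (I have been following a guide for this). Unfortunately, I can't figure out how to apply selectors once I have done the first level of parsing.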

 import Text.XML.HXT.Core

 let node tag = multi (hasName tag)
 xml <- readFile "test.xml"
 let doc = readString [withValidate yes, withParseHTML no, withWarnings no] xml
 books <- runX $ doc >>> node "book"

I see that books has the type [XmlTree]:

 :t books
 books :: [XmlTree]

Now I'd like to take the first element of books and then extract some values from inside that subtree.

 let b = head(books)
 runX $ b >>> node "cost"

Couldn't match type ‘Data.Tree.NTree.TypeDefs.NTree’
               with ‘IOSLA (XIOState ()) XmlTree’
Expected type: IOSLA (XIOState ()) XmlTree XNode
  Actual type: XmlTree
In the first argument of ‘(>>>)’, namely ‘b’
In the second argument of ‘($)’, namely ‘b >>> node "cost"’

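I can't find a way to apply selectors once I have an XmlTree; the incorrect usage above is only there to illustrate what I'd like to do. I know I can do this: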

 runX $ doc >>> node "book" >>> node "cost" /> getText
 ["55.9","95.0"]

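But I'm not only interested in cost; I want many more elements inside book. The XML file is pretty deep, so I don't want to nest everything with <+>. I would much rather extract the chunk I want and then pull out its sub-elements in separate functions.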

Sample (made-up) XML file:

 <?xml version="1.0" encoding="UTF-8"?><start xmlns="http://www.example.com/namespace" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
     <books> 
         <book>
             <author>
                 <name>
                     <first>Joe</first>
                     <last>Smith</last>
                 </name>
                 <city>New York City</city>
             </author>
             <released>1990-11-15</released>
             <isbn>1234567890</isbn>
             <publisher>X Publisher</publisher>
             <cost>55.9</cost>
         </book>
         <book>
             <author>
                 <name>
                     <first>Jane</first>
                     <last>Jones</last>
                 </name>
                 <city>San Francisco</city>
             </author>
             <released>1999-01-19</released>
             <isbn>0987654321</isbn>
             <publisher>Y Publisher</publisher>
             <cost>95.0</cost>
         </book>
     </books>
  </start>

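Can anyone help me understand how to extract the sub-elements of book? Ideally using something nice like >>> and node, so that I can define my own functions such as getCost, getName, etc., each of which would roughly have the signature XmlTree -> [String].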

Answer 1

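doc isn't what you think it is. It has the type IOStateArrow s b XmlTree. You really should read your guide again; everything you want to know is summarized under the heading "Avoiding IO".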

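Arrows are basically functions. SomeArrow a b can be thought of as a generalized/specialized function of type a -> b. >>> and the other operators in scope are for arrow composition, which is similar to function composition. Your books has the type [XmlTree], so it is not an arrow and cannot be composed with arrows. What fulfills your needs is runLA, which turns an arrow like node "tag" into a normal function: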

module Main where

import           Text.XML.HXT.Core

main = do
  html <- readFile "test.xml"
  let doc = readString [withValidate yes, withParseHTML no, withWarnings no] html
  books <- runX $ doc >>> node "book"
  -- runLA (node "cost" /> getText) :: XmlTree -> [String]
  let costs = books >>= runLA (node "cost" /> getText)
  print costs

node tag = multi (hasName tag)
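
The same idea gives you the named helpers you asked for. The following is only a minimal sketch: getCost and getAuthorFirstName are illustrative names (not part of HXT), the sample string is a cut-down, namespace-free version of your file, and it is parsed with xread so the example does not depend on test.xml being on disk.

```haskell
module Main where

import Text.XML.HXT.Core

-- Same helper as above: match elements named `tag` at any depth.
node :: ArrowXml a => String -> a XmlTree XmlTree
node tag = multi (hasName tag)

-- Hypothetical extractors: runLA turns each arrow into a plain function.
getCost :: XmlTree -> [String]
getCost = runLA (node "cost" /> getText)

getAuthorFirstName :: XmlTree -> [String]
getAuthorFirstName = runLA (node "author" >>> node "first" /> getText)

-- Assumed inline sample data, trimmed from the XML in the question.
sampleXml :: String
sampleXml = concat
  [ "<books>"
  , "<book><author><name><first>Joe</first><last>Smith</last></name></author>"
  , "<cost>55.9</cost></book>"
  , "<book><author><name><first>Jane</first><last>Jones</last></name></author>"
  , "<cost>95.0</cost></book>"
  , "</books>"
  ]

-- xread parses XML content without IO, so the whole pipeline stays pure.
sampleBooks :: [XmlTree]
sampleBooks = runLA (xread >>> node "book") sampleXml

main :: IO ()
main = do
  let b = head sampleBooks
  print (getCost b)             -- cost of the first book only
  print (getAuthorFirstName b)  -- first name of its author
  print (concatMap getCost sampleBooks)  -- costs of every book
```

Because each helper is an ordinary function XmlTree -> [String], you can combine them with the usual list tools (map, concatMap, zip) instead of nesting everything in one large arrow.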

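If you would rather stay inside runX, another option is to lift the plain value back into an arrow with constA (or constL for a whole list), both from HXT's ArrowList class. That makes the b >>> node "cost" attempt from the question type-check. This is a sketch under assumptions: sampleXml is an inline stand-in for test.xml, and validation is switched off because the stand-in has no DTD.

```haskell
module Main where

import Text.XML.HXT.Core

-- Same helper as in the question: match elements named `tag` at any depth.
node :: ArrowXml a => String -> a XmlTree XmlTree
node tag = multi (hasName tag)

-- Assumed inline stand-in for test.xml so the sketch runs on its own.
sampleXml :: String
sampleXml =
  "<books><book><cost>55.9</cost></book><book><cost>95.0</cost></book></books>"

main :: IO ()
main = do
  let doc = readString [withValidate no, withWarnings no] sampleXml
  books <- runX $ doc >>> node "book"
  -- constA wraps a plain value in an arrow that ignores its input,
  -- so an XmlTree can stand on the left of >>>.
  firstCost <- runX $ constA (head books) >>> node "cost" /> getText
  -- constL does the same for a list, emitting each element in turn.
  allCosts <- runX $ constL books >>> node "cost" /> getText
  print firstCost
  print allCosts
```

Compared with runLA this keeps you in IO, which is usually only worth it if the later stages also need IO-based arrows.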
haskell

hxt