How to parse GPX file using Haskell's xml-conduit?

I want to use xml-conduit to parse GPX files. This is what I have so far:

{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative
import Data.Text           as T
import Text.XML
import Text.XML.Cursor

data Trkpt = Trkpt {
  trkptLat :: Text,
  trkptLon :: Text,
  trkptEle :: Text,
  trkptTime :: Text
  } deriving (Show)

trkptsFromFile path =
  gpxTrkpts . fromDocument <$> Text.XML.readFile def path

gpxTrkpts =
  child >=> element "{http://www.topografix.com/GPX/1/0}trk" >=>
  child >=> element "{http://www.topografix.com/GPX/1/0}trkseg" >=>
  child >=> element "{http://www.topografix.com/GPX/1/0}trkpt" >=>
  child >=> \e -> do
    let ele  = T.concat $ element "{http://www.topografix.com/GPX/1/0}ele" e >>= descendant >>= content
    let time = T.concat $ element "{http://www.topografix.com/GPX/1/0}time" e >>= descendant >>= content
    let lat  = T.concat $ attribute "lat" e
    let lon  = T.concat $ attribute "lon" e
    return $ Trkpt lat lon ele time

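A sample GPX file is here.

I get strange results: most of the parsed text is empty, with only a few scattered real values, even though all the data in the original GPX file is valid. When a real value does appear, it shows up in only one field of the record.

I'm fairly sure I'm not using the xml-conduit API correctly. What am I doing wrong?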

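Two problems. First, there is a typo in the namespace; it should be http://www.topografix.com/GPX/1/1. Second, your final Kleisli arrow (\e -> do -- etc.) acts on the children of the trkpt elements, not on the trkpt elements themselves. Here is a gpxTrkpts that should do what you want: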

gpxTrkpts =
  child >=> element "{http://www.topografix.com/GPX/1/1}trk" >=>
  child >=> element "{http://www.topografix.com/GPX/1/1}trkseg" >=>
  child >=> element "{http://www.topografix.com/GPX/1/1}trkpt" >=>
  \e -> do
    let cs = child e
        ele  = T.concat $ cs >>= element "{http://www.topografix.com/GPX/1/1}ele" >>= descendant >>= content
        time = T.concat $ cs >>= element "{http://www.topografix.com/GPX/1/1}time" >>= descendant >>= content
        lat  = T.concat $ attribute "lat" e
        lon  = T.concat $ attribute "lon" e
    return $ Trkpt lat lon ele time
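
To make the second point concrete, here is a minimal sketch that runs the same traversal against a small inline document, written with xml-conduit's `$/`, `&/` and `&|` operators instead of `>=>`. The `gpx` helper, the `sampleGpx` document and the names `toTrkpt`, `sampleTrkpts` and `latsOneLevelTooDeep` are made up for illustration and are not part of the question or the library. The last definition repeats the question's mistake of stepping one level too deep, so it should come back empty.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text      as T
import qualified Data.Text.Lazy as TL
import           Text.XML        (Name (..), def, parseText_)
import           Text.XML.Cursor

-- Same record shape as in the question (Eq added for easy comparison).
data Trkpt = Trkpt
  { trkptLat  :: T.Text
  , trkptLon  :: T.Text
  , trkptEle  :: T.Text
  , trkptTime :: T.Text
  } deriving (Show, Eq)

-- Hypothetical helper: qualify a local name with the GPX 1.1 namespace.
gpx :: T.Text -> Name
gpx local = Name local (Just "http://www.topografix.com/GPX/1/1") Nothing

-- Made-up two-point track, used only for illustration.
sampleGpx :: TL.Text
sampleGpx = TL.concat
  [ "<gpx xmlns=\"http://www.topografix.com/GPX/1/1\" version=\"1.1\">"
  , "<trk><trkseg>"
  , "<trkpt lat=\"1.5\" lon=\"2.5\"><ele>10</ele><time>2014-01-01T00:00:00Z</time></trkpt>"
  , "<trkpt lat=\"3.5\" lon=\"4.5\"><ele>20</ele><time>2014-01-01T00:00:01Z</time></trkpt>"
  , "</trkseg></trk></gpx>"
  ]

-- Build one record from a cursor that points at a trkpt element:
-- attributes come from the element itself, ele/time from its children.
toTrkpt :: Cursor -> Trkpt
toTrkpt c = Trkpt
  { trkptLat  = T.concat (attribute "lat" c)
  , trkptLon  = T.concat (attribute "lon" c)
  , trkptEle  = T.concat (c $/ element (gpx "ele")  &/ content)
  , trkptTime = T.concat (c $/ element (gpx "time") &/ content)
  }

-- Walk gpx -> trk -> trkseg -> trkpt and stop at the trkpt cursor.
sampleTrkpts :: [Trkpt]
sampleTrkpts =
  fromDocument (parseText_ def sampleGpx)
    $/ element (gpx "trk") &/ element (gpx "trkseg") &/ element (gpx "trkpt") &| toTrkpt

-- The question's shape: one extra child step, so the attribute lookup
-- runs on the ele/time elements, which carry no lat attribute.
latsOneLevelTooDeep :: [T.Text]
latsOneLevelTooDeep =
  fromDocument (parseText_ def sampleGpx)
    $/ element (gpx "trk") &/ element (gpx "trkseg") &/ element (gpx "trkpt") &/ attribute "lat"
```

Loading this in ghci and evaluating `sampleTrkpts` and `latsOneLevelTooDeep` side by side shows the difference between stopping at the trkpt cursor and descending past it.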

@duplode identified the problems. Here are a few additional comments.

  1. Use the gpx-conduit package

  2. How about
  3. Here is some code that can help debug parsing problems:

Code:

{-# LANGUAGE OverloadedStrings #-}
module Lib2 where

import qualified Data.Text           as T
import Data.Text (Text)
import Text.XML
import Text.XML.Cursor
import qualified Filesystem.Path.CurrentOS as Path
import Control.Monad

-- Render one node for debugging; element nodes show only their local name.
showNode :: Node -> String
showNode (NodeElement e)     = "NodeElement " ++ T.unpack (nameLocalName $ elementName e)
showNode (NodeInstruction _) = "NodeInstruction ..."
showNode (NodeContent t)     = "NodeContent " ++ show t
showNode (NodeComment _)     = "NodeComment"

-- Run an axis against the root of sample.xml and print every node it selects.
testParser :: (Cursor -> [Cursor]) -> IO ()
testParser parser = do
  -- Recent xml-conduit releases take a plain FilePath here, in which case
  -- Path.decodeString (and the Filesystem.Path import) can be dropped.
  doc <- Text.XML.readFile def (Path.decodeString "sample.xml")
  let nodes = map node $ parser (fromDocument doc)
  forM_ nodes $ \n -> putStrLn (showNode n)

Use it in ghci like this:

ghci> :set -XOverloadedStrings
ghci> :l Lib2
Lib2> testParser child
NodeContent "\n  "
NodeElement metadata
NodeContent "\n  "
NodeElement trk
NodeContent "\n  "
NodeElement extensions
NodeContent "\n"

Lib2> testParser $ child >=> element "trk"
Lib2> testParser $ child >=> laxElement "trk"
NodeElement trk

Lib2> testParser $ child >=> laxElement "trk" >=> child >=> laxElement "trkseg"
NodeElement trkseg
Lib2> testParser $ child >=> laxElement "trk" >=> child >=> laxElement "trkseg" >=> child >=> laxElement "trkpt"
NodeElement trkpt
NodeElement trkpt
NodeElement trkpt
NodeElement trkpt
Lib2>
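
In the session above, `element "trk"` prints nothing while `laxElement "trk"` finds the element. The sketch below isolates that difference on a tiny inline document, assuming the file declares the GPX 1.1 default namespace as the first answer says. The document `tinyGpx` and the names `countChildren`, `unqualifiedCount`, `qualifiedCount` and `laxCount` are made up for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text.Lazy as TL
import           Text.XML        (def, parseText_)
import           Text.XML.Cursor

-- Made-up minimal document in the GPX 1.1 default namespace.
tinyGpx :: TL.Text
tinyGpx = "<gpx xmlns=\"http://www.topografix.com/GPX/1/1\"><trk/></gpx>"

-- Cursor positioned at the root gpx element.
root :: Cursor
root = fromDocument (parseText_ def tinyGpx)

-- Count the root's children that a given axis selects.
countChildren :: Axis -> Int
countChildren axis = length (root $/ axis)

-- The literal "trk" is a Name without a namespace, so it does not match
-- an element that lives in the GPX namespace.
unqualifiedCount :: Int
unqualifiedCount = countChildren (element "trk")

-- The "{namespace}local" form yields a fully qualified Name.
qualifiedCount :: Int
qualifiedCount = countChildren (element "{http://www.topografix.com/GPX/1/1}trk")

-- laxElement compares only the local name and ignores the namespace.
laxCount :: Int
laxCount = countChildren (laxElement "trk")
```

`laxElement` is convenient while exploring a file in ghci; the fully qualified form is the stricter choice once the namespace is known.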