How to parse GPX file using Haskell's xml-conduit?
I'd like to use xml-conduit to parse GPX files. So far I have the following:
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative
import Data.Text as T
import Text.XML
import Text.XML.Cursor

data Trkpt = Trkpt {
    trkptLat :: Text,
    trkptLon :: Text,
    trkptEle :: Text,
    trkptTime :: Text
  } deriving (Show)

trkptsFromFile path =
  gpxTrkpts . fromDocument <$> Text.XML.readFile def path

gpxTrkpts =
  child >=> element "{http://www.topografix.com/GPX/1/0}trk" >=>
  child >=> element "{http://www.topografix.com/GPX/1/0}trkseg" >=>
  child >=> element "{http://www.topografix.com/GPX/1/0}trkpt" >=>
  child >=> \e -> do
    let ele = T.concat $ element "{http://www.topografix.com/GPX/1/0}ele" e >>= descendant >>= content
    let time = T.concat $ element "{http://www.topografix.com/GPX/1/0}time" e >>= descendant >>= content
    let lat = T.concat $ attribute "lat" e
    let lon = T.concat $ attribute "lon" e
    return $ Trkpt lat lon ele time
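An example GPX file is here.
I get weird results: the parsed text is mostly empty, with a few sporadic actual values, even though the data in the original GPX file is all valid. When there is an actual value, it appears in only one field of the record.
I'm pretty sure I'm not using the xml-conduit API correctly. What am I doing wrong?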
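Two issues. First, there is a typo in the namespace; it should be http://www.topografix.com/GPX/1/1. Second, your final Kleisli arrow (\e -> do -- etc.) acts on the children of the trkpt elements, not on the trkpt elements themselves. Here is a gpxTrkpts that should do what you want: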
gpxTrkpts =
  child >=> element "{http://www.topografix.com/GPX/1/1}trk" >=>
  child >=> element "{http://www.topografix.com/GPX/1/1}trkseg" >=>
  child >=> element "{http://www.topografix.com/GPX/1/1}trkpt" >=>
  \e -> do
    let cs = child e
        ele = T.concat $ cs >>= element "{http://www.topografix.com/GPX/1/1}ele" >>= descendant >>= content
        time = T.concat $ cs >>= element "{http://www.topografix.com/GPX/1/1}time" >>= descendant >>= content
        lat = T.concat $ attribute "lat" e
        lon = T.concat $ attribute "lon" e
    return $ Trkpt lat lon ele time
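To see why the extra child step matters, here is a minimal sketch that models axes as plain functions in the list monad. The Tree type and the childT and named helpers are made up for illustration and are not part of xml-conduit; only >=> from Control.Monad is real. It assumes a tiny hand-built tree shaped like a GPX track.

```haskell
import Control.Monad ((>=>))

-- Toy stand-in for an XML element: a tag name plus child elements.
-- (Hypothetical type for illustration; not xml-conduit's Cursor.)
data Tree = Tree { tag :: String, kids :: [Tree] }

-- An axis maps one node to a list of nodes, in the spirit of xml-conduit's Axis.
type Axis = Tree -> [Tree]

-- Step down to the direct children.
childT :: Axis
childT = kids

-- Keep the node only if its tag matches.
named :: String -> Axis
named n t = [t | tag t == n]

sample :: Tree
sample =
  Tree "gpx"
    [ Tree "trk"
        [ Tree "trkseg"
            [ Tree "trkpt" [Tree "ele" [], Tree "time" []]
            , Tree "trkpt" [Tree "ele" [], Tree "time" []]
            ]
        ]
    ]

-- Walk from the root down to every trkpt.
trkpts :: Axis
trkpts =
  childT >=> named "trk" >=>
  childT >=> named "trkseg" >=>
  childT >=> named "trkpt"

-- Whatever follows trkpts runs once per trkpt.
onTrkpt :: [String]
onTrkpt = map tag (trkpts sample)

-- With one more child step, whatever follows runs once per child of each trkpt.
onChildren :: [String]
onChildren = map tag ((trkpts >=> childT) sample)
```

In this model onTrkpt yields one entry per trkpt, while onChildren yields one entry per ele or time node. That is consistent with the symptom in the question: a lambda placed after the extra child step builds one record per child, so at most one field of each record can be filled.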
@duplode has identified the problems. A few additional comments.
The following code can help debug parsing problems:
{-# LANGUAGE OverloadedStrings #-}

module Lib2 where

import qualified Data.Text as T
import Data.Text (Text)
import Text.XML
import Text.XML.Cursor
import qualified Filesystem.Path.CurrentOS as Path
import Control.Monad

-- Render a node as a short one-line description.
showNode (NodeElement e) = "NodeElement " ++ T.unpack (nameLocalName $ elementName e)
showNode (NodeInstruction _) = "NodeInstruction ..."
showNode (NodeContent t) = "NodeContent " ++ show t
showNode (NodeComment _) = "NodeComment"

-- Run an axis against sample.xml and print every node it selects.
-- Path.decodeString is needed with older xml-conduit releases, whose readFile
-- takes a system-filepath FilePath; newer releases take a plain String path.
testParser parser = do
  doc <- Text.XML.readFile def (Path.decodeString "sample.xml")
  let nodes = map node $ parser (fromDocument doc)
  forM_ nodes $ \n -> putStrLn (showNode n)

Use it in ghci like this:

ghci> :set -XOverloadedStrings
ghci> :l Lib2
Lib2> testParser child
NodeContent "\n "
NodeElement metadata
NodeContent "\n "
NodeElement trk
NodeContent "\n "
NodeElement extensions
NodeContent "\n"
Lib2> testParser $ child >=> element "trk"
Lib2> testParser $ child >=> laxElement "trk"
NodeElement trk
Lib2> testParser $ child >=> laxElement "trk" >=> child >=> laxElement "trkseg"
NodeElement trkseg
Lib2> testParser $ child >=> laxElement "trk" >=> child >=> laxElement "trkseg" >=> child >=> laxElement "trkpt"
NodeElement trkpt
NodeElement trkpt
NodeElement trkpt
NodeElement trkpt
Lib2>
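The session above shows element "trk" selecting nothing while laxElement "trk" finds the element. A small sketch of that difference on an in-memory document follows. The sampleGpx string and the matchCounts name are assumptions made up for this example; parseLBS_, fromDocument, child, element and laxElement are the xml-conduit functions, and the block assumes the xml-conduit and bytestring packages are available.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Monad ((>=>))
import qualified Data.ByteString.Lazy.Char8 as L
import Text.XML (def, parseLBS_)
import Text.XML.Cursor (child, element, fromDocument, laxElement)

-- Hypothetical inline document that declares the GPX 1.1 default namespace.
sampleGpx :: L.ByteString
sampleGpx =
  "<gpx xmlns=\"http://www.topografix.com/GPX/1/1\"><trk><trkseg/></trk></gpx>"

-- How many nodes each way of naming the trk element selects:
-- (bare local name, namespace-qualified name, lax match on local name).
matchCounts :: (Int, Int, Int)
matchCounts = (length (bare root), length (withNs root), length (lax root))
  where
    root = fromDocument (parseLBS_ def sampleGpx)
    -- A bare name has no namespace, so it does not equal the namespaced trk.
    bare = child >=> element "trk"
    -- Clark notation {namespace}local gives the fully qualified name.
    withNs = child >=> element "{http://www.topografix.com/GPX/1/1}trk"
    -- laxElement compares only the local name and ignores the namespace.
    lax = child >=> laxElement "trk"
```

Under these assumptions matchCounts should come out as (0, 1, 1). laxElement is convenient for exploring a document in ghci; the fully qualified name is the stricter choice when the namespace is known.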