Haskell | How to obtain a value from a deeply nested data structure?
I'm a Haskell beginner, and I'm trying to learn more about the language by doing some basic parsing.
I have some code that parses an XML file and produces this:
[ Element
    { elName = QName
        { qName = "title"
        , qURI = Nothing
        , qPrefix = Nothing
        }
    , elAttribs = []
    , elContent =
        [ Text
            ( CData
                { cdVerbatim = CDataText
                , cdData = "This string is what I want to obtain" -- string to view.
                , cdLine = Just 27
                }
            )
        ]
    , elLine = Just 27
    }
]
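where Element is just a data type from an XML library.
What I want to do is obtain the string "This string is what I want to obtain".
I'm not sure how to do this without unpacking the whole data structure by hand, which I find very messy and error-prone.
I did some general research and found the Lens library, but despite reading a few tutorials I'm still struggling to work with nested data structures.
This is what the XML file I'm parsing looks like: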
<GoodreadsResponse>
  <Request>
    <authentication>true</authentication>
    <key>HOKCk4yYS8UjyducqmgRw</key>
    <method>search_search</method>
  </Request>
  <search>
    <query>fantasy</query>
    <results-start>1</results-start>
    <results-end>20</results-end>
    <total-results>35221</total-results>
    <source>Goodreads</source>
    <query-time-seconds>0.21</query-time-seconds>
    <results>
      <work>
        <id type="integer">2384</id>
        <books_count type="integer">51</books_count>
        <ratings_count type="integer">78825</ratings_count>
        <text_reviews_count type="integer">3357</text_reviews_count>
        <original_publication_year type="integer">2002</original_publication_year>
        <original_publication_month type="integer">2</original_publication_month>
        <original_publication_day type="integer">18</original_publication_day>
        <average_rating>4.17</average_rating>
        <best_book type="Book">
          <id type="integer">84136</id>
          <title>Fantasy Lover (Hunter Legends Series #1)</title>
          <author>
            <id type="integer">4430</id>
            <name>Sherrilyn Kenyon</name>
          </author>
          <image_url>https://images.gr-assets.com/books/1348332807m/84136.jpg</image_url>
          <small_image_url>https://images.gr-assets.com/books/1348332807s/84136.jpg</small_image_url>
        </best_book>
      </work>
      <work>
        <id type="integer">6734901</id>
        <books_count type="integer">42</books_count>
        <ratings_count type="integer">18358</ratings_count>
        <text_reviews_count type="integer">985</text_reviews_count>
        <original_publication_year type="integer">2010</original_publication_year>
        <original_publication_month type="integer" nil="true"/>
        <original_publication_day type="integer" nil="true"/>
        <average_rating>4.26</average_rating>
        <best_book type="Book">
          <id type="integer">6542645</id>
          <title>Fantasy in Death (In Death, #30)</title>
          <author>
            <id type="integer">17065</id>
            <name>J.D. Robb</name>
          </author>
          <image_url>https://s.gr-assets.com/assets/nophoto/book/111x148-bcc042a9c91a29c1d680899eff700a03.png</image_url>
          <small_image_url>https://s.gr-assets.com/assets/nophoto/book/50x75-a91bf249278a81aabab721ef782c4a74.png</small_image_url>
        </best_book>
      </work>
...
...
Since the xml package itself doesn't define any optics, you'll need another package that does. @Li-yaoXia found one: lens-xml.
#!/usr/bin/env cabal
{- cabal:
build-depends: base
             , xml
             , lens
             , lens-xml
-}
{-# LANGUAGE OverloadedStrings #-}
import Control.Lens
import Text.XML.Light.Types
import Text.XML.Light.Lens

-- The value from the question, copied verbatim.
x :: [Element]
x = [ Element
        { elName = QName
            { qName = "title"
            , qURI = Nothing
            , qPrefix = Nothing
            }
        , elAttribs = []
        , elContent =
            [ Text
                ( CData
                    { cdVerbatim = CDataText
                    , cdData = "This string is what I want to obtain" -- string to view.
                    , cdLine = Just 27
                    }
                )
            ]
        , elLine = Just 27
        }
    ]

-- First element, its first content item, only if that item is a Text node, then its cdData.
main :: IO ()
main = print (x ^? ix 0 . elContentL . ix 0 . _Text . cdDataL)
You can run it with a recent version of cabal:
$ cabal new-run Main.hs
<<lots of build output snipped>>
Just "This string is what I want to obtain"
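Note that `^? ix 0` only looks at the first element and the first content item. If you want the text of every element instead, a possible variation is to swap `ix 0` for `traverse` and `^?` for `^..`. This is only a sketch: it assumes the lens-xml optics have the shapes the script above implies (`elContentL` focusing on the content list, `_Text` matching Text nodes, `cdDataL` focusing on the string), and `textElem` and `xs` are hypothetical names made up for the example.

```haskell
#!/usr/bin/env cabal
{- cabal:
build-depends: base
             , xml
             , lens
             , lens-xml
-}
import Control.Lens
import Text.XML.Light.Types
import Text.XML.Light.Lens

-- Hypothetical helper: an element that holds a single text node.
textElem :: String -> String -> Element
textElem name str = Element
  { elName    = QName { qName = name, qURI = Nothing, qPrefix = Nothing }
  , elAttribs = []
  , elContent = [Text (CData { cdVerbatim = CDataText, cdData = str, cdLine = Nothing })]
  , elLine    = Nothing
  }

-- Hypothetical sample data: two <title> elements.
xs :: [Element]
xs = [textElem "title" "first", textElem "title" "second"]

main :: IO ()
main = do
  -- First match only, wrapped in Maybe (same path as in the script above).
  print (xs ^? ix 0 . elContentL . ix 0 . _Text . cdDataL)
  -- Every match, collected into a list.
  print (xs ^.. traverse . elContentL . traverse . _Text . cdDataL)
```

The second form never fails: when nothing matches it simply gives back an empty list.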
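Disclaimer: I'm not sure I agree with the idea of using lens for this task. Personally, I would tend to first convert the XML into a data type (reporting an error when the XML doesn't match the expected schema) and then work with that data type. But you did ask for a lens-based solution...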
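For what it's worth, here is roughly what I mean by converting to a data type first. Treat it as a minimal sketch rather than a finished parser: it uses `parseXMLDoc`, `findElements`, `findChild`, `strContent` and `unqual` from the xml package, while `Book`, `child`, `parseBook`, `parseBooks` and `sampleXml` are names I made up, and the sample document is a cut-down version of the response in the question.

```haskell
import Text.XML.Light

-- Hypothetical target type: only the fields this sketch cares about.
data Book = Book
  { bookTitle  :: String
  , bookAuthor :: String
  } deriving (Eq, Show)

-- Look up a direct child by tag name, or say which one is missing.
child :: String -> Element -> Either String Element
child name parent = maybe (Left msg) Right (findChild (unqual name) parent)
  where
    msg = "missing <" ++ name ++ "> inside <" ++ qName (elName parent) ++ ">"

-- Convert one <work> element into a Book.
parseBook :: Element -> Either String Book
parseBook work = do
  best   <- child "best_book" work
  title  <- child "title" best
  author <- child "author" best
  name   <- child "name" author
  pure (Book (strContent title) (strContent name))

-- Parse a document and convert every <work> element found in it.
parseBooks :: String -> Either String [Book]
parseBooks src = do
  root <- maybe (Left "no root element found") Right (parseXMLDoc src)
  traverse parseBook (findElements (unqual "work") root)

-- Cut-down sample in the shape of the response shown in the question.
sampleXml :: String
sampleXml = concat
  [ "<results><work><best_book>"
  , "<title>Fantasy Lover</title>"
  , "<author><name>Sherrilyn Kenyon</name></author>"
  , "</best_book></work></results>"
  ]

main :: IO ()
main = do
  -- A well-formed <work> becomes a Book.
  print (parseBooks sampleXml)
  -- A <best_book> without a <title> is reported instead of silently skipped.
  print (parseBooks "<results><work><best_book/></work></results>")
```

The point of the `Either String` result is that a document which doesn't match the expected shape produces a message naming the missing tag, instead of a bare `Nothing`.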
This is fairly clear using a list comprehension and record accessors:
get :: [Element] -> [String]
get es = [cdData c | e <- es, Text c <- elContent e ]
The Text c pattern will automatically filter out any Elem e or CRef s values in elContent e.
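If that filtering looks like magic, it may help to see it in isolation. The sketch below feeds one value of each Content constructor through a comprehension with a Text pattern, and then through the same filter written out by hand. `sample`, `viaPattern` and `viaCase` are hypothetical names used only for this illustration.

```haskell
import Text.XML.Light

-- Hypothetical sample: one value for each Content constructor.
sample :: [Content]
sample =
  [ Text (CData { cdVerbatim = CDataText, cdData = "kept", cdLine = Nothing })
  , Elem (Element { elName = unqual "b", elAttribs = [], elContent = [], elLine = Nothing })
  , CRef "amp"
  ]

-- A generator whose pattern fails to match just skips that item.
viaPattern :: [Content] -> [String]
viaPattern cs = [cdData c | Text c <- cs]

-- The same filter spelled out with an explicit fallback case.
viaCase :: [Content] -> [String]
viaCase cs = concatMap keep cs
  where
    keep (Text c) = [cdData c]
    keep _        = []

main :: IO ()
main = do
  print (viaPattern sample)
  print (viaCase sample)
```

Both calls print a list containing only "kept", because the Elem and CRef items don't match the pattern.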
Once you learn that, for lists, =<< means concatMap, you can save a few characters with:
get :: [Element] -> [String]
get es = [cdData c | Text c <- elContent =<< es]
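To convince yourself of the =<< claim without pulling in the xml package, you can try it on a tiny stand-in type. `Node`, `children` and `nodes` below are hypothetical and exist only so the example needs nothing beyond base.

```haskell
-- Hypothetical stand-in for an element with a list of children.
data Node = Node
  { label    :: String
  , children :: [String]
  }

nodes :: [Node]
nodes = [Node "a" ["x", "y"], Node "b" [], Node "c" ["z"]]

main :: IO ()
main = do
  -- For lists, binding a function flattens the results, like concatMap.
  print (children =<< nodes)
  print (concatMap children nodes)
  -- >>= is the same operator with its arguments flipped.
  print (nodes >>= children)
```

All three lines print `["x","y","z"]`.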
Also, if you only want the cdData when cdVerbatim is CDataText, you can add that condition.
get :: [Element] -> [String]
get es = [cdData c | Text c <- elContent =<< es, cdVerbatim c == CDataText ]