clj-xpath for XML with simple and nested tags
I have a function that extracts content (content only) from XML using the clj-xpath library.
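(ns example
  (:use [clj-xpath.core]))

(def data-url
  "http://api.eventful.com/rest/events/search?app_key=4H4Vff4PdrTGp3vV&keywords=music&location=New+York&date=Future")

;; Fetch the raw XML text from the URL.
(defn xml-data [url] (slurp url))

;; Parse the XML text into a document that clj-xpath can query.
(defn defxmldoc [url]
  (xml->doc (xml-data url)))

;; For the first 5 nodes matching root-tag, build a map of
;; tag -> text of the direct child element with that tag name.
(defn contents-only [url root-tag tags]
  (vec (map (fn [item]
              (into {}
                    (map (fn [tag]
                           [tag ($x:text (str "./" (name tag)) item)])
                         tags)))
            (take 5 ($x root-tag (defxmldoc url))))))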
The function is called like this:
(contents-only data-url "/search/events/event" [:title :url])
This works for non-nested tags, but not when I try to extract text from nested tags, e.g.:
<performers>
  <performer>
    <id>P0-001-000009049-1</id>
    <url>...</url>
    <name>Lindsey Buckingham</name>
    <short_bio>Rock</short_bio>
    <creator>TomAzoff</creator>
    <linker>evdb</linker>
  </performer>
with a call like this:
(contents-only data-url "/search/events/event" [:title :url :name])
I get this error:
RuntimeException Error, more (or less) than 1 result (0) from xml({:children...) for xpath(./name)  clj-xpath.core/throwf (core.clj:26)
How can I change my contents-only function so that I can also pass nested tags?
The quickest way: in the contents-only function, change "./" to ".//".
user> (first (contents-only data-url "/search/events/event" [:title :id :name]))
{:title "Legally Blonde the Musical", :id "P0-001-000351944-7", :name "Legally Blonde The Musical"}
user>
As described in the XPath documentation, .//name selects all name elements that are descendants of the current node, wherever they are in the hierarchy.
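To see why ./name fails while .//name works, here is a small sketch that evaluates both expressions relative to an <event> element. It uses the JDK's built-in javax.xml.xpath API instead of clj-xpath, so it runs without network access or extra dependencies. The sample XML is a hypothetical, trimmed-down version of the Eventful response, and parse-xml and match-count are illustrative helpers, not part of clj-xpath.

```clojure
(ns xpath-demo
  (:import (javax.xml.parsers DocumentBuilderFactory)
           (javax.xml.xpath XPathFactory XPathConstants)
           (org.w3c.dom NodeList)
           (java.io ByteArrayInputStream)))

;; Hypothetical <event>, shaped like the Eventful response in the question.
(def sample-event
  "<event>
     <title>Sample Event</title>
     <url>http://example.com/event</url>
     <performers>
       <performer>
         <id>P0-001</id>
         <url>http://example.com/performer</url>
         <name>Sample Performer</name>
       </performer>
     </performers>
   </event>")

(defn parse-xml
  "Parse an XML string into a DOM Document."
  [^String s]
  (-> (DocumentBuilderFactory/newInstance)
      .newDocumentBuilder
      (.parse (ByteArrayInputStream. (.getBytes s "UTF-8")))))

(defn match-count
  "Number of nodes matched by XPath expr, evaluated relative to context."
  [expr context]
  (let [xp (.newXPath (XPathFactory/newInstance))
        ^NodeList nodes (.evaluate xp expr context XPathConstants/NODESET)]
    (.getLength nodes)))

;; The <event> element is the context node, like `item` in contents-only.
(def event-node (.getDocumentElement (parse-xml sample-event)))

(match-count "./name" event-node)  ;; => 0, <name> is not a direct child of <event>
(match-count ".//name" event-node) ;; => 1, found at any depth below <event>
(match-count ".//url" event-node)  ;; => 2, the event's <url> and the performer's <url>
```

The 0 for ./name corresponds to the "(0)" in the question's error message, and the 2 for .//url shows how the descendant axis can pick up more than one node.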
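If name is not unique, this may not be what you want: $x:text expects exactly one match and throws otherwise (for example, .//url would match both the event's own <url> and the performer's <url>). One way around this is to be explicit in the paths you specify, e.g.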
(contents-only data-url "/search/events/event"
               [[:title]
                [:performers :performer :id]
                [:performers :performer :name]])
and add a few helper functions, such as:
;; Join the names of kys with sep, optionally prefixed by root and sep.
(defn build-path
  ([sep kys] (build-path nil sep kys))
  ([root sep kys]
   (->> kys (map name) (interpose sep)
        (concat (when root (list root sep))) (apply str))))

(defn path
  "Build a relative XPath such as ./a/b/c from a collection of keywords."
  [t]
  (build-path "." \/ t))
user> (path [:performers :performer :id])
"./performers/performer/id"
(defn path-key
  "Transform [:a :b :c] into :a-b-c"
  [t]
  (->> t (build-path \-) keyword))
user> (path-key [:performers :performer :id])
:performers-performer-id
Then contents-only becomes:
(defn contents-only2 [url root-tag tags]
  (vec (map (fn [item]
              (into {}
                    (map (fn [tag]
                           [(path-key tag) ($x:text (path tag) item)])
                         tags)))
            (take 5 ($x root-tag (defxmldoc url))))))
Result:
user> (first (contents-only2 data-url "/search/events/event"
                             [[:title]
                              [:performers :performer :id]
                              [:performers :performer :name]]))
{:title "Legally Blonde the Musical", :performers-performer-id "P0-001-000351944-7", :performers-performer-name "Legally Blonde The Musical"}
user>
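If you want to try the explicit-path approach without calling the Eventful API, here is a minimal offline sketch. It assumes you are willing to swap clj-xpath for the JDK's javax.xml.xpath; contents-only-xml, nodes, single-text, tag-path, tag-key and the sample XML are hypothetical names introduced for illustration. single-text throws unless exactly one node matches, imitating the "more (or less) than 1 result" check seen in the question's error.

```clojure
(ns contents-demo
  (:require [clojure.string :as str])
  (:import (javax.xml.parsers DocumentBuilderFactory)
           (javax.xml.xpath XPathFactory XPathConstants)
           (org.w3c.dom NodeList)
           (java.io ByteArrayInputStream)))

(defn parse-xml
  "Parse an XML string into a DOM Document."
  [^String s]
  (-> (DocumentBuilderFactory/newInstance)
      .newDocumentBuilder
      (.parse (ByteArrayInputStream. (.getBytes s "UTF-8")))))

(defn nodes
  "Vector of DOM nodes matched by XPath expr, evaluated relative to context."
  [expr context]
  (let [^NodeList nl (.evaluate (.newXPath (XPathFactory/newInstance))
                                expr context XPathConstants/NODESET)]
    (mapv #(.item nl %) (range (.getLength nl)))))

(defn single-text
  "Text of the one node matched by expr; throws if 0 or more than 1 match."
  [expr context]
  (let [matches (nodes expr context)]
    (if (= 1 (count matches))
      (.getTextContent (first matches))
      (throw (ex-info (str "Expected exactly 1 result for xpath " expr)
                      {:xpath expr :count (count matches)})))))

;; [:performers :performer :name] -> "./performers/performer/name"
(defn tag-path [t] (str "./" (str/join "/" (map name t))))

;; [:performers :performer :name] -> :performers-performer-name
(defn tag-key [t] (keyword (str/join "-" (map name t))))

(defn contents-only-xml
  "Offline variant of contents-only2: takes an XML string instead of a URL."
  [xml root-path tags]
  (vec (for [item (take 5 (nodes root-path (parse-xml xml)))]
         (into {} (for [t tags]
                    [(tag-key t) (single-text (tag-path t) item)])))))

(def sample-xml
  "<search><events><event>
     <title>Sample Event</title>
     <url>http://example.com/event</url>
     <performers><performer>
       <id>P0-001</id>
       <url>http://example.com/performer</url>
       <name>Sample Performer</name>
     </performer></performers>
   </event></events></search>")

(contents-only-xml sample-xml "/search/events/event"
                   [[:title] [:url] [:performers :performer :name]])
;; => [{:title "Sample Event", :url "http://example.com/event",
;;      :performers-performer-name "Sample Performer"}]
```

An explicit path such as ./performers/performer/name still matches several nodes when an event lists more than one performer; an XPath position predicate like ./performers/performer[1]/name (which the path helper above does not generate) is one way to pick the first.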