Normalize space in each element of XML using XQuery
I have XML like this -
<a:price-range xmlns:c="http://iddn.icis.com/ns/core" xmlns:f="http://iddn.icis.com/ns/fields" xmlns:a="http://iddn.icis.com/ns/assets" xmlns:r="http://iddn.icis.com/ns/refdata">
<c:id>
http://iddn.icis.com/series-item/petchem/4021090-pricehistory-19990730000000</c:id>
<c:type>series-item</c:type>
<f:assessment-low>8.946586935</f:assessment-low>
<f:assessment-high>9.946586935</f:assessment-high>
<f:mid>9.44658693500000000000</f:mid>
<f:period-label>
<c:l10n xml:lang="en"/>
</f:period-label>
</a:price-range>
I want to normalize the whitespace in the XML. In the example above, the c:id element contains leading whitespace. After normalizing, the XML should look like this -
<a:price-range xmlns:c="http://iddn.icis.com/ns/core" xmlns:f="http://iddn.icis.com/ns/fields" xmlns:a="http://iddn.icis.com/ns/assets" xmlns:r="http://iddn.icis.com/ns/refdata">
<c:id>http://iddn.icis.com/series-item/petchem/4021090-pricehistory-19990730000000</c:id>
<c:type>series-item</c:type>
<f:assessment-low>8.946586935</f:assessment-low>
<f:assessment-high>9.946586935</f:assessment-high>
<f:mid>9.44658693500000000000</f:mid>
<f:period-label>
<c:l10n xml:lang="en"/>
</f:period-label>
</a:price-range>
I have looked at fn:normalize-space, but it only works on strings.
I guess <xsl:strip-space elements="*"/> would work perfectly, but you need to run the XML through an XSLT transformation (XML to XML) first.
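I don't think this can be achieved with serialization options; you have to walk the tree and apply a transformation pattern. Here is a slightly adjusted example from that page that normalizes space and supports namespaces: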
declare function local:copy($node as node()) as node() {
  typeswitch($node)
    case $text as text()
      return text { normalize-space($text) }
    case $element as element()
      return
        element { QName(namespace-uri($element), name($element)) } {
          $element/@*,
          for $child in $element/(* | text()) return local:copy($child)
        }
    default return $node
};
local:copy(
<a:price-range xmlns:c="http://iddn.icis.com/ns/core" xmlns:f="http://iddn.icis.com/ns/fields" xmlns:a="http://iddn.icis.com/ns/assets" xmlns:r="http://iddn.icis.com/ns/refdata">
<c:id>
http://iddn.icis.com/series-item/petchem/4021090-pricehistory-19990730000000</c:id>
<c:type>series-item</c:type>
<f:assessment-low>8.946586935</f:assessment-low>
<f:assessment-high>9.946586935</f:assessment-high>
<f:mid>9.44658693500000000000</f:mid>
<f:period-label>
<c:l10n xml:lang="en"/>
</f:period-label>
</a:price-range>
)
MarkLogic also allows you to apply an XSLT stylesheet, which may be a more elegant route, using the <xsl:strip-space elements="*"/> proposed by @Raj.
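For illustration, here is a minimal sketch of that XSLT route run from XQuery, assuming MarkLogic's xdmp:xslt-eval is available. Note that xsl:strip-space only drops whitespace-only text nodes, so the sketch adds a text() template that calls normalize-space to trim the leading whitespace inside c:id. The stylesheet and the shortened input are assumptions made for the example, not something taken from the answers above.

```xquery
xquery version "1.0-ml";

(: Identity transform plus whitespace handling; the stylesheet is an illustrative assumption. :)
let $stylesheet :=
  <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:strip-space elements="*"/>
    <!-- copy every node and attribute unchanged -->
    <xsl:template match="@* | node()">
      <xsl:copy>
        <xsl:apply-templates select="@* | node()"/>
      </xsl:copy>
    </xsl:template>
    <!-- trim and collapse whitespace in the remaining text nodes -->
    <xsl:template match="text()" priority="1">
      <xsl:value-of select="normalize-space(.)"/>
    </xsl:template>
  </xsl:stylesheet>

(: Shortened version of the XML from the question. :)
let $xml :=
  <a:price-range xmlns:a="http://iddn.icis.com/ns/assets" xmlns:c="http://iddn.icis.com/ns/core">
    <c:id>
      http://iddn.icis.com/series-item/petchem/4021090-pricehistory-19990730000000</c:id>
    <c:type>series-item</c:type>
  </a:price-range>

return xdmp:xslt-eval($stylesheet, $xml)
```

Be aware that normalize-space on every text node also collapses whitespace in mixed content, so this is only a reasonable choice for data-oriented XML like the example.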
Someone will smack me for this, and I run the risk of downvotes, but WTH..
MarkLogic, XQuery, done.
let $xml := <a:price-range xmlns:c="http://iddn.icis.com/ns/core" xmlns:f="http://iddn.icis.com/ns/fields" xmlns:a="http://iddn.icis.com/ns/assets" xmlns:r="http://iddn.icis.com/ns/refdata">
<c:id>
http://iddn.icis.com/series-item/petchem/4021090-pricehistory-19990730000000</c:id>
<c:type>series-item</c:type>
<f:assessment-low>8.946586935</f:assessment-low>
<f:assessment-high>9.946586935</f:assessment-high>
<f:mid>9.44658693500000000000</f:mid>
<f:period-label>
<c:l10n xml:lang="en"/>
</f:period-label>
</a:price-range>
(: keep the tag ($1) and drop only the whitespace that follows it :)
return xdmp:unquote(fn:replace(xdmp:quote($xml), "(<[^<]+>)\s+", "$1"))
This function works well for me -
(:
The rules/assumptions are:
#1 Retain one leading space if the node isn't first, has non-space content, and has leading space.
#2 Retain one trailing space if the node isn't last, isn't first, and has trailing space.
#3 Retain one trailing space if the node isn't last, is first, has trailing space, and has non-space content.
#4 Retain a single space if the node is an only child and only has space content.
:)
declare function local:normalize-space-in-xml($input as element()) as element()
{
  element { node-name($input) } {
    $input/@*,
    let $children := $input/node()
    let $count := count($children)
    (: $pos is the child's position among its siblings; $child/position() would always be 1 :)
    for $child at $pos in $children
    let $norm := normalize-space($child)
    return
      typeswitch ($child)
      case $e as element() return local:normalize-space-in-xml($e)
      case text() return
        (: #1 retain one leading space if the node isn't first, has non-space content, and has leading space :)
        if ($pos ne 1 and matches($child, '^\s') and $norm ne '')
        then text { concat(' ', $norm) }
        (: #4 retain one space if the node is an only child and its content is all space; this overrules standard normalization :)
        else if ($count eq 1 and string-length($child) ne 0 and $norm eq '')
        then text { ' ' }
        (: #2 if the node isn't last, isn't first, and has trailing space, retain one trailing space and collapse and trim the rest :)
        else if ($pos ne 1 and $pos ne $count and matches($child, '\s$'))
        then text { concat($norm, ' ') }
        (: #3 if the node isn't last, is first, has trailing space, and has non-space content, keep one trailing space :)
        else if ($pos eq 1 and $pos ne $count and matches($child, '\s$') and $norm ne '')
        then text { concat($norm, ' ') }
        (: otherwise trim and collapse, that is, apply standard normalization :)
        else text { $norm }
      (: comments and processing instructions are copied unchanged :)
      default return $child
  }
};
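The rules in the comment matter mostly for mixed content. As a minimal sketch of what goes wrong without them, the hypothetical helper local:naive below (not part of the answer above) simply passes every text node through normalize-space:

```xquery
xquery version "1.0";

(: Hypothetical helper for illustration: trims every text node and ignores sibling position. :)
declare function local:naive($e as element()) as element() {
  element { node-name($e) } {
    $e/@*,
    for $n in $e/node()
    return
      typeswitch ($n)
      case $child as element() return local:naive($child)
      case text() return text { normalize-space($n) }
      default return $n
  }
};

local:naive(<p>Some <b>bold</b> text</p>)
(: serializes as <p>Some<b>bold</b>text</p>, so the spaces around the b element are lost :)
```

Keeping a single space at the boundary between a text node and a sibling element, as the rules describe, avoids running the words together.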