Normalize space in each element of XML using XQuery

Question

I have XML like this:

<a:price-range xmlns:c="http://iddn.icis.com/ns/core" xmlns:f="http://iddn.icis.com/ns/fields" xmlns:a="http://iddn.icis.com/ns/assets" xmlns:r="http://iddn.icis.com/ns/refdata">
    <c:id>
        http://iddn.icis.com/series-item/petchem/4021090-pricehistory-19990730000000</c:id>
    <c:type>series-item</c:type>
    <f:assessment-low>8.946586935</f:assessment-low>
    <f:assessment-high>9.946586935</f:assessment-high>
    <f:mid>9.44658693500000000000</f:mid>
    <f:period-label>
        <c:l10n xml:lang="en"/>
    </f:period-label>
</a:price-range>

I want to normalize the whitespace in this XML. In the example above, the c:id element contains leading whitespace. After normalizing, the XML should look like this:

<a:price-range xmlns:c="http://iddn.icis.com/ns/core" xmlns:f="http://iddn.icis.com/ns/fields" xmlns:a="http://iddn.icis.com/ns/assets" xmlns:r="http://iddn.icis.com/ns/refdata">
    <c:id>http://iddn.icis.com/series-item/petchem/4021090-pricehistory-19990730000000</c:id>
    <c:type>series-item</c:type>
    <f:assessment-low>8.946586935</f:assessment-low>
    <f:assessment-high>9.946586935</f:assessment-high>
    <f:mid>9.44658693500000000000</f:mid>
    <f:period-label>
        <c:l10n xml:lang="en"/>
    </f:period-label>
</a:price-range>

I have looked at fn:normalize-space, but it only works on strings.

Answer 1

I guess <xsl:strip-space elements="*"/> would work fine; you would first need to run the XML through an XSLT transformation (XML to XML).

Answer 2

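I don't think this can be achieved by applying serialization options; you have to walk the tree using the transformation pattern. Here is a slightly adjusted example from that page, which normalizes space and supports namespaces: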

declare function local:copy($node as node()) as node() {
  typeswitch($node)
    case $text as text()
      return text { normalize-space($text) }
    case $element as element()
      return
        element { QName(namespace-uri($element), name($element)) } {
                  $element/@*,
                  for $child in $element/(* | text()) return local:copy($child)
                }
    default return $node
 };


local:copy(
  <a:price-range xmlns:c="http://iddn.icis.com/ns/core" xmlns:f="http://iddn.icis.com/ns/fields" xmlns:a="http://iddn.icis.com/ns/assets" xmlns:r="http://iddn.icis.com/ns/refdata">
    <c:id>
        http://iddn.icis.com/series-item/petchem/4021090-pricehistory-19990730000000</c:id>
    <c:type>series-item</c:type>
    <f:assessment-low>8.946586935</f:assessment-low>
    <f:assessment-high>9.946586935</f:assessment-high>
    <f:mid>9.44658693500000000000</f:mid>
    <f:period-label>
        <c:l10n xml:lang="en"/>
    </f:period-label>
  </a:price-range>
)

MarkLogic also lets you apply an XSLT stylesheet, which may be a more elegant option, using the <xsl:strip-space elements="*"/> proposed by @Raj.
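A minimal sketch of the XSLT route from inside MarkLogic, assuming the stylesheet is passed inline to xdmp:xslt-eval. The stylesheet itself is illustrative and not from the answers above. Note that xsl:strip-space only removes text nodes that are entirely whitespace, so a separate text() template is assumed here to trim the padded c:id value.

```xquery
xquery version "1.0-ml";

(: Illustrative stylesheet: identity copy plus whitespace handling. :)
let $stylesheet :=
  <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- Drop text nodes that consist only of whitespace. -->
    <xsl:strip-space elements="*"/>
    <!-- Copy every node and attribute unchanged by default. -->
    <xsl:template match="@*|node()">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:template>
    <!-- Trim and collapse whitespace inside the remaining text nodes. -->
    <xsl:template match="text()">
      <xsl:value-of select="normalize-space(.)"/>
    </xsl:template>
  </xsl:stylesheet>

(: Sample input taken from the question, reduced to the padded element. :)
let $input :=
  <c:id xmlns:c="http://iddn.icis.com/ns/core">
      http://iddn.icis.com/series-item/petchem/4021090-pricehistory-19990730000000</c:id>

return xdmp:xslt-eval($stylesheet, $input)
```

Because the text() template already reduces whitespace-only text to an empty string, the result should not depend on whether strip-space is applied to an in-memory node.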

Answer 3

Someone is going to hit me for this, and I run the risk of downvotes, but WTH...

MarkLogic, XQuery, done.

let  $xml := <a:price-range xmlns:c="http://iddn.icis.com/ns/core" xmlns:f="http://iddn.icis.com/ns/fields" xmlns:a="http://iddn.icis.com/ns/assets" xmlns:r="http://iddn.icis.com/ns/refdata">
<c:id>
    http://iddn.icis.com/series-item/petchem/4021090-pricehistory-19990730000000</c:id>
<c:type>series-item</c:type>
<f:assessment-low>8.946586935</f:assessment-low>
<f:assessment-high>9.946586935</f:assessment-high>
<f:mid>9.44658693500000000000</f:mid>
<f:period-label>
    <c:l10n xml:lang="en"/>
</f:period-label>
</a:price-range>

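(: Serialize, remove whitespace that directly follows a tag, then re-parse. "$1" puts the matched tag back. :)
return xdmp:unquote(fn:replace(xdmp:quote($xml), "(<[^<>]+>)\s+", "$1"))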
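To see what a tag-anchored replace does in isolation, here is a small sketch in plain XQuery on a short serialized string. The sample string and its example.org URL are made up for illustration. The pattern matches whitespace that directly follows a tag and writes the tag back via "$1".

```xquery
(: Hypothetical serialized fragment with padding after the tags. :)
let $serialized :=
  "<c:id>&#10;    http://example.org/item</c:id>&#10;<c:type>series-item</c:type>"

(: Group 1 captures one tag; \s+ is the whitespace right after it. :)
return fn:replace($serialized, "(<[^<>]+>)\s+", "$1")
(: returns <c:id>http://example.org/item</c:id><c:type>series-item</c:type> :)
```

This is a blunt tool: it does not trim whitespace before a closing tag, it also removes significant whitespace in mixed content, and a ">" inside an attribute value, comment, or CDATA section can confuse the pattern.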

Answer 4

This function works well for me:

(:
  The rules/assumptions are:
  #1 Retain one leading space if the node isn't first, has non-space content, and has leading space.
  #2 Retain one trailing space if the node isn't last, isn't first, and has trailing space. 
  #3 Retain one trailing space if the node isn't last, is first, has trailing space, and has non-space content.
  #4 Retain a single space if the node is an only child and only has space content.
  :)
  declare function local:normalize-space-in-xml($input)
  {
     (: Bind the child list once so that each child's position ($pos) and the total count ($count) are explicit.
        $child/position() and $child/last() always return 1, because the context of that path is the single node $child. :)
     let $children := $input/node()
     let $count := count($children)
     return
       element {node-name($input)}
         {$input/@*,
           for $child at $pos in $children
           return
             if ($child instance of element())
             then local:normalize-space-in-xml($child)
             else
               if ($child instance of text())
               then
                 (:#1 Retain one leading space if node isn't first, has non-space content, and has leading space:)
                 if ($pos ne 1 and matches($child,'^\s') and normalize-space($child) ne '')
                 (: concat() yields one string; a sequence of two strings would be joined with an extra space :)
                 then concat(' ', normalize-space($child))
                 else
                   (:#4 retain one space, if the node is an only child, and has content but it's all space:)
                   if ($count eq 1 and string-length($child) ne 0 and normalize-space($child) eq '')
                   (: this overrules standard normalization:)
                   then ' '
                   else
                     (:#2 if the node isn't last, isn't first, and has trailing space, retain trailing space and collapse and trim the rest:)
                     if ($pos ne 1 and $pos ne $count and matches($child,'\s$'))
                     then concat(normalize-space($child), ' ')
                     else
                       (:#3 if the node isn't last, is first, has trailing space, and has non-space content, then keep trailing space:)
                       if ($pos eq 1 and $pos ne $count and matches($child,'\s$') and normalize-space($child) ne '')
                       then concat(normalize-space($child), ' ')
                       (:if the node is an only child, and has content which is not all space, then trim and collapse, that is, apply standard normalization:)
                       else normalize-space($child)
               (: comments and processing instructions are copied unchanged :)
               else $child
         }
  };

Tags: xquery, marklogic, marklogic-8