Generate a sequence number for a given node and copy the value into all of its child nodes

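I need to generate a sequence number and increment it each time a page node is found. This value then needs to be copied into every child node.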

[Input]

<xml>
    <doc>
        <page>
            <characters>
                <char a="a1" b="b1" y="y1" z="z1" start="1" weight="100">F</char>
                <char a="a2" b="b2" y="y2" z="z2" start="0" weight="80">r</char>
                <char a="a3" b="b3" y="y3" z="z3" start="0" weight="80">o</char>
                <char a="a4" b="b4" y="y4" z="z4" start="0" weight="100">m</char>
                <char a="a5" b="b5" y="y5" z="z5"> </char>
                <char a="a6" b="b6" y="y6" z="z6" start="1" weight="100">a</char>
                <char a="a7" b="b7" y="y7" z="z7" start="0" weight="80">n</char>
                <char a="a8" b="b8" y="y8" z="z8" start="0" weight="80">d</char>
                <char a="a9" b="b9" y="y9" z="z9"> </char>
            </characters>
        </page>
        <page>
            <characters>
                <char a="a1" b="b1" y="y1" z="z1" start="1" weight="100">t</char>
                <char a="a2" b="b2" y="y2" z="z2" start="0" weight="80">y</char>
                <char a="a3" b="b3" y="y3" z="z3" start="0" weight="80">p</char>
                <char a="a4" b="b4" y="y4" z="z4" start="0" weight="100">e</char>
                <char a="a5" b="b5" y="y5" z="z5"> </char>
                <char a="a6" b="b6" y="y6" z="z6" start="1" weight="100">v</char>
                <char a="a7" b="b7" y="y7" z="z7" start="0" weight="80">a</char>
                <char a="a8" b="b8" y="y8" z="z8" start="0" weight="80">l</char>
                <char a="a9" b="b9" y="y9" z="z9"> </char>
            </characters>
        </page>
    </doc>
</xml>

[Expected output]

<xml>
    <data>
        <page>
            <word>
                <pageNumber>1</pageNumber>
                <value>From</value>
                <coordinates>a1 b1 y4 z4</coordinates>
                <avgconfidence>90</avgconfidence>
            </word>
            <word>
                <pageNumber>1</pageNumber>
                <value>and</value>
                <coordinates>a6 b6 y8 z8</coordinates>
                <avgconfidence>90</avgconfidence>
            </word>
        </page>
        <page>
            <word>
                <pageNumber>2</pageNumber>
                <value>type</value>
                <coordinates>a1 b1 y4 z4</coordinates>
                <avgconfidence>90</avgconfidence>
            </word>
            <word>
                <pageNumber>2</pageNumber>
                <value>val</value>
                <coordinates>a6 b6 y8 z8</coordinates>
                <avgconfidence>90</avgconfidence>
            </word>
        </page>
    </data>
</xml>

Below is my partial XSLT. It generates the page number at the page-node level, but not at the word (child-node) level.

[XSLT]

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" omit-xml-declaration="yes" indent="yes" />

    <xsl:key name="wordChars" match="char[@start='0']"
             use="generate-id(preceding-sibling::char[@start='1'][1])" />

    <xsl:template match="/">
        <xml>
            <data>
                <xsl:apply-templates select="xml/doc/page" />
            </data>
        </xml>
    </xsl:template>

    <xsl:template match="*">
        <page>
            <pageNumber>
                <xsl:value-of select="position()" />
            </pageNumber>
            <xsl:apply-templates select="characters/char[@start='1']" />
        </page>
    </xsl:template>

    <xsl:template match="char">

        <xsl:variable name="word" select=". | key('wordChars', generate-id())" />

        <word>
            <value>
                <xsl:for-each select="$word"><xsl:value-of select="."/></xsl:for-each>
            </value>
            <coordinates>
                <xsl:value-of select="concat(
                     $word[1]/@a, ' ', $word[1]/@b, ' ',
                     $word[last()]/@y, ' ', $word[last()]/@z)" />
            </coordinates>
            <avgconfidence>
                <xsl:value-of select="sum($word/@weight) div count($word)" />
            </avgconfidence>
        </word>
    </xsl:template>
</xsl:stylesheet>

Output obtained from the XSL above:

<xml>
    <data>
        <page>
            <pageNumber>1</pageNumber>
            <word>
                <value>From</value>
                <coordinates>a1 b1 y4 z4</coordinates>
                <avgconfidence>90</avgconfidence>
            </word>
            <word>
                <value>and</value>
                <coordinates>a6 b6 y8 z8</coordinates>
                <avgconfidence>86.6666666666667</avgconfidence>
            </word>
        </page>
        <page>
            <pageNumber>2</pageNumber>
            <word>
                <value>type</value>
                <coordinates>a1 b1 y4 z4</coordinates>
                <avgconfidence>90</avgconfidence>
            </word>
            <word>
                <value>val</value>
                <coordinates>a6 b6 y8 z8</coordinates>
                <avgconfidence>86.6666666666667</avgconfidence>
            </word>
        </page>
    </data>
</xml>

Please advise.

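You can use the xsl:number element here. With count="page", it returns the position of the enclosing page element among its sibling page elements, so it works from inside a char template: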

<pageNumber><xsl:number count="page" /></pageNumber>

To put this in context, try this template for matching char:

<xsl:template match="char">
    <xsl:variable name="word" select=". | key('wordChars', generate-id())" />
    <word>
        <pageNumber><xsl:number count="page" /></pageNumber>
        <value>
            <xsl:for-each select="$word"><xsl:value-of select="."/></xsl:for-each>
        </value>
        <coordinates>
            <xsl:value-of select="concat(
                 $word[1]/@a, ' ', $word[1]/@b, ' ',
                 $word[last()]/@y, ' ', $word[last()]/@z)" />
        </coordinates>
        <avgconfidence>
            <xsl:value-of select="sum($word/@weight) div count($word)" />
        </avgconfidence>
    </word>
</xsl:template>
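Putting the pieces together, here is a sketch of the question's stylesheet with two changes, assuming the expected output above is the target: the page template (matched as page rather than *) no longer writes a page-level pageNumber, and the char template writes it per word with xsl:number. Note that the second word on each page will still come out with avgconfidence 86.6666666666667 rather than the 90 shown in the expected output, because its weights are 100, 80 and 80.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" omit-xml-declaration="yes" indent="yes" />

    <!-- Each continuation char (start='0') is keyed by the id of the nearest
         preceding word-start char (start='1') in the same characters element -->
    <xsl:key name="wordChars" match="char[@start='0']"
             use="generate-id(preceding-sibling::char[@start='1'][1])" />

    <xsl:template match="/">
        <xml>
            <data>
                <xsl:apply-templates select="xml/doc/page" />
            </data>
        </xml>
    </xsl:template>

    <!-- No pageNumber here: the expected output puts it inside each word -->
    <xsl:template match="page">
        <page>
            <xsl:apply-templates select="characters/char[@start='1']" />
        </page>
    </xsl:template>

    <xsl:template match="char">
        <xsl:variable name="word" select=". | key('wordChars', generate-id())" />
        <word>
            <!-- Position of the ancestor page among its sibling page elements -->
            <pageNumber><xsl:number count="page" /></pageNumber>
            <value>
                <xsl:for-each select="$word"><xsl:value-of select="."/></xsl:for-each>
            </value>
            <coordinates>
                <xsl:value-of select="concat(
                     $word[1]/@a, ' ', $word[1]/@b, ' ',
                     $word[last()]/@y, ' ', $word[last()]/@z)" />
            </coordinates>
            <avgconfidence>
                <xsl:value-of select="sum($word/@weight) div count($word)" />
            </avgconfidence>
        </word>
    </xsl:template>
</xsl:stylesheet>
```

Because xsl:number works from the char's own ancestors, nothing has to be passed down from the page template for this approach.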

Tim's xsl:number approach is probably the simplest for this case, but more generally you can use parameters to pass data from one template to another, for example:

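<xsl:template match="page">
  <page>
    <xsl:apply-templates select="characters/char[@start='1']">
      <!-- position() is this page's position among the pages selected by the caller -->
      <xsl:with-param name="pageNumber" select="position()" />
    </xsl:apply-templates>
  </page>
</xsl:template>

<xsl:template match="char">
  <xsl:param name="pageNumber" />
  <xsl:variable name="word" select=". | key('wordChars', generate-id())" />
  <word>
    <pageNumber><xsl:value-of select="$pageNumber" /></pageNumber>
    <!-- rest of template as before -->
    <value>
      <xsl:for-each select="$word"><xsl:value-of select="."/></xsl:for-each>
    </value>
    <coordinates>
      <xsl:value-of select="concat(
           $word[1]/@a, ' ', $word[1]/@b, ' ',
           $word[last()]/@y, ' ', $word[last()]/@z)" />
    </coordinates>
    <avgconfidence>
      <xsl:value-of select="sum($word/@weight) div count($word)" />
    </avgconfidence>
  </word>
</xsl:template>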

You declare the parameters a template accepts with xsl:param at the start of the template, and supply their values with xsl:with-param inside xsl:apply-templates or xsl:call-template.
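The call-template form works the same way. The following is a small standalone sketch against the input above; the named template pageSummary and the number attribute are hypothetical, chosen only to illustrate passing a value with xsl:with-param. For each page it passes position() explicitly and writes the number of word-start characters (start='1') on that page.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" omit-xml-declaration="yes" indent="yes" />

    <xsl:template match="/">
        <pages>
            <xsl:for-each select="xml/doc/page">
                <!-- Inside for-each, position() is 1 for the first page, 2 for the second -->
                <xsl:call-template name="pageSummary">
                    <xsl:with-param name="pageNumber" select="position()" />
                </xsl:call-template>
            </xsl:for-each>
        </pages>
    </xsl:template>

    <!-- Hypothetical named template: call-template keeps the caller's context
         node, so the current page is still the context here -->
    <xsl:template name="pageSummary">
        <xsl:param name="pageNumber" />
        <page number="{$pageNumber}">
            <xsl:value-of select="count(characters/char[@start='1'])" />
        </page>
    </xsl:template>
</xsl:stylesheet>
```

For the sample input this should produce two page elements, numbered 1 and 2, each containing 2 (the words "From"/"and" and "type"/"val").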