Improve XSLT 3.0 performance by avoiding an expensive preceding selection

I have an input structure like the following:

<?xml version="1.0" encoding="UTF-8"?>
<MessageFormat name='WagonStatus_Fplv' version='2.02'>
    <StructFormat name='K0-HEADER' delimOptional='n'>
        <FieldFormat name='SATZKOPF-K0' type='String' delimOptional='y' length='2' strlenInChars='y' codepage='UTF-8'/>
        <FieldFormat name='Externer_Partner' type='String' delimOptional='y' length='35' strlenInChars='y' codepage='UTF-8'/>
        <FieldFormat name='INTERCHANGEREFERENZNUMMER' type='String' delimOptional='y' length='14' strlenInChars='y' codepage='UTF-8'/>
        <FieldFormat name='NACHRICHTENREFERENZNUMMER' type='String' delimOptional='y' length='14' strlenInChars='y' codepage='UTF-8'/>
        <FieldFormat name='NACHRICHTENTYP' type='String' delimOptional='y' length='6' strlenInChars='y' codepage='UTF-8'/>
        <FieldFormat name='TESTKENNZEICHEN' type='String' delimOptional='y' length='1' strlenInChars='y' codepage='UTF-8'/>
        <FieldFormat name='VERSIONSNUMMER' type='String' delimOptional='y' length='2' strlenInChars='y' codepage='UTF-8'/>
        <FieldFormat name='EDI-REFERENZNUMMER' type='String' delimOptional='y' length='14' strlenInChars='y' codepage='UTF-8'/>
        <StructFormat name='SATZENDE' delim='\n' delimOptional='n'>
        </StructFormat>
    </StructFormat>
    <StructFormat name='FV-ISR-SST-GROUP' delimOptional='n' repeat='150'>
        <StructFormat name='F10-IDENTIFICATION-DATA' delimOptional='y'>
            <FieldFormat name='F000-IDEN' type='String' delimOptional='y' length='2' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F010-TYPE' type='String' delimOptional='y' length='6' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F020-TIME' type='String' delimOptional='y' length='14' strlenInChars='y' codepage='UTF-8'/>
        </StructFormat>
        <StructFormat name='F20-PRODUCTION-DATA' delimOptional='y'>
            <FieldFormat name='F030-WAGO' type='String' delimOptional='y' length='12' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F040-FLAG' type='String' delimOptional='y' length='1' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F050-TRAI' type='String' delimOptional='y' length='17' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F060-CREA' type='String' delimOptional='y' length='8' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F070-WADA' type='String' delimOptional='y' length='12' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F080-FORA' type='String' delimOptional='y' length='4' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F090-FSCO' type='String' delimOptional='y' length='5' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F100-FSDE' type='String' delimOptional='y' length='24' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F110-INSR' type='String' delimOptional='y' length='4' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F120-INSC' type='String' delimOptional='y' length='5' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F130-INSD' type='String' delimOptional='y' length='24' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F140-FRON' type='String' delimOptional='y' length='3' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F150-DERA' type='String' delimOptional='y' length='4' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F160-DSTC' type='String' delimOptional='y' length='5' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F170-DSTD' type='String' delimOptional='y' length='24' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F180-REAL' type='String' delimOptional='y' length='12' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F190-TWEW' type='String' delimOptional='y' length='7' strlenInChars='y' codepage='UTF-8'/>
        </StructFormat>
        <StructFormat name='F30-COMMERCIAL-DATA' delimOptional='y'>
            <StructFormat name='F31-GOODS' delimOptional='y' repeat='4'>
                <StructFormat name='F311-CONTAINER' delimOptional='y' repeat='4'>
                    <FieldFormat name='F240-SHGC' type='String' delimOptional='y' length='6' strlenInChars='y' codepage='UTF-8'/>
                    <FieldFormat name='F250-WEGC' type='String' delimOptional='y' length='7' strlenInChars='y' codepage='UTF-8'/>
                </StructFormat>
                <FieldFormat name='F220-SFGC' type='String' delimOptional='y' length='6' strlenInChars='y' codepage='UTF-8'/>
                <FieldFormat name='F230-NUGC' type='String' delimOptional='y' length='1' strlenInChars='y' codepage='UTF-8'/>
            </StructFormat>
            <StructFormat name='F32-GOODS-DESC' delimOptional='y' repeat='4'>
                <FieldFormat name='F260-WESH' type='String' delimOptional='y' length='7' strlenInChars='y' codepage='UTF-8'/>
                <FieldFormat name='F270-RIDC' type='String' delimOptional='y' length='4' strlenInChars='y' codepage='UTF-8'/>
                <FieldFormat name='F280-RIDG' type='String' delimOptional='y' length='4' strlenInChars='y' codepage='UTF-8'/>
                <FieldFormat name='F290-HAIG' type='String' delimOptional='y' length='4' strlenInChars='y' codepage='UTF-8'/>
                <FieldFormat name='F300-SUID' type='String' delimOptional='y' length='4' strlenInChars='y' codepage='UTF-8'/>
                <FieldFormat name='F310-DALA' type='String' delimOptional='y' length='15' strlenInChars='y' codepage='UTF-8'/>
            </StructFormat>
            <FieldFormat name='F200-TYTR' type='String' delimOptional='y' length='1' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F210-NUGO' type='String' delimOptional='y' length='1' strlenInChars='y' codepage='UTF-8'/>
        </StructFormat>
        <StructFormat name='F40-CONTROL-LABEL-DATA' delimOptional='y'>
            <FieldFormat name='F320-FORA' type='String' delimOptional='y' length='4' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F330-FOST' type='String' delimOptional='y' length='5' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F340-FOSN' type='String' delimOptional='y' length='5' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F350-FOSD' type='String' delimOptional='y' length='24' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F360-FODA' type='String' delimOptional='y' length='8' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F370-CRCO' type='String' delimOptional='y' length='6' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F380-CRDE' type='String' delimOptional='y' length='24' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F390-CECO' type='String' delimOptional='y' length='6' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F400-CEDE' type='String' delimOptional='y' length='24' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F410-CONU' type='String' delimOptional='y' length='6' strlenInChars='y' codepage='UTF-8'/>
        </StructFormat>
        <StructFormat name='F50-ROUTE-DATA' delimOptional='y'>
            <StructFormat name='F51-ROUTE-CODE' delimOptional='y' repeat='6'>
                <FieldFormat name='F430-TRRY' type='String' delimOptional='y' length='4' strlenInChars='y' codepage='UTF-8'/>
                <FieldFormat name='F440-FRON' type='String' delimOptional='y' length='2' strlenInChars='y' codepage='UTF-8'/>
                <FieldFormat name='F450-ORDE' type='String' delimOptional='y' length='1' strlenInChars='y' codepage='UTF-8'/>
            </StructFormat>
            <FieldFormat name='F420-RONU' type='String' delimOptional='y' length='1' strlenInChars='y' codepage='UTF-8'/>
        </StructFormat>
        <StructFormat name='F60-BROKEN-WAGON-DATA' delimOptional='y'>
            <FieldFormat name='F460-DADE' type='String' delimOptional='y' length='25' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F470-STDA' type='String' delimOptional='y' length='5' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F480-SDDA' type='String' delimOptional='y' length='24' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F490-DADA' type='String' delimOptional='y' length='12' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F500-DATY' type='String' delimOptional='y' length='1' strlenInChars='y' codepage='UTF-8'/>
        </StructFormat>
        <StructFormat name='F70-CONS-NOTE-DATA' delimOptional='y'>
            <StructFormat name='F71-CARRIER' delimOptional='y' repeat='20'>
                <FieldFormat name='F710-CACO' type='String' delimOptional='y' length='4' strlenInChars='y' codepage='UTF-8'/>
                <FieldFormat name='F720-CAST' type='String' delimOptional='y' length='1' strlenInChars='y' codepage='UTF-8'/>
                <FieldFormat name='F730-CAPO' type='String' delimOptional='y' length='2' strlenInChars='y' codepage='UTF-8'/>
                <FieldFormat name='F740-CAAC' type='String' delimOptional='y' length='1' strlenInChars='y' codepage='UTF-8'/>
            </StructFormat>
            <StructFormat name='F72-COM-DEST-STAT' delimOptional='y'>
                <FieldFormat name='F750-DSCC' type='String' delimOptional='y' length='2' strlenInChars='y' codepage='UTF-8'/>
                <FieldFormat name='F760-DSSC' type='String' delimOptional='y' length='5' strlenInChars='y' codepage='UTF-8'/>
                <FieldFormat name='F770-DSDE' type='String' delimOptional='y' length='24' strlenInChars='y' codepage='UTF-8'/>
            </StructFormat>
            <FieldFormat name='F780-SWLT' type='String' delimOptional='y' length='1' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F790-CONT' type='String' delimOptional='y' length='1' strlenInChars='y' codepage='UTF-8'/>
        </StructFormat>
        <StructFormat name='F80-LAST-EVENT-DATA' delimOptional='y'>
            <FieldFormat name='F810-LEDD' type='String' delimOptional='y' length='1' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F820-LETY' type='String' delimOptional='y' length='2' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F830-LEDT' type='String' delimOptional='y' length='12' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F840-LECC' type='String' delimOptional='y' length='2' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F850-LESC' type='String' delimOptional='y' length='5' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F860-LEDE' type='String' delimOptional='y' length='24' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F870-LETR' type='String' delimOptional='y' length='6' strlenInChars='y' codepage='UTF-8'/>
        </StructFormat>
        <StructFormat name='F90-TRAIN-SITUATION-DATA' delimOptional='y'>
            <FieldFormat name='F910-TRST' type='String' delimOptional='y' length='1' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F920-MPIM' type='String' delimOptional='y' length='4' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F930-MPCC' type='String' delimOptional='y' length='2' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F940-MPSC' type='String' delimOptional='y' length='5' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F950-MPSD' type='String' delimOptional='y' length='24' strlenInChars='y' codepage='UTF-8'/>
            <FieldFormat name='F960-DTIM' type='String' delimOptional='y' length='6' strlenInChars='y' codepage='UTF-8'/>
        </StructFormat>
        <StructFormat name='SATZENDE' delim='\n' delimOptional='n'>
        </StructFormat>
    </StructFormat>
</MessageFormat>

The target structure should be:

<?xml version="1.0" encoding="UTF-8"?>
<WagonStatus_Fplv>
    <K0-HEADER>
        <SATZKOPF-K0 length="2" start="1"/>
        <Externer_Partner length="35" start="3"/>
        <INTERCHANGEREFERENZNUMMER length="14" start="38"/>
        <NACHRICHTENREFERENZNUMMER length="14" start="52"/>
        <NACHRICHTENTYP length="6" start="66"/>
        <TESTKENNZEICHEN length="1" start="72"/>
        <VERSIONSNUMMER length="2" start="73"/>
        <EDI-REFERENZNUMMER length="14" start="75"/>
        <SATZENDE/>
    </K0-HEADER>
    <FV-ISR-SST-GROUP>
        <F10-IDENTIFICATION-DATA>
            <F000-IDEN length="2" start="89"/>
            <F010-TYPE length="6" start="91"/>
            <F020-TIME length="14" start="97"/>
    ...
</WagonStatus_Fplv>

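So every FieldFormat in the source document is mapped to an element node that carries its length as an attribute, and the start attribute is the sum of the lengths of the preceding element nodes plus one (positions are counted from 1). <StructFormat name='FV-ISR-SST-GROUP' delimOptional='n' repeat='150'> means that this structure is repeated 150 times.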
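Written as a formula, with positions counted from 1 as in the sample output, the start of the k-th field in document order (after the repeats are unrolled) is one more than the total length of the fields before it, which is the same as a one-step recurrence. The first values below use the header lengths 2, 35 and 14:

```latex
\mathrm{start}_k = 1 + \sum_{j=1}^{k-1} \mathrm{length}_j ,
\qquad
\mathrm{start}_{k+1} = \mathrm{start}_k + \mathrm{length}_k
\\
\mathrm{start}_1 = 1,\quad
\mathrm{start}_2 = 1 + 2 = 3,\quad
\mathrm{start}_3 = 3 + 35 = 38,\quad
\mathrm{start}_4 = 38 + 14 = 52
```

The recurrence is the key point: each start only needs the running total up to the previous field, not a fresh scan of everything before it.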

So far, with the kind help of @Martin Hennen, I have this stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="http://www.w3.org/2005/xpath-functions" xmlns:csb="http://www.dbcargo.org/csb" exclude-result-prefixes="#all" version="3.0">
        <xsl:param name="msg" as="xs:string">H0 EVU_DBSRD PVG     Z24 ABF-RF  IR    ExternalPartnerID_uuuuuuuuuuuuuuuuu0202017-03-16-07.27.40.864320NJNJ   M1           80281261300008                        M2 16.03.201707:27:00Z1 H62430  16.03.2017                    16.03.201707:00:00+0027R1 00131800820664780201703154023641201703151159043706346965                                   000    JJ                                                R1 02031800819657480201703154045545201703151159306557346965                                   000    NN                                                </xsl:param>
    <xsl:output method="xml" indent="yes"/>
    <xsl:mode name="unroll" on-no-match="shallow-copy"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="StructFormat[@repeat]" mode="unroll">
        <xsl:variable name="this" select="."/>
        <xsl:choose>
            <xsl:when test="$this/@repeat != '*' ">
                <xsl:for-each select="1 to @repeat">
                    <xsl:choose>
                        <xsl:when test="$this/@delimOptional = 'n' and $this/TagField and contains($msg, $this/TagField)">
                            <xsl:copy select="$this">
                                <xsl:apply-templates select="@* except @repeat, node()" mode="#current"/>
                            </xsl:copy>
                        </xsl:when>
                        <xsl:otherwise>
                            <xsl:copy select="$this">
                                <xsl:apply-templates select="@* except @repeat, node()" mode="#current"/>
                            </xsl:copy>
                        </xsl:otherwise>
                    </xsl:choose>
                </xsl:for-each>
            </xsl:when>
            <xsl:otherwise>
                <xsl:variable name="repeat" select="count(tokenize($msg, $this/TagField/@value)) - 1"/>
                <xsl:for-each select="1 to $repeat">
                    <xsl:copy select="$this">
                        <xsl:apply-templates select="@* except @repeat, node()" mode="#current"/>
                    </xsl:copy>
                </xsl:for-each>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
    <xsl:template match="StructFormat[not(@repeat)]" mode="unroll">
        <xsl:variable name="this" select="."/>
        <xsl:choose>
            <xsl:when test="$this/TagField and not(contains($msg, $this/TagField/@value)) ">
            </xsl:when>
            <xsl:otherwise>
                <xsl:copy select="$this">
                    <xsl:apply-templates select="@* except @repeat, node()" mode="#current"/>
                </xsl:copy>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
    <xsl:template match="FieldFormat[@repeat]" mode="unroll">
        <xsl:variable name="this" select="."/>
        <xsl:for-each select="1 to @repeat">
            <xsl:copy select="$this">
                <xsl:apply-templates select="@* except @repeat, node()" mode="#current"/>
            </xsl:copy>
        </xsl:for-each>
    </xsl:template>
    <xsl:variable name="complete-struct">
        <xsl:apply-templates mode="unroll"/>
    </xsl:variable>
    <xsl:template match="/">
        <xsl:element name="{/MessageFormat/@name}">
            <xsl:apply-templates select="$complete-struct/*"/>
        </xsl:element>
    </xsl:template>
    <xsl:template match="StructFormat">
        <xsl:element name="{@name}">
            <xsl:apply-templates/>
        </xsl:element>
    </xsl:template>
    <xsl:template match="FieldFormat">
        <xsl:element name="{@name}">
            <!-- <xsl:attribute name="start" select="sum(preceding::FieldFormat/@length)"/> -->
            <xsl:attribute name="length" select="@length"/>
        </xsl:element>
    </xsl:template>

</xsl:stylesheet>

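The problem is that when I try to compute the sum of the preceding lengths, the statement (<xsl:attribute name="start" select="sum(preceding::FieldFormat/@length)"/>) is so expensive that the XSLT engine (Saxon 9.8 HE) stops responding. Have I reached the limits of what XSLT can do, so that I have to use another technology such as Java for this kind of task, or is there a way to avoid this expensive preceding selection and improve the XSLT performance?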
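For a sense of scale, here is a rough estimate, assuming the processor evaluates preceding:: naively for every field. Counting the FieldFormat elements in the format above gives 8 in K0-HEADER and 218 in one fully unrolled FV-ISR-SST-GROUP (including the nested repeat='4', repeat='6' and repeat='20' structures), so:

```latex
n = 8 + 150 \times 218 = 32\,708 \ \text{fields},
\qquad
\sum_{k=1}^{n} (k-1) = \frac{n(n-1)}{2} \approx 5.3 \times 10^{8}
```

That is at least half a billion visits to earlier FieldFormat elements, before counting the StructFormat nodes the axis also walks past. The work grows quadratically with the number of fields, while each start really only depends on the previous one.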

Try an accumulator and see whether it improves performance:

<!-- top-level declaration: running total of the lengths of the FieldFormat elements already finished -->
<xsl:accumulator name="preceding-length" as="xs:integer" initial-value="0">
    <xsl:accumulator-rule phase="end" match="FieldFormat" select="$value + xs:integer(@length)"/>
</xsl:accumulator>

<xsl:template match="FieldFormat">
    <xsl:element name="{@name}">
        <!-- the value before this field's own end rule fires is the sum of all preceding lengths -->
        <xsl:attribute name="start" select="accumulator-before('preceding-length') + 1"/>
        <xsl:attribute name="length" select="@length"/>
    </xsl:element>
</xsl:template>
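For context, here is a minimal sketch of the accumulator placed in a complete two-pass stylesheet, so the declaration sits at the top level and the FieldFormat template runs over the unrolled temporary tree. It simplifies the question's unroll templates by assuming every @repeat is a number (the TagField / repeat='*' handling is left out), and it relies on accumulators being applicable to temporary trees such as $complete-struct, which is my reading of the XSLT 3.0 rules. Treat it as an untested sketch rather than a drop-in replacement.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="#all" version="3.0">

    <xsl:output method="xml" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- Running total of the lengths of all FieldFormat elements already closed.
         Assumption: the accumulator is applicable to the temporary tree $complete-struct. -->
    <xsl:accumulator name="preceding-length" as="xs:integer" initial-value="0">
        <xsl:accumulator-rule phase="end" match="FieldFormat"
                              select="$value + xs:integer(@length)"/>
    </xsl:accumulator>

    <!-- Pass 1: replace every element with a numeric @repeat by that many copies. -->
    <xsl:mode name="unroll" on-no-match="shallow-copy"/>

    <xsl:template match="*[@repeat castable as xs:integer]" mode="unroll">
        <xsl:variable name="this" select="."/>
        <xsl:for-each select="1 to xs:integer(@repeat)">
            <xsl:copy select="$this">
                <xsl:apply-templates select="@* except @repeat, node()" mode="#current"/>
            </xsl:copy>
        </xsl:for-each>
    </xsl:template>

    <xsl:variable name="complete-struct">
        <xsl:apply-templates mode="unroll"/>
    </xsl:variable>

    <!-- Pass 2: rename the unrolled structure and add length/start to each field. -->
    <xsl:template match="/">
        <xsl:apply-templates select="$complete-struct/MessageFormat"/>
    </xsl:template>

    <xsl:template match="MessageFormat | StructFormat">
        <xsl:element name="{@name}">
            <xsl:apply-templates/>
        </xsl:element>
    </xsl:template>

    <xsl:template match="FieldFormat">
        <xsl:element name="{@name}">
            <xsl:attribute name="length" select="@length"/>
            <!-- O(1) per field: no scan of the preceding axis -->
            <xsl:attribute name="start" select="accumulator-before('preceding-length') + 1"/>
        </xsl:element>
    </xsl:template>
</xsl:stylesheet>
```

Because the accumulator is updated once per field as the tree is traversed, the total work is linear in the number of fields instead of quadratic.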

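Another way to tackle this is with a memo function. The f:totalLength() of a FieldFormat is f:totalLength(preceding::FieldFormat[1]) + @length, so if you compute it in a memo function (xsl:function cache='yes'), each computation only has to go back one step (provided the fields are processed in document order) instead of scanning all the way back to the start of the document.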
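A minimal sketch of the memo-function idea, assuming an XSLT 3.0 processor that actually honours cache="yes". It is an optimization hint, so whether it takes effect depends on the processor and edition; without it, each call recurses all the way back again, which is quadratic and may exhaust the stack. The f prefix and its namespace URI are placeholders. To keep it short, the sketch runs on the source document directly, without the unroll pass; since the function only navigates within the tree the field belongs to, the same function and template should also work on the question's $complete-struct.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:f="urn:example:functions"
                exclude-result-prefixes="#all" version="3.0">

    <xsl:output method="xml" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- Total length of $field plus all FieldFormat elements before it (0 for no field).
         cache="yes" requests memoization (placeholder namespace for f:). -->
    <xsl:function name="f:totalLength" as="xs:integer" cache="yes">
        <xsl:param name="field" as="element(FieldFormat)?"/>
        <xsl:sequence select="if (empty($field)) then 0
            else f:totalLength($field/preceding::FieldFormat[1]) + xs:integer($field/@length)"/>
    </xsl:function>

    <xsl:template match="MessageFormat | StructFormat">
        <xsl:element name="{@name}">
            <xsl:apply-templates/>
        </xsl:element>
    </xsl:template>

    <xsl:template match="FieldFormat">
        <xsl:element name="{@name}">
            <xsl:attribute name="length" select="@length"/>
            <!-- end offset of the previous field, plus one -->
            <xsl:attribute name="start" select="f:totalLength(preceding::FieldFormat[1]) + 1"/>
        </xsl:element>
    </xsl:template>
</xsl:stylesheet>
```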

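Another solution is to precompute the cumulative length for each FieldFormat element in a forward pass over the document, perhaps storing the results in a map from the element's generated ID to the running total. This could be done with something like: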
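<xsl:variable name="start-map" as="map(xs:string, xs:integer)">
  <xsl:map>
    <!-- iterate over the unrolled tree, since that is what the FieldFormat template processes -->
    <xsl:iterate select="$complete-struct//FieldFormat">
      <xsl:param name="total" as="xs:integer" select="0"/>
      <xsl:variable name="new-total" select="$total + xs:integer(@length)"/>
      <!-- key: generated ID of this field; value: total length of all fields before it -->
      <xsl:map-entry key="generate-id()" select="$total"/>
      <xsl:next-iteration>
        <xsl:with-param name="total" select="$new-total"/>
      </xsl:next-iteration>
    </xsl:iterate>
  </xsl:map>
</xsl:variable>
<!-- in the FieldFormat template: <xsl:attribute name="start" select="$start-map(generate-id()) + 1"/> -->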

You could also use fn:fold-left() as an alternative to xsl:iterate (though not in Saxon 9.8 HE, which does not support higher-order functions).
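A sketch of the fold-left() variant, which needs a processor with higher-order function support. The fold's accumulator is itself a map holding the running total and the map being built; the 'total' and 'starts' keys and the start-map variable name are illustrative choices. It works directly on the source document; in the question's two-pass stylesheet you would fold over $complete-struct//FieldFormat instead.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:map="http://www.w3.org/2005/xpath-functions/map"
                exclude-result-prefixes="#all" version="3.0">

    <xsl:output method="xml" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- One forward pass: 'total' is the running length, 'starts' maps
         generate-id() of each field to the total length of the fields before it. -->
    <xsl:variable name="start-map" as="map(xs:string, xs:integer)"
        select="fold-left(//FieldFormat,
                          map { 'total': 0, 'starts': map {} },
                          function($acc, $field) {
                              map {
                                  'total': $acc?total + xs:integer($field/@length),
                                  'starts': map:put($acc?starts, generate-id($field), $acc?total)
                              }
                          })?starts"/>

    <xsl:template match="MessageFormat | StructFormat">
        <xsl:element name="{@name}">
            <xsl:apply-templates/>
        </xsl:element>
    </xsl:template>

    <xsl:template match="FieldFormat">
        <xsl:element name="{@name}">
            <xsl:attribute name="length" select="@length"/>
            <xsl:attribute name="start" select="$start-map(generate-id()) + 1"/>
        </xsl:element>
    </xsl:template>
</xsl:stylesheet>
```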

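I suppose this is also roughly how you would do it in earlier versions of XSLT, except that you would hold the data in an XML structure rather than a map, and you would use a recursive template instead of xsl:iterate.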
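A sketch of that older style, written as XSLT 2.0 (in 1.0 you would additionally need an extension such as exsl:node-set() to navigate the generated structure). The running totals are written out as hypothetical entry elements, looked up again through a key, and a tail-recursive named template takes the place of xsl:iterate. Like the other short sketches it runs on the source document directly, without the unroll pass; how well the recursion scales on tens of thousands of fields depends on the processor's tail-call handling.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="#all" version="2.0">

    <xsl:output method="xml" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- hypothetical lookup structure: <entry id="..." start="..."/> per field -->
    <xsl:key name="entry-by-id" match="entry" use="@id"/>

    <xsl:variable name="starts">
        <xsl:call-template name="running-total">
            <xsl:with-param name="fields" select="//FieldFormat"/>
        </xsl:call-template>
    </xsl:variable>

    <!-- Tail-recursive walk over the fields in document order, carrying the running total. -->
    <xsl:template name="running-total">
        <xsl:param name="fields" as="element(FieldFormat)*"/>
        <xsl:param name="total" as="xs:integer" select="0"/>
        <xsl:if test="exists($fields)">
            <entry id="{generate-id($fields[1])}" start="{$total + 1}"/>
            <xsl:call-template name="running-total">
                <xsl:with-param name="fields" select="subsequence($fields, 2)"/>
                <xsl:with-param name="total" select="$total + xs:integer($fields[1]/@length)"/>
            </xsl:call-template>
        </xsl:if>
    </xsl:template>

    <xsl:template match="MessageFormat | StructFormat">
        <xsl:element name="{@name}">
            <xsl:apply-templates/>
        </xsl:element>
    </xsl:template>

    <xsl:template match="FieldFormat">
        <xsl:element name="{@name}">
            <xsl:attribute name="length" select="@length"/>
            <xsl:attribute name="start" select="key('entry-by-id', generate-id(), $starts)/@start"/>
        </xsl:element>
    </xsl:template>
</xsl:stylesheet>
```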