Merge info from two xml files in one, using xslt

File a.xml:

<?xml version="1.0" encoding="UTF-8"?>
<TABLE NAME="pivot.cs">
   <DATA RECORDS="2">
      <RECORD ID="1">
         <INTERNALID>5510</INTERNALID>
         <SOMED>1</SOMED>
         <PEMED>1</PEMED>
         <CODAL>PLACEHOLD</CODAL>
      </RECORD>
      <RECORD ID="2">
         <INTERNALID>5511</INTERNALID>
         <SOMED>1</SOMED>
         <PEMED>1</PEMED>
         <CODAL>PLACEHOLD</CODAL>
      </RECORD>
      <INTERNALID>5537</INTERNALID>
      <SOMED>1</SOMED>
      <PEMED>1</PEMED>
      <CODAL>PLACEHOLD</CODAL>
   </DATA>
</TABLE>

File b.xml:

<?xml version="1.0" encoding="UTF-8"?>
<TABLE NAME="ALT.CS">
   <DATA RECORDS="20">
      <RECORD ID="53">
         <RECNO>5510</RECNO>
         <TOBEEXTRACTED>TIM</TOBEEXTRACTED>
      </RECORD>
      <RECORD ID="53">
         <RECNO>5510</RECNO>
         <TOBEEXTRACTED>KLM</TOBEEXTRACTED>
      </RECORD>
      <RECORD ID="54">
         <RECNO>5510</RECNO>
         <TOBEEXTRACTED>KAB</TOBEEXTRACTED>
      </RECORD>
      <RECORD ID="55">
         <RECNO>5511</RECNO>
         <TOBEEXTRACTED>BUS WEE</TOBEEXTRACTED>
      </RECORD>
      <RECORD ID="59">
         <RECNO>5512</RECNO>
      </RECORD>
      <RECORD ID="60">
         <RECNO>5513</RECNO>
      </RECORD>
      <RECORD ID="5511">
         <RECNO>5598</RECNO>
         <TOBEEXTRACTED>FBV</TOBEEXTRACTED>
      </RECORD>
   </DATA>
</TABLE>

The output file should be file a.xml, but with the text of the TOBEEXTRACTED element appended in [] when the record matches once or twice:

<?xml version="1.0" encoding="UTF-8"?>
<TABLE NAME="pivot.cs">
   <DATA RECORDS="2">
      <RECORD ID="1">
         <INTERNALID>5510</INTERNALID>
         <SOMED>1</SOMED>
         <PEMED>1</PEMED>
         <CODAL>PLACEHOLD</CODAL>
      </RECORD>
      <RECORD ID="2">
         <INTERNALID>5511</INTERNALID>
         <SOMED>1</SOMED>
         <PEMED>1</PEMED>
         <CODAL>PLACEHOLD [BUS WEE]</CODAL>
      </RECORD>
      <INTERNALID>5537</INTERNALID>
      <SOMED>1</SOMED>
      <PEMED>1</PEMED>
      <CODAL>PLACEHOLD</CODAL>
   </DATA>
</TABLE>
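A minimal XSLT 2.0 sketch of that "one or two matches" rule, run against a.xml: a key indexes the RECORD elements of b.xml that have a TOBEEXTRACTED child by their RECNO, and a CODAL is extended only when its sibling INTERNALID finds one or two of them. It assumes b.xml is well-formed and sits next to the stylesheet; the parameter name doc2 and the key name ref are my own choices. Everything else goes through the identity template, so 5510 (three matches) and the loose 5537 fields are copied unchanged.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">

    <!-- Assumption: b.xml is next to the stylesheet; doc() resolves
         relative URIs against the stylesheet's base URI. -->
    <xsl:param name="doc2" select="doc('b.xml')"/>

    <xsl:output indent="yes"/>

    <!-- Index records carrying a TOBEEXTRACTED child by RECNO
         (looked up in $doc2 below). -->
    <xsl:key name="ref" match="RECORD[TOBEEXTRACTED]" use="RECNO"/>

    <!-- Identity template: copy everything unchanged by default. -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Join only when the INTERNALID matches one or two records in b.xml. -->
    <xsl:template match="CODAL[count(key('ref', ../INTERNALID, $doc2)) = (1, 2)]">
        <xsl:copy>
            <!-- value-of joins the items with a single space:
                 "PLACEHOLD [BUS WEE]" -->
            <xsl:value-of select="., key('ref', ../INTERNALID, $doc2)/TOBEEXTRACTED/concat('[', ., ']')"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
```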

It would also help a lot if we could get a txt file as output, containing the following information about the records from file a.xml:

INTERNALID: 5511 (and all the rest in a normal xml file) was matched.
INTERNALID: 5510 was matched more than two times, so no join took place.
INTERNALID: 5537 did not match
RECNO 5512 did not have a TOBEEXTRACTED element.
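For the text report, a separate transformation of a.xml with method="text" is probably the simplest route. The following is a sketch in XSLT 2.0, assuming b.xml is reachable as doc('b.xml') and that "matched" means matched by a b.xml record that has a TOBEEXTRACTED child (the names doc2 and ref are hypothetical). It writes one line per INTERNALID in a.xml and one line per b.xml record without TOBEEXTRACTED; with the sample data that also includes RECNO 5513, which likewise has no TOBEEXTRACTED element.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">

    <!-- Assumption: b.xml is next to the stylesheet. -->
    <xsl:param name="doc2" select="doc('b.xml')"/>

    <xsl:output method="text"/>

    <!-- Index records carrying a TOBEEXTRACTED child by RECNO. -->
    <xsl:key name="ref" match="RECORD[TOBEEXTRACTED]" use="RECNO"/>

    <xsl:template match="/">
        <!-- One line per INTERNALID in a.xml, in document order. -->
        <xsl:for-each select="//INTERNALID">
            <xsl:variable name="n" select="count(key('ref', ., $doc2))"/>
            <xsl:value-of select="concat('INTERNALID: ', .,
                if ($n = 0) then ' did not match'
                else if ($n le 2) then ' was matched.'
                else ' was matched more than two times, so no join took place.')"/>
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
        <!-- One line per b.xml record that has no TOBEEXTRACTED child. -->
        <xsl:for-each select="$doc2//RECORD[not(TOBEEXTRACTED)]">
            <xsl:value-of select="concat('RECNO ', RECNO, ' did not have a TOBEEXTRACTED element.')"/>
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>

</xsl:stylesheet>
```

If you want the XML and the report from a single run, the same loops could instead be wrapped in an xsl:result-document with method="text" inside the merging stylesheet.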

This kind of merge can usually be done using xsl:for-each-group:

<xsl:for-each-group select="$doc1//RECORD, $doc2//RECORD" group-by="INTERNALID, RECNO">
  ...
</xsl:for-each-group>

Within the body, current-group() holds the records from both files that share the current grouping key. You can separate them, for example with

<xsl:variable name="doc1rec" select="current-group()[(/) is $doc1]"/>
<xsl:variable name="doc2rec" select="current-group()[(/) is $doc2]"/>

If you understand the logic (I don't), the rest of the processing should be straightforward.
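Filling in that body under one possible reading of the requirements (append the TOBEEXTRACTED values only when an INTERNALID finds one or two b.xml records that have them) could look like the following XSLT 2.0 sketch. It is run against a.xml, loads b.xml through doc('b.xml') (the path is an assumption), and reuses the $doc1/$doc2 variables and the (/) is ... test from above. Groups that contain only b.xml records produce no output, and the loose DATA children are copied after the records.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">

    <!-- Run this against a.xml; $doc1 is its document node. -->
    <xsl:variable name="doc1" select="/"/>
    <!-- Assumption: b.xml is reachable at this relative URI. -->
    <xsl:param name="doc2" select="doc('b.xml')"/>

    <xsl:output indent="yes"/>

    <!-- Identity template for TABLE and anything not handled below. -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="DATA">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <!-- Records from both files share a group when INTERNALID = RECNO. -->
            <xsl:for-each-group select="$doc1//RECORD, $doc2//RECORD"
                                group-by="INTERNALID, RECNO">
                <xsl:variable name="doc1rec" select="current-group()[(/) is $doc1]"/>
                <xsl:variable name="doc2rec" select="current-group()[(/) is $doc2][TOBEEXTRACTED]"/>
                <!-- Only a.xml records are written out. -->
                <xsl:for-each select="$doc1rec">
                    <xsl:copy>
                        <xsl:copy-of select="@*, * except CODAL"/>
                        <CODAL>
                            <!-- Append the values only for one or two matches. -->
                            <xsl:value-of select="CODAL, $doc2rec[count($doc2rec) le 2]/TOBEEXTRACTED/concat('[', ., ']')"/>
                        </CODAL>
                    </xsl:copy>
                </xsl:for-each>
            </xsl:for-each-group>
            <!-- Keep DATA children that are not RECORD elements. -->
            <xsl:copy-of select="* except RECORD"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
```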

If you use the keys suggested in the comments, you can reference and match the elements as follows:

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

    <xsl:param name="doc2">
        <TABLE NAME="ALT.CS">
            <DATA RECORDS="20">
                <RECORD ID="53">
                    <RECNO>5510</RECNO>
                    <TOBEEXTRACTED>TIM</TOBEEXTRACTED>
                </RECORD>
                <RECORD ID="53">
                    <RECNO>5510</RECNO>
                    <TOBEEXTRACTED>KLM</TOBEEXTRACTED>
                </RECORD>
                <RECORD ID="54">
                    <RECNO>5510</RECNO>
                    <TOBEEXTRACTED>KAB</TOBEEXTRACTED>
                </RECORD>
                <RECORD ID="55">
                    <RECNO>5511</RECNO>
                    <TOBEEXTRACTED>BUS WEE</TOBEEXTRACTED>
                </RECORD>
                <RECORD ID="59">
                    <RECNO>5512</RECNO>
                </RECORD>
                <RECORD ID="60">
                    <RECNO>5513</RECNO>
                </RECORD>
                <RECORD ID="5511">
                    <RECNO>5598</RECNO>
                    <TOBEEXTRACTED>FBV</TOBEEXTRACTED>
                </RECORD>
            </DATA>
        </TABLE>
    </xsl:param>

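    <!-- Index RECORD elements with a TOBEEXTRACTED child by RECNO (looked up in $doc2 below). -->
    <xsl:key name="ref" match="DATA/RECORD[TOBEEXTRACTED]" use="RECNO"/>

    <!-- Identity template: copy everything else unchanged. -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- For a matched record, keep the CODAL text and append every matching TOBEEXTRACTED. -->
    <xsl:template match="DATA/RECORD[key('ref', INTERNALID, $doc2)]/CODAL">
        <xsl:copy>
            <xsl:apply-templates select="node(), key('ref', ../INTERNALID, $doc2)/TOBEEXTRACTED"/>
        </xsl:copy>
    </xsl:template>

    <!-- Records without any match are dropped from the output. -->
    <xsl:template match="DATA/RECORD[not(key('ref', INTERNALID, $doc2))]"/>

    <!-- Write each extracted value as " [value]". -->
    <xsl:template match="TOBEEXTRACTED">
        <xsl:value-of select="concat(' [', ., ']')"/>
    </xsl:template>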

</xsl:transform>

This gives the output you posted, see http://xsltransform.net/a9Giwy. There I used xsl:param name="doc2" with inline content, but you can of course use <xsl:param name="doc2" select="doc('fileb.xml')"/> instead.

As the question was additionally tagged XSLT 3.0 in an edit, I also tried to implement it using that version's xsl:merge instruction:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math"
    exclude-result-prefixes="xs math"
    version="3.0">

    <xsl:param name="doc2-uri" as="xs:string" select="'test201705120102.xml'"/>

    <xsl:mode on-no-match="shallow-copy"/>

    <xsl:output indent="yes"/>

    <xsl:template match="TABLE/DATA">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <!-- xsl:merge expects each merge source to be sorted by its merge key already. -->
            <xsl:merge>
                <xsl:merge-source name="internal" select="RECORD" >
                    <xsl:merge-key select="INTERNALID"/>
                </xsl:merge-source>
                <xsl:merge-source name="recno" select="doc($doc2-uri)//RECORD">
                    <xsl:merge-key select="RECNO"/>
                </xsl:merge-source>
                <!-- Evaluated once per distinct key value; only keys present in both files are output. -->
                <xsl:merge-action>
                    <xsl:if test="current-merge-group('internal') and current-merge-group('recno')">
                        <xsl:copy>
                            <xsl:copy-of select="@*, * except CODAL"/>
                            <CODAL>
                                <xsl:value-of select="CODAL, current-merge-group('recno')/TOBEEXTRACTED/('[' || . || ']')"/>
                            </CODAL>
                        </xsl:copy>
                    </xsl:if>
                </xsl:merge-action>
            </xsl:merge>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
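The merge above relies on both inputs already being sorted by their keys, and it only writes records whose key occurs in both files. A variant, sketched here under the assumption that doc2-uri points at b.xml (the default value is hypothetical), sets sort-before-merge="yes" on each source, merges only the b.xml records that carry TOBEEXTRACTED, keeps every a.xml record, and appends the values only when there are one or two of them. The recno group is captured in a variable before the inner xsl:for-each so that the loop body does not depend on current-merge-group().

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="3.0">

    <!-- Assumption: location of b.xml. -->
    <xsl:param name="doc2-uri" as="xs:string" select="'b.xml'"/>

    <xsl:mode on-no-match="shallow-copy"/>

    <xsl:output indent="yes"/>

    <xsl:template match="TABLE/DATA">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:merge>
                <!-- sort-before-merge="yes" drops the pre-sorted input requirement. -->
                <xsl:merge-source name="internal" select="RECORD" sort-before-merge="yes">
                    <xsl:merge-key select="INTERNALID"/>
                </xsl:merge-source>
                <xsl:merge-source name="recno" select="doc($doc2-uri)//RECORD[TOBEEXTRACTED]"
                                  sort-before-merge="yes">
                    <xsl:merge-key select="RECNO"/>
                </xsl:merge-source>
                <xsl:merge-action>
                    <xsl:variable name="matches" select="current-merge-group('recno')"/>
                    <!-- Every a.xml record is written, matched or not. -->
                    <xsl:for-each select="current-merge-group('internal')">
                        <xsl:copy>
                            <xsl:copy-of select="@*, * except CODAL"/>
                            <CODAL>
                                <!-- Append only when there are one or two matches. -->
                                <xsl:value-of select="CODAL, $matches[count($matches) le 2]/TOBEEXTRACTED/('[' || . || ']')"/>
                            </CODAL>
                        </xsl:copy>
                    </xsl:for-each>
                </xsl:merge-action>
            </xsl:merge>
            <!-- Keep DATA children that are not RECORD elements. -->
            <xsl:copy-of select="* except RECORD"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
```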