How to resolve XInclude instructions in an XML file from the command line with XSLT 3.0

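Our XML data is stored in separate files so that people can work on small modules individually. The separate files are combined into one master file for further processing. At the moment I do this in the Oxygen XML Editor IDE. To streamline the process, I would like to do it from the command line without the IDE. How can I resolve the XInclude statements from the command line with Saxon HE, if that is possible at all?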

I tried a command like this:

java -jar saxon9he.jar -xi:on -s:main.xml -xsl:assemble.xslt -o:master.xml -t

and got the following error output:

Saxon-HE 9.9.1.4J from Saxonica
Java version 1.8.0_191
Stylesheet compilation time: 361.152836ms
Processing file:/u:/Wolke/xml/resolve-xi/main.xml
Using parser com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser
Building tree for file:/u:/Wolke/xml/resolve-xi/main.xml using class net.sf.saxon.tree.tiny.TinyBuilder
Exception in thread "main" java.lang.StackOverflowError
        at java.security.AccessController.doPrivileged(Native Method)
        at com.sun.org.apache.xerces.internal.utils.SecuritySupport.getContextClassLoader(Unknown Source)
        at com.sun.org.apache.xerces.internal.utils.ObjectFactory.findClassLoader(Unknown Source)
        at com.sun.org.apache.xerces.internal.utils.ObjectFactory.newInstance(Unknown Source)
        at com.sun.org.apache.xerces.internal.xinclude.XIncludeHandler.handleIncludeElement(Unknown Source)
        at com.sun.org.apache.xerces.internal.xinclude.XIncludeHandler.emptyElement(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
[and many more lines]

Saxonica's documentation for the -xi:on option says: "Apply XInclude processing to all input XML documents (including schema and stylesheet modules as well as source documents). This currently only works when documents are parsed using the Xerces parser, which is the default in JDK 1.5 and later." (https://www.saxonica.com/documentation9.5/using-xsl/commandline.html) I am not sure what this means.

Main XML file:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>

<TEI xmlns="http://www.tei-c.org/ns/1.0">
    <teiHeader xml:id="header">
        <fileDesc>
            <titleStmt><title>Trying to make XInclude work</title></titleStmt>
            <publicationStmt><p>Sample data for StackOverflow question</p></publicationStmt>
            <sourceDesc><p>Just made up</p></sourceDesc>
        </fileDesc>
    </teiHeader>
    <text>
        <body>
            <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="file1.xml" xpointer="content-p1"/>
            <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="file2.xml" xpointer="content-p2"/>
            <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="file3.xml" xpointer="content-p3"/>
        </body>
    </text>
</TEI>

The XML component files (file1.xml, file2.xml, file3.xml):

<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml"
    schematypens="http://purl.oclc.org/dsdl/schematron"?>
<?xml-stylesheet type="text/css" href="../css/mm-xml.css"?>

<TEI xmlns="http://www.tei-c.org/ns/1.0">
    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="main.xml" xpointer="header"/>
    <text>
        <body>
            <div type="page" xml:id="content-p1">
                <p> Integer sit amet justo porta nisl porta aliquet in a justo.</p>
            </div>
        </body>
    </text>
</TEI>

<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml"
    schematypens="http://purl.oclc.org/dsdl/schematron"?>
<?xml-stylesheet type="text/css" href="../css/mm-xml.css"?>

<TEI xmlns="http://www.tei-c.org/ns/1.0">
   <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="main.xml" xpointer="header"/>
   <text>
      <body>
         <div type="page" xml:id="content-p2">
            <p>Quisque gravida venenatis varius.</p>
         </div>
      </body>
   </text>
</TEI>

<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml"
    schematypens="http://purl.oclc.org/dsdl/schematron"?>
<?xml-stylesheet type="text/css" href="../css/mm-xml.css"?>

<TEI xmlns="http://www.tei-c.org/ns/1.0">
   <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="main.xml" xpointer="header"/>
   <text>
      <body>
         <div type="page" xml:id="content-p3">
            <p>Nullam nisi lacus, malesuada vel eros porta, dictum finibus mauris.</p>
         </div>
      </body>
   </text>
</TEI>

XSLT:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math"
    exclude-result-prefixes="xs math"
    version="3.0">

    <xsl:template match="node() | @*">
        <xsl:copy>
            <xsl:apply-templates select="node() | @*"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>

The output I need (as created by the Oxygen IDE):

<?xml version="1.0" encoding="UTF-8"?><?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?><?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
    <teiHeader xml:id="header">
        <fileDesc>
            <titleStmt>
                <title>Trying to make XInclude work</title>
            </titleStmt>
            <publicationStmt>
                <p>Sample data for StackOverflow question</p>
            </publicationStmt>
            <sourceDesc>
                <p>Just made up</p>
            </sourceDesc>
        </fileDesc>
    </teiHeader>
    <text>
        <body>
            <div type="page" xml:id="content-p1" xml:base="file1.xml">
                <p> Integer sit amet justo porta nisl porta aliquet in a justo.</p>
            </div>
            <div type="page" xml:id="content-p2" xml:base="file2.xml">
                <p>Quisque gravida venenatis varius.</p>
            </div>
            <div type="page" xml:id="content-p3" xml:base="file3.xml">
                <p>Nullam nisi lacus, malesuada vel eros porta, dictum finibus mauris.</p>
            </div>
        </body>
    </text>
</TEI>

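Based on our exchange in the comments and the advice you got from oXygen support, using oXygen's patched version of Xerces (available at https://mvnrepository.com/artifact/com.oxygenxml/oxygen-patched-xerces/21.1.0.2) together with Saxon 9.9 HE seems to enable XInclude with xpointer references that are based on xml:id attributes: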

java -cp 'oxygen-patched-xerces-21.1.0.2.jar;saxon9he.jar' net.sf.saxon.Transform -t -s:input.xml -xsl:sheet.xsl -xi:on

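This is the command line I used and tested in a Windows 10 PowerShell window. Depending on your platform and command-line shell, you may need different quote characters around the -cp argument and a different separator between the jar files listed in it.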
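
The separator and quoting differences can be sketched for a POSIX shell as follows. This is only an illustration under assumptions: the jar names are the ones from the command above and are assumed to sit in the current directory, main.xml, assemble.xslt and master.xml are the file names from the question, and the SEP and CP variables are names made up for this sketch. The Java class path separator is ";" on Windows and ":" on Linux, macOS and other Unix-like systems.

```shell
#!/bin/sh
# Sketch: build the -cp value with the separator the local JVM expects.

# A native Windows JVM (also when started from Git Bash, MSYS or Cygwin)
# splits the class path on ';'. JVMs on Unix-like systems split it on ':'.
case "$(uname -s)" in
    CYGWIN*|MINGW*|MSYS*) SEP=';' ;;
    *)                    SEP=':' ;;
esac

CP="oxygen-patched-xerces-21.1.0.2.jar${SEP}saxon9he.jar"

# The double quotes keep the shell from treating ';' as a command separator.
# This line only prints the command; remove the leading 'echo' to run it.
echo java -cp "$CP" net.sf.saxon.Transform -t -xi:on -s:main.xml -xsl:assemble.xslt -o:master.xml
```

The sketch prints the command instead of running it so that it can be checked before the jars are in place. The order of the jars is kept as in the tested command, with the patched Xerces jar first.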