XSL: split a file from node to node

I need to split one HTML file into several HTML files, using the h1 nodes as file separators.
Example:

<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
        <meta http-equiv="Content-Style-Type" content="text/css" />
        <title>Test</title>
        <style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
    </head>
    <body>
        <div>
            <p><span>This is my frontpage</span></p>
            <div><img src="images/frontpage.png" width="100" height="50" style="border:none" /></div>
        </div>
        <div>
            <h1> Title 1 </h1><p> some blabla for title_1 </p>
            <h2> Title 1.1 </h2><p> some blabla for title_1_1 </p><img src="images/title_1_1.png" width="50" height="50"/>
            <h1> Title 2 </h1><p> some blabla for title_2 </p>
        </div>
        <div>
            <p> other blabla </p>
            <h1> Title 3 </h1><p> some blabla for title_3 </p>
        </div>
    </body>
</html>

I would like 4 outputs.

frontpage.html:

<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
        <meta http-equiv="Content-Style-Type" content="text/css" />
        <title>Test</title>
        <style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
    </head>
    <body>
        <div>
            <p><span>This is my frontpage</span></p>
            <div><img src="images/frontpage.png" width="100" height="50" style="border:none" /></div>
        </div>
    </body>
</html>

output1.html:

<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
        <meta http-equiv="Content-Style-Type" content="text/css" />
        <title>Test</title>
        <style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
    </head>
    <body>
        <div>
            <h1> Title 1 </h1><p> some blabla for title_1 </p>
            <h2> Title 1.1 </h2><p> some blabla for title_1_1 </p><img src="images/title_1_1.png" width="50" height="50"/>
        </div>
    </body>
</html>

output2.html:

<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
        <meta http-equiv="Content-Style-Type" content="text/css" />
        <title>Test</title>
        <style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
    </head>
    <body>
        <div>
            <h1> Title 2 </h1><p> some blabla for title_2 </p>
        </div>
        <div>
            <p> other blabla </p>
        </div>
    </body>
</html>

output3.html:

<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
        <meta http-equiv="Content-Style-Type" content="text/css" />
        <title>Test</title>
        <style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
    </head>
    <body>
        <div>
            <h1> Title 3 </h1><p> some blabla for title_3 </p>
        </div>
    </body>
</html>

I would appreciate any ideas for solving this problem.

PS: I am using XSLT 2.0 and Saxon 8.

Note that Saxon 8 is several years old, and releases before 8.9 implemented earlier working drafts rather than the final XSLT 2.0 specification.
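If you are not sure which processor release you are actually running, a small stylesheet can report it. The following is a minimal sketch, not part of the solution itself; it relies on the standard `system-property()` function, and `xsl:product-version` is an XSLT 2.0 property that an older processor may return as an empty string.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Plain text report; run it against any well-formed input document. -->
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- XSLT version implemented by the processor -->
    <xsl:text>XSLT version: </xsl:text>
    <xsl:value-of select="system-property('xsl:version')"/>
    <!-- Vendor name of the processor -->
    <xsl:text>&#10;Vendor: </xsl:text>
    <xsl:value-of select="system-property('xsl:vendor')"/>
    <!-- Product release; may be empty on processors that predate this property -->
    <xsl:text>&#10;Product version: </xsl:text>
    <xsl:value-of select="system-property('xsl:product-version')"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```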

Here is an XSLT 2.0 stylesheet, tested with Saxon 9.6:

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs">

<xsl:output method="html" version="4.01" indent="yes"/>

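<!-- Entry point: group each h1 with the leaf nodes that follow it
     and write every group to its own result document. -->
<xsl:template match="/">
  <!-- Nodes before the first h1 form the first group (the front page);
       each h1 then starts a new group. -->
  <xsl:for-each-group select="//h1 | //text()[not(ancestor::h1)] | //*[not(*) and not(ancestor::h1)]" group-starting-with="h1">
    <!-- Nodes to copy into this file, and the ancestors needed to wrap them. -->
    <xsl:variable name="copy" select="current-group()"/>
    <xsl:variable name="ancestors" select="$copy/ancestor::*"/>
    <!-- A group that does not start with an h1 is the front page. -->
    <xsl:variable name="filename" select="if (not(self::h1)) then 'frontpage.html' else concat('output', position() - 1, '.html')"/>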
    <xsl:result-document href="{$filename}">
      <xsl:apply-templates select="/*">
        <xsl:with-param name="copy" select="$copy"/>
        <xsl:with-param name="ancestors" select="$ancestors"/>
      </xsl:apply-templates>
    </xsl:result-document>
  </xsl:for-each-group>
</xsl:template>

<xsl:template match="node()">
  <xsl:param name="copy"/>
  <xsl:param name="ancestors"/>
  <xsl:choose>
    <xsl:when test="$copy[. is current()]">
      <xsl:copy-of select="."/>
    </xsl:when>
    <xsl:when test="$ancestors[. is current()]">
      <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:apply-templates>
          <xsl:with-param name="copy" select="$copy"/>
          <xsl:with-param name="ancestors" select="$ancestors"/>
        </xsl:apply-templates>
      </xsl:copy>
    </xsl:when>
  </xsl:choose>
</xsl:template>

<xsl:template match="head">
  <xsl:copy-of select="."/>
</xsl:template>

</xsl:stylesheet>

When applied to the input file

<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
        <meta http-equiv="Content-Style-Type" content="text/css" />
        <title>Test</title>
        <style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
    </head>
    <body>
        <div>
            <p><span>This is my frontpage</span></p>
            <div><img src="images/frontpage.png" width="100" height="50" style="border:none" /></div>
        </div>
        <div>
            <h1> Title 1 </h1><p> some blabla for title_1 </p>
            <h2> Title 1.1 </h2><p> some blabla for title_1_1 </p><img src="images/title_1_1.png" width="50" height="50"/>
            <h1> Title 2 </h1><p> some blabla for title_2 </p>
        </div>
        <div>
            <p> other blabla </p>
            <h1> Title 3 </h1><p> some blabla for title_3 </p>
        </div>
    </body>
</html>

it creates four output files:

<html>

   <head>
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">


      <meta http-equiv="Content-Style-Type" content="text/css">

      <title>Test</title>
      <style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
      </head>

   <body>

      <div>

         <p><span>This is my frontpage</span></p>

         <div><img src="images/frontpage.png" width="100" height="50" style="border:none"></div>

      </div>

      <div>

      </div>
   </body>
</html>

<html>
   <head>
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">


      <meta http-equiv="Content-Style-Type" content="text/css">

      <title>Test</title>
      <style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
      </head>
   <body>
      <div>
         <h1> Title 1 </h1>
         <p> some blabla for title_1 </p>

         <h2> Title 1.1 </h2>
         <p> some blabla for title_1_1 </p><img src="images/title_1_1.png" width="50" height="50">

      </div>
   </body>
</html>

<html>
   <head>
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">


      <meta http-equiv="Content-Style-Type" content="text/css">

      <title>Test</title>
      <style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
      </head>
   <body>
      <div>
         <h1> Title 2 </h1>
         <p> some blabla for title_2 </p>

      </div>

      <div>

         <p> other blabla </p>

      </div>
   </body>
</html>

<html>
   <head>
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">


      <meta http-equiv="Content-Style-Type" content="text/css">

      <title>Test</title>
      <style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
      </head>
   <body>
      <div>
         <h1> Title 3 </h1>
         <p> some blabla for title_3 </p>

      </div>

   </body>

</html>

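So I think the stylesheet splits the nodes as required and creates the correct file contents. You will still need to experiment with whitespace stripping and indentation to tidy up the results.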
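As a starting point for that experiment, here is a sketch of the top-level declarations I would try first. It assumes the three templates of the stylesheet stay exactly as they are; only the `xsl:strip-space` declaration is new. Removing whitespace-only text nodes from the input should stop them from being grouped with the front page, which appears to be what drags the empty second `div` into the first result file.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs">

  <!-- Assumption: drop whitespace-only text nodes from the input before grouping,
       so they can no longer pull an otherwise empty div into a result file. -->
  <xsl:strip-space elements="*"/>

  <!-- Assumption: set indent="no" here to compare against output
       without any whitespace added by the serializer. -->
  <xsl:output method="html" version="4.01" indent="yes"/>

  <!-- The three templates from the stylesheet go here, unchanged. -->

</xsl:stylesheet>
```

One caveat: `elements="*"` also removes whitespace-only text between inline elements, such as a single space between two `span` elements. If your real documents depend on that, list only the block-level element names in `elements` instead.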