XSL Split file from node to node
I need to split an HTML file into several HTML files, using the h1 nodes as file separators.
Example:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<title>Test</title>
<style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
</head>
<body>
<div>
<p><span>This is my frontpage</span></p>
<div><img src="images/frontpage.png" width="100" height="50" style="border:none" /></div>
</div>
<div>
<h1> Title 1 </h1><p> some blabla for title_1 </p>
<h2> Title 1.1 </h2><p> some blabla for title_1_1 </p><img src="images/title_1_1.png" width="50" height="50"/>
<h1> Title 2 </h1><p> some blabla for title_2 </p>
</div>
<div>
<p> other blabla </p>
<h1> Title 3 </h1><p> some blabla for title_3 </p>
</div>
</body>
</html>
I want 4 outputs.
frontpage.html:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<title>Test</title>
<style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
</head>
<body>
<div>
<p><span>This is my frontpage</span></p>
<div><img src="images/frontpage.png" width="100" height="50" style="border:none" /></div>
</div>
</body>
</html>
output1.html:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<title>Test</title>
<style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
</head>
<body>
<div>
<h1> Title 1 </h1><p> some blabla for title_1 </p>
<h2> Title 1.1 </h2><p> some blabla for title_1_1 </p><img src="images/title_1_1.png" width="50" height="50"/>
</div>
</body>
</html>
output2.html:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<title>Test</title>
<style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
</head>
<body>
<div>
<h1> Title 2 </h1><p> some blabla for title_2 </p>
</div>
<div>
<p> other blabla </p>
</div>
</body>
</html>
output3.html:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<title>Test</title>
<style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
</head>
<body>
<div>
<h1> Title 3 </h1><p> some blabla for title_3 </p>
</div>
</body>
</html>
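I would appreciate any ideas for solving this problem.
PS: I am using XSLT 2.0 and Saxon 8.
Note that Saxon 8 is several years old, and versions before 8.9 did not implement the XSLT 2.0 specification but an earlier draft of it.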
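Because the result can depend on which processor release is actually running, it may help to check that first. The following is a minimal sketch and not part of the tested solution: it prints the standard `system-property` values that an XSLT 2.0 processor is expected to report. The template name `main` is my own choice, and the stylesheet can be run against any XML document.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Matches the document node of any input; "main" is a hypothetical
       name so the template can also be invoked directly by name -->
  <xsl:template match="/" name="main">
    <!-- system-property() returns a string for each of these standard names;
         an XSLT 1.0 processor returns empty strings for the product properties -->
    <xsl:value-of select="
        concat('XSLT version: ', system-property('xsl:version'), '&#10;',
               'Vendor: ', system-property('xsl:vendor'), '&#10;',
               'Product: ', system-property('xsl:product-name'), ' ',
               system-property('xsl:product-version'), '&#10;')"/>
  </xsl:template>

</xsl:stylesheet>
```

If the reported product version is older than 8.9, the stylesheet further down may behave differently from what is shown here.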
Here is an XSLT 2.0 stylesheet tested with Saxon 9.6:
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="html" version="4.01" indent="yes"/>

  <!-- Group the h1 elements, text nodes and leaf elements in document order;
       each h1 starts a new group and each group becomes one result document -->
  <xsl:template match="/">
    <xsl:for-each-group select="//h1 | //text()[not(ancestor::h1)] | //*[not(*) and not(ancestor::h1)]" group-starting-with="h1">
      <!-- Nodes to copy into this file, plus the ancestors needed to rebuild their structure -->
      <xsl:variable name="copy" select="current-group()"/>
      <xsl:variable name="ancestors" select="$copy/ancestor::*"/>
      <!-- The group that does not start with an h1 is the front page -->
      <xsl:variable name="filename" select="if (not(self::h1)) then 'frontpage.html' else concat('output', position() - 1, '.html')"/>
      <xsl:result-document href="{$filename}">
        <xsl:apply-templates select="/*">
          <xsl:with-param name="copy" select="$copy"/>
          <xsl:with-param name="ancestors" select="$ancestors"/>
        </xsl:apply-templates>
      </xsl:result-document>
    </xsl:for-each-group>
  </xsl:template>

  <xsl:template match="node()">
    <xsl:param name="copy"/>
    <xsl:param name="ancestors"/>
    <xsl:choose>
      <!-- A node that belongs to the current group is copied as it is -->
      <xsl:when test="$copy[. is current()]">
        <xsl:copy-of select="."/>
      </xsl:when>
      <!-- An ancestor of a group member is copied shallowly, then its children are processed -->
      <xsl:when test="$ancestors[. is current()]">
        <xsl:copy>
          <xsl:copy-of select="@*"/>
          <xsl:apply-templates>
            <xsl:with-param name="copy" select="$copy"/>
            <xsl:with-param name="ancestors" select="$ancestors"/>
          </xsl:apply-templates>
        </xsl:copy>
      </xsl:when>
      <!-- Any other node is dropped from this result document -->
    </xsl:choose>
  </xsl:template>

  <!-- The head is repeated in full in every result document -->
  <xsl:template match="head">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
When applied to the input file
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<title>Test</title>
<style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
</head>
<body>
<div>
<p><span>This is my frontpage</span></p>
<div><img src="images/frontpage.png" width="100" height="50" style="border:none" /></div>
</div>
<div>
<h1> Title 1 </h1><p> some blabla for title_1 </p>
<h2> Title 1.1 </h2><p> some blabla for title_1_1 </p><img src="images/title_1_1.png" width="50" height="50"/>
<h1> Title 2 </h1><p> some blabla for title_2 </p>
</div>
<div>
<p> other blabla </p>
<h1> Title 3 </h1><p> some blabla for title_3 </p>
</div>
</body>
</html>
it creates four output files
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta http-equiv="Content-Style-Type" content="text/css">
<title>Test</title>
<style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
</head>
<body>
<div>
<p><span>This is my frontpage</span></p>
<div><img src="images/frontpage.png" width="100" height="50" style="border:none"></div>
</div>
<div>
</div>
</body>
</html>
and
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta http-equiv="Content-Style-Type" content="text/css">
<title>Test</title>
<style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
</head>
<body>
<div>
<h1> Title 1 </h1>
<p> some blabla for title_1 </p>
<h2> Title 1.1 </h2>
<p> some blabla for title_1_1 </p><img src="images/title_1_1.png" width="50" height="50">
</div>
</body>
</html>
and
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta http-equiv="Content-Style-Type" content="text/css">
<title>Test</title>
<style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
</head>
<body>
<div>
<h1> Title 2 </h1>
<p> some blabla for title_2 </p>
</div>
<div>
<p> other blabla </p>
</div>
</body>
</html>
and
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta http-equiv="Content-Style-Type" content="text/css">
<title>Test</title>
<style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
</head>
<body>
<div>
<h1> Title 3 </h1>
<p> some blabla for title_3 </p>
</div>
</body>
</html>
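So I think the stylesheet splits up the nodes as you want and creates the right file contents. You will need to experiment with white-space stripping and indentation to get the exact layout you want.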
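As a starting point for that experimentation, here is a sketch that I have not tested. It assumes the stylesheet above has been saved as `split.xsl` (a hypothetical file name) and overrides only the whitespace and serialization settings from an importing stylesheet. The list of elements in `xsl:preserve-space` is my own guess at what should keep its whitespace.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- xsl:import has to come first; "split.xsl" is a hypothetical name
       for the splitting stylesheet shown above -->
  <xsl:import href="split.xsl"/>

  <!-- Remove whitespace-only text nodes from the input before grouping -->
  <xsl:strip-space elements="*"/>

  <!-- Assumption: these elements should keep whitespace-only text children -->
  <xsl:preserve-space elements="pre style script"/>

  <!-- Takes precedence over indent="yes" in the imported stylesheet -->
  <xsl:output method="html" version="4.01" indent="no"/>

</xsl:stylesheet>
```

With the whitespace-only text nodes stripped, I would expect the empty second div in the front page result to disappear, because that div no longer contains any node belonging to the first group. Be aware that stripping everything can also remove a single space between two inline elements, so check the output on real content.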