Simple unflattening of an HTML file through XSL

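I have looked everywhere for unflattening procedures in XSL, but none of them really worked for me, even though I believe my case is quite simple. I have a collection of HTML files, always with the same structure, that I would like to unflatten through an XSL transformation. Basically, it is about wrapping all the elements that follow a <p class='subtitle'> in a <div> element, up to the next <p class='subtitle'>, and, ideally, still applying transformations to the elements inside the div, but that part is optional (see below).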

The source files look like this:

[...some stuff on the page]
<p class='header'>Some text</p>
<p class='subtitle'>Subtitle 1</p>
<p class='content'>First paragraph of part 1, with some <span>Inside</span> and other 
nested elements, on multiple levels</p>
<ul>a list with <li> inside</ul>
<p class='content'>Second paragraph of part 1</p>
<img src='xyz.jpg'/>
<p class='content'>Third paragraph of part 1</p>
<p class='subtitle'>Subtitle 2</p>
<p class='content'>First paragraph of part 2</p>
<p class='content'>Second paragraph of part 2</p>
<p class='subtitle'>Subtitle 3 
[and so on…]

I would like to turn it into:

<div n='section1'>
    <head>Subtitle 1</head>
    <p>First paragraph of part 1, with some <span>Inside</span> and other 
     nested elements, on multiple levels</p>
    <ul>a list with <li> inside</ul>
    <p>Second paragraph of part 1</p>
    <picture source='xyz.jpg'/>
    <p>Third paragraph of part 1</p>
</div>
<div n="section2">
    <head>Subtitle 2</head>
    <p>First paragraph of part 2</p>
    <p>Second paragraph of part 2</p>
</div>
<div n="section3">
    <head>Subtitle 3</head>
    [and so on…]

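I could not find a way to solve this. Even a first step that only unflattens the HTML file (strictly copying the elements into the div without transforming them) would already be great.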

Thanks in advance!

This is a classic positional grouping problem. Here is a starting point:

<xsl:template match="body">
  <body>
    <xsl:for-each-group select="*" group-starting-with="p[@class='subtitle']">
      <xsl:choose>
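        <!-- The context item is the first element of the current group -->
        <xsl:when test="self::p[@class='subtitle']">
          <!-- Count preceding subtitles so content before the first subtitle does not shift the numbering -->
          <div n="section{count(preceding-sibling::p[@class='subtitle']) + 1}">
            <head><xsl:value-of select="."/></head>
            <xsl:apply-templates select="current-group()[position() gt 1]"/>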
          </div>
        </xsl:when>
        <xsl:otherwise>
          <xsl:apply-templates select="current-group()"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </body>
</xsl:template>
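
On its own, the body template relies on other templates to handle the grouped elements. With only the built-in rules, the elements inside each div would be reduced to their text. As a minimal sketch of how the pieces could fit together, here is a complete XSLT 2.0 stylesheet. It assumes the input is well-formed XML in no namespace (for XHTML you would likely need xpath-default-namespace), that the subtitles and their content are direct children of body, and that sections should be numbered by counting preceding subtitles. The p and picture mappings are read off the desired output in the question, not from any known schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Identity template: copy anything not handled below (ul, li, span, ...) -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Group the children of body; each subtitle starts a new group -->
  <xsl:template match="body">
    <body>
      <xsl:for-each-group select="*" group-starting-with="p[@class='subtitle']">
        <xsl:choose>
          <!-- Group headed by a subtitle: wrap it in a div -->
          <xsl:when test="self::p[@class='subtitle']">
            <div n="section{count(preceding-sibling::p[@class='subtitle']) + 1}">
              <head><xsl:value-of select="."/></head>
              <xsl:apply-templates select="current-group()[position() gt 1]"/>
            </div>
          </xsl:when>
          <!-- Leading group before the first subtitle: process without wrapping -->
          <xsl:otherwise>
            <xsl:apply-templates select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </body>
  </xsl:template>

  <!-- p class='content' becomes a plain p (assumed from the desired output) -->
  <xsl:template match="p[@class='content']">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- img becomes picture, src becomes source (assumed from the desired output) -->
  <xsl:template match="img">
    <picture source="{@src}"/>
  </xsl:template>

</xsl:stylesheet>
```

If you only want the first step (strict copying without transformation), replacing the xsl:apply-templates inside the div with xsl:copy-of on the same selection should be enough.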

Note that xsl:for-each-group requires XSLT 2.0 or later. Doing this in XSLT 1.0 is much harder.
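
If you are stuck with XSLT 1.0, one common workaround is to index each element by its nearest preceding subtitle using xsl:key. The following is only a sketch under the same assumptions as before (well-formed input in no namespace, one body element, subtitles and content as direct children of body). The key name by-subtitle is made up for illustration. It copies the grouped elements as-is rather than transforming them.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Index every non-subtitle child of body by the id of its nearest
       preceding subtitle. 'by-subtitle' is a hypothetical name. -->
  <xsl:key name="by-subtitle"
           match="body/*[not(self::p[@class='subtitle'])]"
           use="generate-id(preceding-sibling::p[@class='subtitle'][1])"/>

  <!-- Identity template for everything outside the grouping logic -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="body">
    <body>
      <!-- Elements before the first subtitle have no preceding subtitle,
           so generate-id() of the empty node-set gives '' as their key -->
      <xsl:copy-of select="key('by-subtitle', '')"/>
      <xsl:for-each select="p[@class='subtitle']">
        <div n="section{position()}">
          <head><xsl:value-of select="."/></head>
          <!-- All elements whose nearest preceding subtitle is this one -->
          <xsl:copy-of select="key('by-subtitle', generate-id())"/>
        </div>
      </xsl:for-each>
    </body>
  </xsl:template>

</xsl:stylesheet>
```

To transform the grouped elements instead of copying them, the xsl:copy-of instructions could be swapped for xsl:apply-templates with the same select expressions, plus templates for the elements you want to rewrite.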