How to render richtext XML lists as well-formed HTML using XSLT

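I have XML data that was extracted from a legacy Lotus Notes application and has rich text formatting embedded in it. I am having trouble rendering the rich text lists as well-formed HTML.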

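The problem is that a list has no closing tag to indicate where it ends. Each list does, however, have an opening tag with a unique ID that marks its start, and each list item carries an attribute that matches the list's ID. The rich text also contains a lot of noise (garbage paragraphs), often interspersed between legitimate list items, that needs to be ignored.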

I have XSLT inspired by @Tim-C, but it does not work.

Here is the XML:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="NoBullet6.xslt"?>
<document>
    <item name="Unordered list">
        <richtext>
            <pardef/>
            <par def="20">
                <run>This is the first </run>
                <run>paragraph of the preamble.</run>
            </par>
            <par>
                <run>This is the second paragraph of the </run>
                <run>preamble.</run>
            </par>
            <pardef id="21" list="unordered"/>
            <par def="21">
                <run>This is the </run>
                <run>first bullet.</run>
            </par>
            <par def="20">
                <run/>
                <!-- This is an empty paragraph/garbage data -->
            </par>
            <par>
                <run>This is the second </run>
                <run>bullet.</run>
            </par>
            <par def="20">
                <run>This is the first </run>
                <run>paragraph of the conclusion.</run>
            </par>
            <par>
                <run>This is the second paragraph of the </run>
                <run>conclusion.</run>
            </par>
        </richtext>
    </item>
    <item name="Ordered list">
        <richtext>
            <pardef/>
            <par def="20">
                <run>This is the first </run>
                <run>paragraph of the preamble.</run>
            </par>
            <par>
                <run>This is the second paragraph of the </run>
                <run>preamble.</run>
            </par>
            <pardef id="46" list="ordered"/>
            <par def="46">
                <run>This is the </run>
                <run>first numbered item.</run>
            </par>
            <par def="47">
                <run/>
                <!-- This is an empty paragraph/garbage data -->
            </par>
            <par def="46">
                <run>This is the another </run>
                <run>numbered item.</run>
            </par>
            <par def="20">
                <run>This is the first </run>
                <run>paragraph of the conclusion.</run>
            </par>
            <par>
                <run>This is the second paragraph of the </run>
                <run>conclusion.</run>
            </par>
        </richtext>
    </item>
</document>

Here is the desired output:

<html>
  <body>
     <table border="1">
        <tr>
           <td>Unordered list</td>
           <td>
              <p>This is the first paragraph of the preamble.</p>
              <p>This is the second paragraph of the preamble.</p>
              <ul>
                 <li>This is the first bullet.</li>
                 <li>This is the second bullet.</li>
              </ul>
              <p>This is the first paragraph of the conclusion.</p>
              <p>This is the second paragraph of the conclusion.</p>
           </td>
        </tr>
        <tr>
           <td>Ordered list</td>
           <td>
              <p>This is the first paragraph of the preamble.</p>
              <p>This is the second paragraph of the preamble.</p>
              <ol>
                 <li>This is the first numbered item.</li>
                 <li>This is the another numbered item.</li>
              </ol>
              <p>This is the first paragraph of the conclusion.</p>
              <p>This is the second paragraph of the conclusion.</p>
           </td>
        </tr>
     </table>
  </body>
</html>

Here is the XSLT:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output indent="yes"/>


    <xsl:key name="pars" match="par[not(@def)]" use="generate-id(preceding-sibling::par[@def][1])" />


    <xsl:template match="/*">
        <html>
            <body>
                <table border="1">
                    <xsl:apply-templates />
                </table>
            </body>
        </html>
    </xsl:template>

    <xsl:template match="item">
        <tr>
            <td><xsl:value-of select="@name"/></td>
            <td>
                <xsl:apply-templates select="richtext/par[@def]" />
            </td>
        </tr>
    </xsl:template>

    <xsl:template match="par[@def]">
        <xsl:variable name="listType" select="preceding-sibling::*[1][self::pardef]/@list" />
        <xsl:variable name="group" select="self::* | key('pars', generate-id())" />
        <xsl:choose>
            <xsl:when test="$listType = 'unordered'">    
                <ul>
                    <xsl:apply-templates select="$group" mode="list"/>
                </ul>
            </xsl:when>
            <xsl:when test="$listType = 'ordered'">    
                <ol>
                    <xsl:apply-templates select="$group"  mode="list"/>
                </ol>
            </xsl:when>
            <xsl:otherwise>
                <xsl:apply-templates select="$group" mode="para" />   
            </xsl:otherwise>     
        </xsl:choose>   
    </xsl:template>

    <xsl:template match="par" mode="list">
        <li>
            <xsl:value-of select="run" separator=""/>
        </li>  
    </xsl:template>

    <xsl:template match="par" mode="para">
        <p>
            <xsl:value-of select="run" separator=""/>
        </p>  
    </xsl:template>
</xsl:stylesheet>

Since you are using XSLT 2.0, you can use xsl:for-each-group here, which may simplify things.

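You can group the par elements (ignoring "empty" ones) by their def attribute or, where a par has no def attribute, by the def attribute of the first preceding non-empty sibling that has one.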

 <xsl:for-each-group select="par[run[normalize-space()]]" 
                     group-adjacent="if (@def) then @def else preceding-sibling::par[run[normalize-space()]][@def][1]/@def">

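Instead of the group variable from your stylesheet, you can then use the current-group() function to get the paragraphs in the current group.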
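Before wiring in the HTML output, it can help to check which groups the expression actually forms. The following is a throwaway diagnostic sketch, not part of the solution: the `groups` and `group` element names are made up for illustration, and it assumes the same input structure as the sample document. For that sample it should report three groups per richtext (keys 20, 21 or 46, then 20), each containing two paragraphs.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output method="xml" indent="yes"/>

    <!-- Diagnostic only: emit one <group> element per run of adjacent paragraphs -->
    <xsl:template match="/">
        <groups>
            <xsl:apply-templates select="//richtext"/>
        </groups>
    </xsl:template>

    <xsl:template match="richtext">
        <!-- Empty paragraphs are left out of the population, so they cannot split a group -->
        <xsl:for-each-group select="par[run[normalize-space()]]"
                            group-adjacent="if (@def) then @def else preceding-sibling::par[run[normalize-space()]][@def][1]/@def">
            <!-- current-grouping-key() is the shared def value;
                 current-group() is the sequence of par elements in this group -->
            <group key="{current-grouping-key()}" size="{count(current-group())}"/>
        </xsl:for-each-group>
    </xsl:template>
</xsl:stylesheet>
```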

Try this XSLT:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output indent="yes"/>

    <xsl:template match="/*">
        <html>
            <body>
                <table border="1">
                    <xsl:apply-templates />
                </table>
            </body>
        </html>
    </xsl:template>

    <xsl:template match="item">
        <tr>
            <td><xsl:value-of select="@name"/></td>
            <td>
                <xsl:apply-templates select="richtext" />
            </td>
        </tr>
    </xsl:template>

    <xsl:template match="richtext">
        <xsl:for-each-group select="par[run[normalize-space()]]" group-adjacent="if (@def) then @def else preceding-sibling::par[run[normalize-space()]][@def][1]/@def">
            <xsl:variable name="listType" select="preceding-sibling::*[1][self::pardef]/@list" />
            <xsl:choose>
                <xsl:when test="$listType = 'unordered'">    
                    <ul>
                        <xsl:apply-templates select="current-group()" mode="list"/>
                    </ul>
                </xsl:when>
                <xsl:when test="$listType = 'ordered'">    
                    <ol>
                        <xsl:apply-templates select="current-group()"  mode="list"/>
                    </ol>
                </xsl:when>
                <xsl:otherwise>
                    <xsl:apply-templates select="current-group()" mode="para" />   
                </xsl:otherwise>     
            </xsl:choose>   
        </xsl:for-each-group>
    </xsl:template>

    <xsl:template match="par" mode="list">
        <li>
            <xsl:value-of select="run" separator=""/>
        </li>  
    </xsl:template>

    <xsl:template match="par" mode="para">
        <p>
            <xsl:value-of select="run" separator=""/>
        </p>  
    </xsl:template>
</xsl:stylesheet>
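The stylesheet above relies on two things the sample happens to satisfy: the first list item immediately follows its pardef, and every richtext starts with a par that has a def attribute. If the real data is noisier, a variant along the following lines may be more forgiving. It is a sketch under two assumptions: that a list item's def always equals the id of the pardef declaring the list (as the question describes), and that those ids are unique within a richtext. The only changes are the string() wrapper around the grouping key, which avoids the XTTE1100 type error XSLT 2.0 raises for an empty key, and looking up listType by the grouping key rather than by position.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output indent="yes"/>

    <xsl:template match="/*">
        <html>
            <body>
                <table border="1">
                    <xsl:apply-templates/>
                </table>
            </body>
        </html>
    </xsl:template>

    <xsl:template match="item">
        <tr>
            <td><xsl:value-of select="@name"/></td>
            <td><xsl:apply-templates select="richtext"/></td>
        </tr>
    </xsl:template>

    <xsl:template match="richtext">
        <!-- string() maps a missing key to "" so a leading par without @def does not raise an error -->
        <xsl:for-each-group select="par[run[normalize-space()]]"
                            group-adjacent="string(if (@def) then @def else preceding-sibling::par[run[normalize-space()]][@def][1]/@def)">
            <!-- Assumption: @def on a list item equals @id on the pardef that declares the list -->
            <xsl:variable name="listType" select="../pardef[@id = current-grouping-key()]/@list"/>
            <xsl:choose>
                <xsl:when test="$listType = 'unordered'">
                    <ul><xsl:apply-templates select="current-group()" mode="list"/></ul>
                </xsl:when>
                <xsl:when test="$listType = 'ordered'">
                    <ol><xsl:apply-templates select="current-group()" mode="list"/></ol>
                </xsl:when>
                <xsl:otherwise>
                    <!-- No matching pardef with a list attribute: treat the group as ordinary paragraphs -->
                    <xsl:apply-templates select="current-group()" mode="para"/>
                </xsl:otherwise>
            </xsl:choose>
        </xsl:for-each-group>
    </xsl:template>

    <xsl:template match="par" mode="list">
        <li><xsl:value-of select="run" separator=""/></li>
    </xsl:template>

    <xsl:template match="par" mode="para">
        <p><xsl:value-of select="run" separator=""/></p>
    </xsl:template>
</xsl:stylesheet>
```

For the sample document this variant should produce the same lists and paragraphs as the stylesheet above, since def 20 has no pardef with a matching id and therefore falls through to the paragraph branch.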