Remove all HTML tags except allowed tags using an XSLT function

Question

I am trying to clean up some data that we get from an RSS feed, using XSLT. I want to remove all tags except the p tag.

 Cows are kool.<p>The <i>milk</i> <b>costs</b> .99.</p>

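I have a few questions about how to solve this with XSLT, in either 1.0 or 2.0.

1) I have seen this example: https://maulikdhorajia.blogspot.in/2011/06/removing-html-tags-using-xslt.html

But I need the p tags to stay, and for that I would need a regex. Can we use a string-before-match function and do it in a similar way? I don't think that function exists in XPath.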

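2) I know the replace function cannot be used for this, because it expects a string. If we pass it a node, the node's string value is extracted and passed to the function, which defeats the purpose of removing the tags.

I am a bit confused, because replace is used in this answer.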

3) I am doing this with XSLT inside an nginx server.

Please find below a sample of the input we get in the body tag of the RSS feed.

<p>The Supreme Court issued on Friday a bailable warrant against sitting Calcutta high court justice CS Karnan, an unprecedented order in a bitter confrontation between the judge and the top court.</p><p>A seven-judge bench headed by Chief Justice of India JS Khehar issued the order directing Karnan’s presence on <h2>March 31</h2> because the judge ignored an earlier court order summoning him.<i>Justice Karnan</i> had to appear</p>

Update: I am also looking for an XSLT function for this.

Answer 1

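Assuming you can use XSLT 2.0, you can apply David Carlisle's HTML parser (https://github.com/davidcarlisle/web-xslt/blob/master/htmlparse/htmlparse.xsl) to the contents of the body element and then process the resulting nodes in a mode that strips every element except p elements: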

<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
    xmlns:d="data:,dpc"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="d xhtml">

    <xsl:import href="htmlparse-by-dcarlisle.xsl"/>

    <xsl:template match="@*|node()" mode="#default strip">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()" mode="#current"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="body">
        <xsl:copy>
            <xsl:apply-templates select="d:htmlparse(., '', true())" mode="strip"/>
        </xsl:copy>
    </xsl:template>

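    <!-- Unwrap every element other than p. mode="#current" keeps the
         descendants in strip mode, so nested tags are removed as well. -->
    <xsl:template match="*[not(self::p)]" mode="strip">
        <xsl:apply-templates mode="#current"/>
    </xsl:template>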

</xsl:transform>

For the input

<rss>
    <entry>
        <body><![CDATA[<p>The Supreme Court issued on Friday a bailable warrant against sitting Calcutta high court justice CS Karnan, an unprecedented order in a bitter confrontation between the judge and the top court.</p><p>A seven-judge bench headed by Chief Justice of India JS Khehar issued the order directing Karnan’s presence on <h2>March 31</h2> because the judge ignored an earlier court order summoning him.<i>Justice Karnan</i> had to appear</p>]]></body>
    </entry>
</rss>

this gives

<rss>
    <entry>
        <body><p>The Supreme Court issued on Friday a bailable warrant against sitting Calcutta high court justice CS Karnan, an unprecedented order in a bitter confrontation between the judge and the top court.</p><p>A seven-judge bench headed by Chief Justice of India JS Khehar issued the order directing Karnan’s presence on March 31 because the judge ignored an earlier court order summoning him.Justice Karnan had to appear</p></body>
    </entry>
</rss>
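Since the update to the question asks for a function: in XSLT 2.0 the stripping mode can be wrapped in an xsl:function. The following is only a sketch under some assumptions. The name f:strip-tags, the namespace urn:example:strip and the $allowed parameter are made up for illustration, and the function expects nodes rather than an escaped string, so for CDATA content you would pass it the result of d:htmlparse instead of node().

```xml
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:strip"
    exclude-result-prefixes="xs f">

    <!-- Hypothetical helper: returns the given nodes with every element
         unwrapped except those whose local name is listed in $allowed. -->
    <xsl:function name="f:strip-tags" as="node()*">
        <xsl:param name="nodes" as="node()*"/>
        <xsl:param name="allowed" as="xs:string*"/>
        <xsl:apply-templates select="$nodes" mode="f:strip">
            <xsl:with-param name="allowed" select="$allowed" tunnel="yes"/>
        </xsl:apply-templates>
    </xsl:function>

    <!-- Keep an allowed element (with its attributes); otherwise output
         only its processed children. Text nodes are copied by the
         built-in rule; comments and processing instructions are dropped. -->
    <xsl:template match="*" mode="f:strip">
        <xsl:param name="allowed" as="xs:string*" tunnel="yes"/>
        <xsl:choose>
            <xsl:when test="local-name() = $allowed">
                <xsl:copy>
                    <xsl:copy-of select="@*"/>
                    <xsl:apply-templates mode="#current"/>
                </xsl:copy>
            </xsl:when>
            <xsl:otherwise>
                <xsl:apply-templates mode="#current"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>

    <!-- Identity transformation for everything outside body -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Example call: keep only p inside body. With the htmlparse import
         from above, node() could be replaced by d:htmlparse(., '', true()). -->
    <xsl:template match="body">
        <xsl:copy>
            <xsl:sequence select="f:strip-tags(node(), 'p')"/>
        </xsl:copy>
    </xsl:template>

</xsl:transform>
```

Passing the allowed names as a sequence, for example ('p', 'br'), means the whitelist can be changed at the call site without touching the templates.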

If the markup is not escaped but is included as XML in the input, then you don't need to parse it and can simply apply the mode to the contents:

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

    <xsl:template match="@*|node()" mode="#default strip">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()" mode="#current"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="body">
        <xsl:copy>
            <xsl:apply-templates select="node()" mode="strip"/>
        </xsl:copy>
    </xsl:template>

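    <!-- Unwrap every element other than p. mode="#current" keeps the
         descendants in strip mode, so nested tags are removed as well. -->
    <xsl:template match="*[not(self::p)]" mode="strip">
        <xsl:apply-templates mode="#current"/>
    </xsl:template>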

</xsl:transform>

http://xsltransform.net/gWEamMc/1
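The question mentions running this inside nginx. As far as I know, the nginx XSLT module is built on libxslt, which only implements XSLT 1.0, so mode="#default strip" and mode="#current" would not be available there. A rough XSLT 1.0 sketch of the same idea follows. It assumes the markup arrives as real XML nodes; escaped or CDATA content cannot be parsed this way in plain XSLT 1.0.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <!-- Identity transformation for everything outside body -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Switch to strip mode for the contents of body -->
    <xsl:template match="body">
        <xsl:copy>
            <xsl:apply-templates select="node()" mode="strip"/>
        </xsl:copy>
    </xsl:template>

    <!-- p is the only element that is kept, together with its attributes -->
    <xsl:template match="p" mode="strip">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates mode="strip"/>
        </xsl:copy>
    </xsl:template>

    <!-- Any other element is unwrapped: only its processed children remain.
         match="*" has a lower default priority than match="p". -->
    <xsl:template match="*" mode="strip">
        <xsl:apply-templates mode="strip"/>
    </xsl:template>

</xsl:stylesheet>
```

Text nodes need no template of their own in strip mode, because the built-in rule copies them. To allow more tags you would extend the pattern, for example match="p | br".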
