XSLT: analyze-string and retain child nodes
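I am trying to use regex text matching to find statements that reference other statements. It works when the text sits in a single node, but I am struggling with text that is wrapped in child nodes or split across nodes. In addition, I want to ignore any text inside del tags.
Starting with a document like this: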
<doc>
<sectionA>
<statement id="1">
<title>Titlle A</title>
<statementtext id="a">This is referring to statement 2 about the stuff</statementtext>
<!-- This is referring to statement <ref statementNumber="2">2</ref> about the stuff -->
</statement>
<statement id="2">
<title>Title B</title>
<statementtext id="b">This is <b>my</b> statement <b>1</b> referring to something else</statementtext>
<!-- This is <b>my</b> statement <ref statementNumber="1"><b>1</b></ref> referring to something else -->
</statement>
<statement id="3">
<title>Title 3</title>
<statementtext id="c">This is another statement <b>1</b><i>2</i> about the stuff</statementtext>
<!-- This is another statement <ref statementNumber="12"><b>1</b><i>2</i></ref> about the stuff -->
</statement>
<statement id="4">
<title>Title 4</title>
<statementtext id="d">This is corrected statement <del>1</del><ins>2</ins> about the stuff</statementtext>
<!-- This is corrected statement <ref statementNumber="2"><del>1</del><ins>2</ins></ref> about the stuff -->
</statement>
<statement id="5">
<title>Title 5</title>
<statementtext id="e">This is partially corrected statement 1<del>1</del><ins>5</ins> about the stuff</statementtext>
<!-- This is partially corrected statement <ref statementNumber="15">1<del>1</del><ins>5</ins></ref> about the stuff -->
</statement>
<statement id="6">
<title>Title 6</title>
<statementtext id="f">This is another
<statementtext id="g"> that contains a nested satementtext for statement <b>1</b><i>3</i> about </statementtext>
the stuff</statementtext>
<!-- This is another <statementtext id="g"> that contains a nested satementtext for statement <ref statementNumber="13"><b>1</b><i>3</i></ref> about </statementtext> -->
</statement>
<statement id="7">
<title>Title 7</title>
<statementtext id="h">This is <i>statement</i> <b>1</b> referring to something else</statementtext>
<!-- This is my <i>statement</i> <ref statementNumber="1"><b>1</b></ref> referring to something else -->
</statement>
<statement id="8">
<title>Title 8</title>
<statementtext id="i">This is has no reference to another statement</statementtext>
<!-- his is has no reference to another statement -->
</statement>
</sectionA>
</doc>
With my current template:
<xsl:template match="statementtext">
<statementtext>
<xsl:copy-of select="./@*" />
<xsl:variable name="thisText">
<xsl:value-of select="./descendant-or-self::text()"/>
</xsl:variable>
<xsl:variable name="thisTextFiltered">
<xsl:value-of select="./descendant-or-self::text()[not(descendant-or-self::del and comment())]"/>
</xsl:variable>
<xsl:choose>
<xsl:when test="matches($thisTextFiltered,'(statement\s*)(\d+)','i')">
<xsl:analyze-string select="$thisTextFiltered"
regex="(statement\s*)(\d+)"
flags="ix">
<xsl:matching-substring>
<xsl:value-of select="regex-group(1)"/>
<xsl:variable name="statementNumber">
<xsl:value-of select="regex-group(2)"></xsl:value-of>
</xsl:variable>
<ref>
<xsl:attribute name="statementNumber">
<xsl:value-of select="$statementNumber" />
</xsl:attribute>
<xsl:value-of select="regex-group(2)"/>
</ref>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="."/>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates />
</xsl:otherwise>
</xsl:choose>
</statementtext>
</xsl:template>
<xsl:template match="@*|*|processing-instruction()|comment()">
<xsl:copy>
<xsl:apply-templates select="*|@*|text()|processing-instruction()|comment()" mode="#current"/>
</xsl:copy>
</xsl:template>
this is my output:
<!DOCTYPE HTML>
<doc>
<sectionA>
<statement id="1"><title>Titlle A</title><statementtext id="a">This is referring to statement
<ref statementNumber="2">2</ref> about the stuff
</statementtext>
<!-- This is referring to statement <ref statementNumber="2">2</ref> about the stuff -->
</statement>
<statement id="2"><title>Title B</title><statementtext id="b">This is my statement
<ref statementNumber="1">1</ref> referring to something else
</statementtext>
<!-- This is <b>my</b> statement <b><ref statementNumber="1">1</ref></b> referring to something else -->
</statement>
<statement id="3"><title>Title 3</title><statementtext id="c">This is another statement
<ref statementNumber="12">12</ref> about the stuff
</statementtext>
<!-- This is another statement <ref statementNumber="12"><b>1</b><i>2</i></ref> about the stuff -->
</statement>
<statement id="4"><title>Title 4</title><statementtext id="d">This is corrected statement
<ref statementNumber="12">12</ref> about the stuff
</statementtext>
<!-- This is corrected statement <ref statementNumber="2"><del>1</del><ins>2</ins></ref> about the stuff -->
</statement>
<statement id="5"><title>Title 5</title><statementtext id="e">This is partially corrected statement
<ref statementNumber="115">115</ref> about the stuff
</statementtext>
<!-- This is partially corrected statement <ref statementNumber="15">1<del>1</del><ins>5</ins></ref> about the stuff -->
</statement>
<statement id="6"><title>Title 6</title><statementtext id="f">This is another
that contains a nested satementtext for statement
<ref statementNumber="13">13</ref> about
the stuff
</statementtext>
<!-- This is another <statementtext id="g"> that contains a nested satementtext for statement <ref statementNumber="13"><b>1</b><i>3</i></ref> about </statementtext> -->
</statement>
<statement id="7"><title>Title 7</title><statementtext id="h">This is statement
<ref statementNumber="1">1</ref> referring to something else
</statementtext>
<!-- This is my <i>statement</i> <b><ref statementNumber="1">1</ref></b> referring to something else -->
</statement>
<statement id="8"><title>Title 8</title><statementtext id="i">This is has no reference to another statement</statementtext>
<!-- his is has no reference to another statement -->
</statement>
</sectionA>
</doc>
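Am I close, or should I change my approach completely?
I tried a preprocessing step that wraps the numbers, followed by a mix of group-starting-with and group-adjacent. I think it now covers all the samples you provided, but the grouping code is rather convoluted and deeply nested: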
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:fn="http://www.w3.org/2005/xpath-functions"
exclude-result-prefixes="#all"
expand-text="yes"
version="3.0">
<xsl:mode on-no-match="shallow-copy"/>
<xsl:template match="text()" mode="analyze">
<xsl:apply-templates select="analyze-string(., 'statement\s*([0-9]+)')" mode="wrap"/>
</xsl:template>
<xsl:mode name="analyze" on-no-match="shallow-copy"/>
<xsl:template match="fn:group[@nr = 1]" mode="wrap">
<n>{.}</n>
</xsl:template>
<xsl:template match="statementtext">
<xsl:copy>
<xsl:variable name="wrapped" as="node()*">
<xsl:apply-templates mode="analyze"/>
</xsl:variable>
<xsl:for-each-group select="$wrapped" group-starting-with="node()[matches(., 'statement\s*$', 'i')]">
<xsl:choose>
<xsl:when test="matches(., 'statement\s*$', 'i')">
<xsl:apply-templates select="."/>
<xsl:for-each-group select="tail(current-group())" group-adjacent="matches(., '^[0-9 ]+$')">
<xsl:choose>
<xsl:when test="current-grouping-key() and position() = 1 and matches(., '^\s+$')">
<xsl:apply-templates select="."/>
<ref statementNumber="{string-join(tail(current-group())[not(self::del)])}">
<xsl:apply-templates select="tail(current-group())"/>
</ref>
</xsl:when>
<xsl:when test="current-grouping-key() and position() = 1">
<ref statementNumber="{string-join(current-group()[not(self::del)])}">
<xsl:apply-templates select="current-group()"/>
</ref>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates select="current-group()"/>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each-group>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates select="current-group()"/>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each-group>
</xsl:copy>
</xsl:template>
<xsl:template match="n">
<xsl:apply-templates/>
</xsl:template>
</xsl:stylesheet>
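To make the preprocessing step easier to follow, here is a sketch of what the XPath 3.0 function `analyze-string()` returns for a single text node. The input string "refer to statement 12 about" is a made-up example, and the indentation is added for readability only; the real result contains no extra whitespace.

```xml
<!-- Result of: analyze-string('refer to statement 12 about', 'statement\s*([0-9]+)') -->
<fn:analyze-string-result xmlns:fn="http://www.w3.org/2005/xpath-functions">
  <!-- text before the match -->
  <fn:non-match>refer to </fn:non-match>
  <!-- the whole match; the capturing group is wrapped in fn:group -->
  <fn:match>statement <fn:group nr="1">12</fn:group></fn:match>
  <!-- text after the match -->
  <fn:non-match> about</fn:non-match>
</fn:analyze-string-result>
```

The `wrap` mode template above turns each `fn:group[@nr = 1]` into an `n` element, while the built-in rules copy the remaining text through. Digits that live in their own child elements (such as `b` or `i`) are not touched by this step, which is why the grouping pass is still needed.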
I won't spend the time to produce a working solution, but here are a few things you should fix in your code:
<xsl:variable name="statementNumber">
<xsl:value-of select="regex-group(2)"></xsl:value-of>
</xsl:variable>
<ref>
<xsl:attribute name="statementNumber">
<xsl:value-of select="$statementNumber" />
</xsl:attribute>
<xsl:value-of select="regex-group(2)"/>
</ref>
This can be simplified to
<ref statementNumber="{regex-group(2)}">{regex-group(2)}</ref>
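Note that the `{regex-group(2)}` inside the element content is a text value template, which needs XSLT 3.0 and `expand-text="yes"`; the attribute value template works in XSLT 2.0 as well. The following is a minimal sketch of the simplified form in context, assuming an XSLT 3.0 processor. It only handles references that sit entirely inside one text node, so it does not solve the cases where the number is wrapped in child elements.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="3.0"
                expand-text="yes">

  <!-- Copy every node that no more specific template matches -->
  <xsl:mode on-no-match="shallow-copy"/>

  <!-- Assumption: only direct text children of statementtext are analysed -->
  <xsl:template match="statementtext/text()">
    <xsl:analyze-string select="." regex="(statement\s*)(\d+)" flags="i">
      <xsl:matching-substring>
        <!-- keep the word "statement" and any following whitespace -->
        <xsl:value-of select="regex-group(1)"/>
        <!-- attribute value template plus text value template -->
        <ref statementNumber="{regex-group(2)}">{regex-group(2)}</ref>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

Because the text nodes are processed one at a time rather than flattened into a single string, sibling elements such as `b`, `i`, `del` and `ins` are copied unchanged.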
And this:
<xsl:variable name="thisTextFiltered">
<xsl:value-of select="./descendant-or-self::text()[not(descendant-or-self::del and comment())]"/>
</xsl:variable>
cannot be right, because text nodes have no descendants (Saxon should give you a warning). But I'm not sure what you actually intended.
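If the intent was to collect the text of a `statementtext` while skipping anything inside `del` (as the question states), one possible sketch is to test the ancestors of each text node rather than its descendants. This is an assumption about the intent, not a full solution. Comments are separate nodes and are never selected by `text()`, so they need no extra filter.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

  <xsl:output method="text"/>

  <!-- Demonstration only: print the filtered text of each statementtext -->
  <xsl:template match="/">
    <xsl:for-each select="//statementtext">
      <!-- keep text nodes that are not inside a del element -->
      <xsl:variable name="thisTextFiltered"
                    select="string-join(descendant::text()[not(ancestor::del)], '')"/>
      <xsl:value-of select="concat(@id, ': ', $thisTextFiltered, '&#10;')"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

For the sample document, the filtered string for id="d" contains "statement 2" and the one for id="e" contains "statement 15", which matches the numbers in the expected `ref` attributes. Flattening to a string still loses the child markup, so this only helps with finding the number, not with preserving `b`, `i`, `del` and `ins` in the output.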