Group a number of template matches without a parent element in XSLT 2

I have a very unstructured XML document (produced by Pandoc when converting docx to DocBook format) that I am trying to clean up with XSLT. The XML looks like this:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
                  "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<article>
  <articleinfo>
    <title></title>
  </articleinfo>
<informaltable>
  <tgroup cols="2">
    <colspec align="left" />
    <colspec align="left" />
    <thead>
      <row>
        <entry>
          <emphasis role="strong">How did you assist
          Customer?</emphasis>
        </entry>
        <entry>
          <emphasis>Lorem ipsum dolor sit amet.</emphasis>
        </entry>
      </row>
    </thead>
    <tbody>
      <row>
        <entry>
        </entry>
        <entry>
        </entry>
      </row>
      <row>
        <entry>
        </entry>
        <entry>
        </entry>
      </row>
      <row>
        <entry>
          <emphasis role="strong">What difference did this make for the
          Customer?</emphasis>
        </entry>
        <entry>
          <emphasis>Lorem ipsum dolor sit amet.</emphasis>
        </entry>
      </row>
      <row>
        <entry>
        </entry>
        <entry>
        </entry>
      </row>
      <row>
        <entry>
        </entry>
        <entry>
        </entry>
      </row>
    </tbody>
  </tgroup>
</informaltable>
<para>
  Staff Member: John Smith
</para>
<informaltable>
  <tgroup cols="2">
    <colspec align="left" />
    <colspec align="left" />
    <thead>
      <row>
        <entry>
          <emphasis role="strong">How did you assist
          Customer?</emphasis>
        </entry>
        <entry>
          <emphasis>Lorem ipsum dolor sit amet.</emphasis>
        </entry>
      </row>
    </thead>
    <tbody>
      <row>
        <entry>
        </entry>
        <entry>
        </entry>
      </row>
      <row>
        <entry>
        </entry>
        <entry>
        </entry>
      </row>
      <row>
        <entry>
          <emphasis role="strong">What difference did this make for the
          Customer?</emphasis>
        </entry>
        <entry>
          <emphasis>Lorem ipsum dolor sit amet.</emphasis>
        </entry>
      </row>
      <row>
        <entry>
        </entry>
        <entry>
        </entry>
      </row>
      <row>
        <entry>
        </entry>
        <entry>
        </entry>
      </row>
    </tbody>
  </tgroup>
</informaltable>
<para>
  Staff Member: John Smith
</para>
<informaltable>
  <tgroup cols="2">
    <colspec align="left" />
    <colspec align="left" />
    <thead>
      <row>
        <entry>
          <emphasis role="strong">How did you assist
          Customer?</emphasis>
        </entry>
        <entry>
        </entry>
      </row>
    </thead>
    <tbody>
      <row>
        <entry>
        </entry>
        <entry>
        </entry>
      </row>
      <row>
        <entry>
        </entry>
        <entry>
        </entry>
      </row>
      <row>
        <entry>
          <emphasis role="strong">What difference did this make for the
          Customer?</emphasis>
        </entry>
        <entry>
        </entry>
      </row>
      <row>
        <entry>
        </entry>
        <entry>
        </entry>
      </row>
      <row>
        <entry>
        </entry>
        <entry>
        </entry>
      </row>
    </tbody>
  </tgroup>
</informaltable>
<para>
  Staff Member: _________________________
</para>
</article>

I have managed to reduce it with the following XSLT:

<?xml version="1.0"?>

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="xml" indent="yes"/>

    <xsl:variable name="fileDateStamp">
        <xsl:analyze-string select="base-uri(.)" regex="\s*(\d\d\d\d\-\d\d\-\d\d)\s*">
            <xsl:matching-substring>
                <xsl:value-of select="regex-group(1)"/>
            </xsl:matching-substring>
        </xsl:analyze-string>       
    </xsl:variable>

    <xsl:template match="/">
        <impactStatements>
            <xsl:apply-templates/>
        </impactStatements>
    </xsl:template>

    <xsl:template match="informaltable/tgroup/thead/row/entry">
        <xsl:analyze-string select="normalize-space(.)" regex="\s*How(.*)\s*">
            <xsl:matching-substring>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <Assisted>
                    <xsl:value-of select="(.)"/>    
                </Assisted>
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:template>

    <xsl:template match="informaltable/tgroup/tbody/row/entry">
        <xsl:analyze-string select="normalize-space(.)" regex="\s*What(.*)\s*">
            <xsl:matching-substring>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <Difference>
                    <xsl:value-of select="(.)"/>
                </Difference>
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:template>

    <xsl:template match="para">
        <xsl:analyze-string select="normalize-space(.)" regex="\s*Staff Member: ([A-Z].*)\s*">
            <xsl:matching-substring>
                <Staff><xsl:value-of select="regex-group(1)"/></Staff>
                <DateCreated><xsl:value-of select="$fileDateStamp"/></DateCreated>
            </xsl:matching-substring>
        </xsl:analyze-string>
    </xsl:template>

</xsl:stylesheet> 

What I am missing is a way to add a tag around each 'record'. Since <informaltable> and <para> are both children of <article>, my very basic XSLT knowledge has completely failed me. I get

<?xml version="1.0" encoding="UTF-8"?>
<impactStatements>
   <Assisted>Lorem ipsum dolor sit amet.</Assisted>
   <Difference>Lorem ipsum dolor sit amet.</Difference>
   <Staff>John Smith</Staff>
   <DateCreated>2014-01-01</DateCreated>
   <Assisted>Lorem ipsum dolor sit amet.</Assisted>
   <Difference>Lorem ipsum dolor sit amet.</Difference>
   <Staff>John Smith</Staff>
   <DateCreated>2014-01-01</DateCreated>
</impactStatements>

but I want:

<?xml version="1.0" encoding="UTF-8"?>
<impactStatements>
    <statement>
        <Assisted>Lorem ipsum dolor sit amet.</Assisted>
        <Difference>Lorem ipsum dolor sit amet.</Difference>
        <Staff>John Smith</Staff>
        <DateCreated>2014-01-01</DateCreated>
    </statement>
    <statement>
        <Assisted>Lorem ipsum dolor sit amet.</Assisted>
        <Difference>Lorem ipsum dolor sit amet.</Difference>
        <Staff>John Smith</Staff>
        <DateCreated>2014-01-01</DateCreated>
    </statement>
</impactStatements>

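This is a one-off job, and I know I could change the XML by other means, but I am sure I am only missing some basic knowledge needed to make the XSLT I have do what I want. I have tried various approaches and googled to no avail. Everything I tried broke the format of the XML I was producing.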

An interesting and well-asked question! Change the template that matches / to

<xsl:template match="/article">
    <impactStatements>
    <xsl:for-each select="informaltable">
        <statement>
            <xsl:apply-templates select=". | following-sibling::*[self::para][1]"/>
        </statement>
    </xsl:for-each>
    </impactStatements>
</xsl:template>

The result is:

<?xml version="1.0" encoding="UTF-8"?>
<impactStatements>
   <statement>
      <Assisted>Lorem ipsum dolor sit amet.</Assisted>
      <Difference>Lorem ipsum dolor sit amet.</Difference>
      <Staff>John Smith</Staff>
      <DateCreated/>
   </statement>
   <statement>
      <Assisted>Lorem ipsum dolor sit amet.</Assisted>
      <Difference>Lorem ipsum dolor sit amet.</Difference>
      <Staff>John Smith</Staff>
      <DateCreated/>
   </statement>
   <statement/>
</impactStatements>

I think this is almost right. There is an empty statement at the end because the input contains 3 informaltable elements. How do you want to handle that?
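If the blank form at the end should simply be dropped, one possible approach is to build the fields in a variable first and only emit a statement when that variable contains at least one element. This is a sketch, not a definitive fix: it assumes a record is worth keeping whenever any of the existing templates produces an element for it, and the variable name fields is my own invention. It is meant to replace the /article template above inside the same stylesheet.

```xml
<xsl:template match="/article">
    <impactStatements>
        <xsl:for-each select="informaltable">
            <!-- Assumption: "fields" is a hypothetical name. Without an "as"
                 attribute the variable holds a temporary tree, so stray
                 whitespace text nodes from the built-in templates are harmless. -->
            <xsl:variable name="fields">
                <xsl:apply-templates select=". | following-sibling::*[self::para][1]"/>
            </xsl:variable>
            <!-- Emit a statement only if some element was actually produced -->
            <xsl:if test="$fields/*">
                <statement>
                    <xsl:copy-of select="$fields/*"/>
                </statement>
            </xsl:if>
        </xsl:for-each>
    </impactStatements>
</xsl:template>
```

With the sample input, the third informaltable and its "Staff Member: ___" para produce no elements, so no statement is written for them.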

I would start by adding the template

<xsl:template match="article">
  <xsl:for-each-group select="*" group-starting-with="informaltable">
    <statement>
      <xsl:apply-templates select="current-group()"/>
    </statement>
  </xsl:for-each-group>
</xsl:template>

For your sample (and after adding <xsl:strip-space elements="*"/> for readability) I get the output

<impactStatements>
   <statement/>
   <statement>
      <Assisted>Lorem ipsum dolor sit amet.</Assisted>
      <Difference>Lorem ipsum dolor sit amet.</Difference>
      <Staff>John Smith</Staff>
      <DateCreated/>
   </statement>
   <statement>
      <Assisted>Lorem ipsum dolor sit amet.</Assisted>
      <Difference>Lorem ipsum dolor sit amet.</Difference>
      <Staff>John Smith</Staff>
      <DateCreated/>
   </statement>
   <statement/>
</impactStatements>

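I am not sure whether the empty statement elements are caused by missing sample data or whether you want to exclude certain elements from processing. You need to explain which elements in the input should create a statement in the result.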
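As one guess at such a rule, the grouping could be limited to informaltable and para children, and a statement written only when the group's para names a staff member. This is a sketch under that assumption: the "Staff Member: " test simply mirrors the regex used in the para template of the question, and it may not be the rule you actually want.

```xml
<xsl:template match="article">
  <!-- Selecting only informaltable and para keeps articleinfo from forming
       a leading group of its own -->
  <xsl:for-each-group select="informaltable | para"
                      group-starting-with="informaltable">
    <!-- Assumption: a group is a real record only if its para names a staff member -->
    <xsl:if test="current-group()[self::para][matches(normalize-space(.), '^Staff Member: [A-Z]')]">
      <statement>
        <xsl:apply-templates select="current-group()"/>
      </statement>
    </xsl:if>
  </xsl:for-each-group>
</xsl:template>
```

For the sample input this yields three groups, and the last one is skipped because its para contains only underscores after "Staff Member: ".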