Group a number of template matches without a parent element in XSLT 2
I have a very unstructured XML document (produced by Pandoc converting docx to DocBook), which I'm trying to clean up with XSLT. The XML looks like this:
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<article>
<articleinfo>
<title></title>
</articleinfo>
<informaltable>
<tgroup cols="2">
<colspec align="left" />
<colspec align="left" />
<thead>
<row>
<entry>
<emphasis role="strong">How did you assist
Customer?</emphasis>
</entry>
<entry>
<emphasis>Lorem ipsum dolor sit amet.</emphasis>
</entry>
</row>
</thead>
<tbody>
<row>
<entry>
</entry>
<entry>
</entry>
</row>
<row>
<entry>
</entry>
<entry>
</entry>
</row>
<row>
<entry>
<emphasis role="strong">What difference did this make for the
Customer?</emphasis>
</entry>
<entry>
<emphasis>Lorem ipsum dolor sit amet.</emphasis>
</entry>
</row>
<row>
<entry>
</entry>
<entry>
</entry>
</row>
<row>
<entry>
</entry>
<entry>
</entry>
</row>
</tbody>
</tgroup>
</informaltable>
<para>
Staff Member: John Smith
</para>
<informaltable>
<tgroup cols="2">
<colspec align="left" />
<colspec align="left" />
<thead>
<row>
<entry>
<emphasis role="strong">How did you assist
Customer?</emphasis>
</entry>
<entry>
<emphasis>Lorem ipsum dolor sit amet.</emphasis>
</entry>
</row>
</thead>
<tbody>
<row>
<entry>
</entry>
<entry>
</entry>
</row>
<row>
<entry>
</entry>
<entry>
</entry>
</row>
<row>
<entry>
<emphasis role="strong">What difference did this make for the
Customer?</emphasis>
</entry>
<entry>
<emphasis>Lorem ipsum dolor sit amet.</emphasis>
</entry>
</row>
<row>
<entry>
</entry>
<entry>
</entry>
</row>
<row>
<entry>
</entry>
<entry>
</entry>
</row>
</tbody>
</tgroup>
</informaltable>
<para>
Staff Member: John Smith
</para>
<informaltable>
<tgroup cols="2">
<colspec align="left" />
<colspec align="left" />
<thead>
<row>
<entry>
<emphasis role="strong">How did you assist
Customer?</emphasis>
</entry>
<entry>
</entry>
</row>
</thead>
<tbody>
<row>
<entry>
</entry>
<entry>
</entry>
</row>
<row>
<entry>
</entry>
<entry>
</entry>
</row>
<row>
<entry>
<emphasis role="strong">What difference did this make for the
Customer?</emphasis>
</entry>
<entry>
</entry>
</row>
<row>
<entry>
</entry>
<entry>
</entry>
</row>
<row>
<entry>
</entry>
<entry>
</entry>
</row>
</tbody>
</tgroup>
</informaltable>
<para>
Staff Member: _________________________
</para>
</article>
I've managed to pare it down with the following XSLT:
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<!-- yyyy-mm-dd date taken from the input file's URI, if the URI contains one -->
<xsl:variable name="fileDateStamp">
<xsl:analyze-string select="base-uri(.)" regex="\s*(\d\d\d\d\-\d\d\-\d\d)\s*">
<xsl:matching-substring>
<xsl:value-of select="regex-group(1)"/>
</xsl:matching-substring>
</xsl:analyze-string>
</xsl:variable>
<xsl:template match="/">
<impactStatements>
<xsl:apply-templates/>
</impactStatements>
</xsl:template>
<!-- Header row: drop the "How did you assist..." question cell, keep a non-empty answer -->
<xsl:template match="informaltable/tgroup/thead/row/entry">
<xsl:analyze-string select="normalize-space(.)" regex="\s*How(.*)\s*">
<xsl:matching-substring>
</xsl:matching-substring>
<xsl:non-matching-substring>
<Assisted>
<xsl:value-of select="(.)"/>
</Assisted>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:template>
<!-- Body rows: drop the "What difference..." question cell, keep a non-empty answer -->
<xsl:template match="informaltable/tgroup/tbody/row/entry">
<xsl:analyze-string select="normalize-space(.)" regex="\s*What(.*)\s*">
<xsl:matching-substring>
</xsl:matching-substring>
<xsl:non-matching-substring>
<Difference>
<xsl:value-of select="(.)"/>
</Difference>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:template>
<xsl:template match="para">
<!-- Staff name must start with a capital letter, so the "____" placeholder produces no output -->
<xsl:analyze-string select="normalize-space(.)" regex="\s*Staff Member: ([A-Z].*)\s*">
<xsl:matching-substring>
<Staff><xsl:value-of select="regex-group(1)"/></Staff>
<DateCreated><xsl:value-of select="$fileDateStamp"/></DateCreated>
</xsl:matching-substring>
</xsl:analyze-string>
</xsl:template>
</xsl:stylesheet>
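The fileDateStamp variable above only gets a value when the input file's URI contains a date in yyyy-mm-dd form; for any other file name it is empty, which is why both answers below show <DateCreated/>. If a fallback is wanted, the sketch below is a drop-in replacement for that variable. Falling back to today's date is an assumption on my part, not something the question asks for.

```xml
<!-- Sketch: date from the file URI if present, otherwise today's date (assumed fallback) -->
<xsl:variable name="fileDateStamp"
    select="if (matches(base-uri(.), '\d{4}-\d{2}-\d{2}'))
            then replace(base-uri(.), '^.*?(\d{4}-\d{2}-\d{2}).*$', '$1')
            else format-date(current-date(), '[Y0001]-[M01]-[D01]')"/>
```

Because select is not an attribute value template, the {4} quantifiers need no brace doubling here, unlike in the regex attribute of xsl:analyze-string.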
But what I'm missing is a way to wrap each 'record' in an element. Since <informaltable> and <para> are both children of <article>, my very basic XSLT knowledge has let me down completely. I get:
<?xml version="1.0" encoding="UTF-8"?>
<impactStatements>
<Assisted>Lorem ipsum dolor sit amet.</Assisted>
<Difference>Lorem ipsum dolor sit amet.</Difference>
<Staff>John Smith</Staff>
<DateCreated>2014-01-01</DateCreated>
<Assisted>Lorem ipsum dolor sit amet.</Assisted>
<Difference>Lorem ipsum dolor sit amet.</Difference>
<Staff>John Smith</Staff>
<DateCreated>2014-01-01</DateCreated>
</impactStatements>
But I want:
<?xml version="1.0" encoding="UTF-8"?>
<impactStatements>
<statement>
<Assisted>Lorem ipsum dolor sit amet.</Assisted>
<Difference>Lorem ipsum dolor sit amet.</Difference>
<Staff>John Smith</Staff>
<DateCreated>2014-01-01</DateCreated>
</statement>
<statement>
<Assisted>Lorem ipsum dolor sit amet.</Assisted>
<Difference>Lorem ipsum dolor sit amet.</Difference>
<Staff>John Smith</Staff>
<DateCreated>2014-01-01</DateCreated>
</statement>
</impactStatements>
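This is a one-off job, and I know I could change the XML some other way, but I'm sure I'm just missing some basic knowledge needed to make the XSLT I have do what I want. I've tried various approaches and searched Google, to no avail; everything I've tried has broken the formatting of the XML I generate.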
An interesting and well-asked question! Change the template that matches / to
<xsl:template match="/article">
<impactStatements>
<xsl:for-each select="informaltable">
<statement>
<!-- The table itself plus the first para after it (the "Staff Member" line) -->
<xsl:apply-templates select=". | following-sibling::*[self::para][1]"/>
</statement>
</xsl:for-each>
</impactStatements>
</xsl:template>
The result is:
<?xml version="1.0" encoding="UTF-8"?>
<impactStatements>
<statement>
<Assisted>Lorem ipsum dolor sit amet.</Assisted>
<Difference>Lorem ipsum dolor sit amet.</Difference>
<Staff>John Smith</Staff>
<DateCreated/>
</statement>
<statement>
<Assisted>Lorem ipsum dolor sit amet.</Assisted>
<Difference>Lorem ipsum dolor sit amet.</Difference>
<Staff>John Smith</Staff>
<DateCreated/>
</statement>
<statement/>
</impactStatements>
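I think this is almost right. There is an empty statement at the end because the input contains three informaltable elements, and the third one has no answers and no staff name. How do you want to handle that?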
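One way to handle it, assuming a record should be dropped when the answer cell of its header row (the second entry) is blank, is to filter the tables before iterating. This is a sketch under that assumption; following-sibling::para[1] is simply a shorter spelling of following-sibling::*[self::para][1].

```xml
<xsl:template match="/article">
  <impactStatements>
    <!-- Keep only tables whose header answer cell contains text (assumed test for "empty record") -->
    <xsl:for-each select="informaltable[tgroup/thead/row/entry[2][normalize-space()]]">
      <statement>
        <!-- The table plus the first para after it (the "Staff Member" line) -->
        <xsl:apply-templates select=". | following-sibling::para[1]"/>
      </statement>
    </xsl:for-each>
  </impactStatements>
</xsl:template>
```

With the sample input, the third table has a blank header answer cell, so only two statement elements are produced.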
I would start by adding the template
<xsl:template match="article">
<!-- Every informaltable starts a new group; the para after it falls into the same group -->
<xsl:for-each-group select="*" group-starting-with="informaltable">
<statement>
<xsl:apply-templates select="current-group()"/>
</statement>
</xsl:for-each-group>
</xsl:template>
For your sample (after adding <xsl:strip-space elements="*"/> for readability) I get the output
<impactStatements>
<statement/>
<statement>
<Assisted>Lorem ipsum dolor sit amet.</Assisted>
<Difference>Lorem ipsum dolor sit amet.</Difference>
<Staff>John Smith</Staff>
<DateCreated/>
</statement>
<statement>
<Assisted>Lorem ipsum dolor sit amet.</Assisted>
<Difference>Lorem ipsum dolor sit amet.</Difference>
<Staff>John Smith</Staff>
<DateCreated/>
</statement>
<statement/>
</impactStatements>
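I'm not sure whether the empty statement elements are an artifact of the sample data or whether you want certain elements excluded from processing; you need to explain which elements in the input should create a statement in the result.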
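In the grouping version, the leading <statement/> comes from articleinfo: it precedes the first informaltable, so it forms a group of its own. The trailing one comes from the third table, which has no answers. The sketch below avoids both by grouping only the tables and paragraphs and skipping any group whose table has a blank answer cell in its header row; that skip rule is an assumption about what counts as an empty record. It relies on the question's template for / to write the <impactStatements> wrapper.

```xml
<xsl:template match="article">
  <!-- Group only tables and paragraphs, so articleinfo no longer forms a group of its own -->
  <xsl:for-each-group select="informaltable | para" group-starting-with="informaltable">
    <!-- Assumed rule: skip a group whose first member has no non-blank header answer cell -->
    <xsl:if test="current-group()[1]/tgroup/thead/row/entry[2][normalize-space()]">
      <statement>
        <xsl:apply-templates select="current-group()"/>
      </statement>
    </xsl:if>
  </xsl:for-each-group>
</xsl:template>
```

A group that happens to start with a para (no table in front of it) is also skipped, because a para has no tgroup child.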