XSL creating 'chapters' or 'groups' from similar tagged entries
I have a large XML corpus document that is structured roughly like this:
<corpus>
  <document n="001">
    <front>
      <title>foo title</title>
      <group n="foo_group_A"/>
    </front>
    <body>
      <seg n="1">some text with markups</seg>
      <seg n="2">some text with markups</seg>
      <seg n="3">some text with markups</seg>
    </body>
  </document>
  <document n="002">
    <front>
      <title>foo title</title>
      <group n="foo_group_A"/>
    </front>
    <body>
      <seg n="1">some text with markups</seg>
      <seg n="2">some text with markups</seg>
    </body>
  </document>
  <document n="003">
    <front>
      <title>foo title</title>
      <group n="foo_group_A"/>
    </front>
    <body>
      <seg n="1">some text with markups</seg>
      <seg n="2">some text with markups</seg>
      <seg n="3">some text with markups</seg>
    </body>
  </document>
  <document n="004">
    <front>
      <title>foo title</title>
      <group n="foo_group_B"/>
    </front>
    <body>
      <seg n="1">some text with markups</seg>
    </body>
  </document>
  <document n="005">
    <front>
      <title>foo title</title>
      <group n="foo_group_B"/>
    </front>
    <body>
      <seg n="1">some text with markups</seg>
      <seg n="2">some text with markups</seg>
    </body>
  </document>
  [...]
</corpus>
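I am using XSLT 3.0 to preprocess this XML file into another XML format before the final output to PDF. As part of the transformation, I want to collect ('wrap') the <document> elements in new <chapter> elements that reflect the value of front/group/@n. The new corpus should look like this, with the group/@n value determining which chapter each document goes under: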
<corpus>
  <chapter n="foo_group_A">
    <document n="001">
      <front>
        <title>foo title</title>
      </front>
      <body>
        <seg n="1">some text with markups</seg>
        <seg n="2">some text with markups</seg>
        <seg n="3">some text with markups</seg>
      </body>
    </document>
    <document n="002">
      <front>
        <title>foo title</title>
      </front>
      <body>
        <seg n="1">some text with markups</seg>
        <seg n="2">some text with markups</seg>
      </body>
    </document>
    <document n="003">
      <front>
        <title>foo title</title>
      </front>
      <body>
        <seg n="1">some text with markups</seg>
        <seg n="2">some text with markups</seg>
        <seg n="3">some text with markups</seg>
      </body>
    </document>
  </chapter>
  <chapter n="foo_group_B">
    <document n="004">
      <front>
        <title>foo title</title>
      </front>
      <body>
        <seg n="1">some text with markups</seg>
      </body>
    </document>
    <document n="005">
      <front>
        <title>foo title</title>
      </front>
      <body>
        <seg n="1">some text with markups</seg>
        <seg n="2">some text with markups</seg>
      </body>
    </document>
  </chapter>
  [...]
</corpus>
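The file is already sorted by foo_group_A, foo_group_B, etc., so no additional sorting is needed. The transformation only has to create a new <chapter> element to contain the related documents. I tried this with xsl:for-each, but I think I'm missing some kind of 'summary' or 'collection' of the groups to iterate over.
Many thanks.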
If you use XSLT 3 and want to group items, then you certainly wouldn't use xsl:for-each but rather xsl:for-each-group, for example:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="3.0">

  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:output method="xml" indent="yes"/>

  <xsl:strip-space elements="*"/>

  <xsl:template match="corpus">
    <xsl:copy>
      <xsl:for-each-group select="document" group-by="front/group/@n">
        <chapter n="{current-grouping-key()}">
          <xsl:apply-templates select="current-group()"/>
        </chapter>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="front/group"/>

</xsl:stylesheet>
http://xsltfiddle.liberty-development.net/nbUY4ki
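If the document elements are already sorted by the grouping key front/group/@n, then using xsl:for-each-group select="document" group-adjacent="front/group/@n" instead of the group-by above should also suffice. That in turn makes it easier to use streaming for large documents: add streamable="yes" to the xsl:mode declaration and do the grouping with xsl:for-each-group select="copy-of(document)" group-adjacent="front/group/@n".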
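Putting those changes together, the streamable variant might look like the sketch below. It assumes an XSLT 3.0 processor that actually supports streaming (Saxon-EE, for example); apart from streamable="yes", copy-of(document) and group-adjacent, it is the same as the group-by stylesheet above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="3.0">

  <!-- streamable mode; anything not matched below is copied unchanged -->
  <xsl:mode streamable="yes" on-no-match="shallow-copy"/>

  <xsl:output method="xml" indent="yes"/>

  <xsl:strip-space elements="*"/>

  <xsl:template match="corpus">
    <xsl:copy>
      <!-- copy-of() hands each streamed document over as an in-memory copy,
           so the key front/group/@n can be read from it; group-adjacent
           starts a new chapter whenever that key changes -->
      <xsl:for-each-group select="copy-of(document)" group-adjacent="front/group/@n">
        <chapter n="{current-grouping-key()}">
          <xsl:apply-templates select="current-group()"/>
        </chapter>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

  <!-- drop the group marker; its value now lives on chapter/@n -->
  <xsl:template match="front/group"/>

</xsl:stylesheet>
```

Note that group-adjacent relies on the input order: if foo_group_A documents reappeared after a run of foo_group_B documents, they would end up in a second foo_group_A chapter, whereas group-by would merge them into one.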