XSLT merge nodes
I have a messy XHTML file that I want to convert to XML. It is a lexicon with a lot of 'p' tags, and I want to sort them out. This is the XHTML file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="2018-06-29T10:12:48Z" name="dcterms.created" />
<meta content="2018-06-29T10:12:48Z" name="dcterms.modified" />
</head>
<body>
<p><b>Aesthetik</b></p>
<p>text about aesthetics.</p>
<p><b>Expl: </b>explanation about aesthetics</p>
<p><b>BegrG: </b>origin of the term</p>
<p>more origin of the term</p>
<p><b>Allegorese</b></p>
<p>text about Allegorese</p>
<p><b>Expl: </b>explanation about Allegorese</p>
<p><b>BegrG: </b>origin of Allegorese</p>
</body>
</html>
The XSLT file looks like this (there are a few more lines for other tags, which are not included here):
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xpath-default-namespace="http://www.w3.org/1999/xhtml">
<xsl:template match="head"/>
<xsl:template match="text()">
<xsl:value-of select="normalize-space()"/>
</xsl:template>
<xsl:template match="body">
<lexica>
<xsl:apply-templates/> <!-- create root node lexica -->
</lexica>
</xsl:template>
<xsl:template match="p">
<p>
<xsl:apply-templates/> <!-- copy same tags for better visuality -->
</p>
</xsl:template>
<xsl:template match="p[b[contains(., 'BegrG')]]">
<BegrG>
<xsl:apply-templates/> <!-- create specific nodes with origin explanation of the word -->
</BegrG>
</xsl:template>
<xsl:template match="p[b[contains(., 'Expl')]]">
<Expl>
<xsl:apply-templates/> <!-- node with explanation of the word -->
</Expl>
</xsl:template>
<!-- any other b nodes which are left are lexical items -->
<xsl:template match="p[b[not(contains(., 'Expl') or contains(., 'BegrG'))]]">
<Artikel>
<xsl:apply-templates/>
</Artikel>
</xsl:template>
<!-- templates for other tags omitted -->
</xsl:stylesheet>
In the end, my XML file looks like this:
<lexica>
<Artikel>Aesthetik</Artikel>
<p>text about aesthetics.</p>
<Expl>Expl:explanation about aesthetics</Expl>
<BegrG>BegrG:origin of the term</BegrG>
<p>more origin of the term</p>
<Artikel>Allegorese</Artikel>
<p>text about Allegorese</p>
<Expl>Expl:explanation about Allegorese</Expl>
<BegrG>BegrG:origin of Allegorese</BegrG>
</lexica>
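This looks better, but it still does not work for me because it is not well structured. For example, the terms are not grouped, and some 'p' tags should be merged into their preceding sibling. It should look like this: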
<lexica>
<item>
<Artikel>Aesthetik</Artikel>
<short>text about aesthetics.</short>
<Expl>Expl:explanation about aesthetics</Expl>
<BegrG>BegrG:origin of the term. more origin of the term.</BegrG>
</item>
<item>
<Artikel>Allegorese</Artikel>
<short>text about Allegorese</short>
<Expl>Expl:explanation about Allegorese</Expl>
<BegrG>BegrG:origin of Allegorese</BegrG>
</item>
</lexica>
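Am I approaching this the wrong way? If not, how can I group the 'p' tags with the preceding sibling that has a b child? And how do I separate the lexicon entries from each other, so that the transformation knows where each closing tag should go?
(Sorry for my bad English.)
Thanks in advance!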
XSLT 2 and 3 have xsl:for-each-group with group-starting-with (https://www.w3.org/TR/xslt20/#xsl-for-each-group), so you can use it to create the item elements:
<xsl:template match="body">
  <lexica>
    <xsl:for-each-group select="*" group-starting-with="p[b[not(matches(., '^(Expl|BegrG):'))]]">
      <item>
        <xsl:apply-templates select="current-group()"/>
      </item>
    </xsl:for-each-group>
  </lexica>
</xsl:template>
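The desired output wraps the plain paragraph after the headword in a short element, while the p template from the question emits p. A minimal sketch of one extra template that could be added to the question's stylesheet, assuming that a p without a b child directly following a headword paragraph is always the short text (that rule is an assumption, not something the question states):

```xml
<!-- Assumption: a plain p directly after the headword paragraph is the short text.
     Relies on xpath-default-namespace being set to the XHTML namespace,
     as in the question's stylesheet. -->
<xsl:template match="p[not(b)][preceding-sibling::*[1][self::p[b[not(matches(., '^(Expl|BegrG):'))]]]]">
  <short>
    <xsl:apply-templates/>
  </short>
</xsl:template>
```

Because the pattern carries predicates, it gets a higher default priority than the plain match="p" template, so it wins for those paragraphs without an explicit priority attribute.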
I think the example is at https://xsltfiddle.liberty-development.net/bFDb2CG.
So far I am not sure what determines that some p elements get merged into the BegrG result; perhaps a nested grouping such as
<xsl:template match="body">
  <lexica>
    <xsl:for-each-group select="*" group-starting-with="p[b[not(matches(., '^(Expl|BegrG):'))]]">
      <item>
        <xsl:for-each-group select="current-group()" group-starting-with="p[b[starts-with(., 'BegrG:')]]">
          <xsl:choose>
            <xsl:when test="self::p[b[starts-with(., 'BegrG:')]]">
              <BegrG>
                <xsl:apply-templates select="current-group()/node()"/>
              </BegrG>
            </xsl:when>
            <xsl:otherwise>
              <xsl:apply-templates select="current-group()"/>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:for-each-group>
      </item>
    </xsl:for-each-group>
  </lexica>
</xsl:template>
Implementation: https://xsltfiddle.liberty-development.net/bFDb2CG/1
Regarding the question raised in the comments, you can add another match to group-starting-with:
<xsl:template match="body">
  <lexica>
    <xsl:for-each-group select="*" group-starting-with="p[b[not(matches(., '^(Expl|BegrG):'))]]">
      <item>
        <xsl:for-each-group select="current-group()" group-starting-with="p[b[starts-with(., 'Expl:')]] | p[b[starts-with(., 'BegrG:')]]">
          <xsl:choose>
            <xsl:when test="self::p[b[starts-with(., 'Expl:')]]">
              <Expl>
                <xsl:apply-templates select="current-group()/node()"/>
              </Expl>
            </xsl:when>
            <xsl:when test="self::p[b[starts-with(., 'BegrG:')]]">
              <BegrG>
                <xsl:apply-templates select="current-group()/node()"/>
              </BegrG>
            </xsl:when>
            <xsl:otherwise>
              <xsl:apply-templates select="current-group()"/>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:for-each-group>
      </item>
    </xsl:for-each-group>
  </lexica>
</xsl:template>
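For reference, here is a self-contained sketch that puts the nested grouping into a complete XSLT 2.0 stylesheet. It is a variation rather than the exact code above: it writes the text with xsl:value-of and a ". " separator so that merged paragraphs do not run together, and it builds Artikel and short directly inside the first group instead of relying on separate templates. The separator and the rule "a paragraph with b in the first group is the headword, any other is short" are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.w3.org/1999/xhtml">

  <xsl:output method="xml" indent="yes"/>

  <!-- Drop the head and any stray text outside the grouped paragraphs -->
  <xsl:template match="head"/>
  <xsl:template match="text()"/>

  <xsl:template match="body">
    <lexica>
      <!-- Outer grouping: every headword paragraph starts a new item -->
      <xsl:for-each-group select="p"
          group-starting-with="p[b[not(matches(., '^(Expl|BegrG):'))]]">
        <item>
          <!-- Inner grouping: a labelled paragraph starts a new field and
               absorbs the unlabelled paragraphs that follow it -->
          <xsl:for-each-group select="current-group()"
              group-starting-with="p[b[matches(., '^(Expl|BegrG):')]]">
            <xsl:choose>
              <xsl:when test="b[starts-with(., 'Expl:')]">
                <Expl>
                  <!-- Assumed separator between merged paragraphs -->
                  <xsl:value-of select="current-group()/normalize-space(.)" separator=". "/>
                </Expl>
              </xsl:when>
              <xsl:when test="b[starts-with(., 'BegrG:')]">
                <BegrG>
                  <xsl:value-of select="current-group()/normalize-space(.)" separator=". "/>
                </BegrG>
              </xsl:when>
              <xsl:otherwise>
                <!-- First group of an item: headword paragraph (has b)
                     followed by plain paragraphs (assumed to be short text) -->
                <xsl:for-each select="current-group()">
                  <xsl:element name="{if (b) then 'Artikel' else 'short'}">
                    <xsl:value-of select="normalize-space(.)"/>
                  </xsl:element>
                </xsl:for-each>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:for-each-group>
        </item>
      </xsl:for-each-group>
    </lexica>
  </xsl:template>

</xsl:stylesheet>
```

Since normalize-space is applied to each whole paragraph here, the label keeps the space after its colon (for example "BegrG: origin of the term"), which differs slightly from the per-text-node normalization in the question's stylesheet.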