Apply xslt tokenize function to results of apply-templates

I have a chunk of XML formatted as follows:

<line n="2">
      <orig>of right hool herte <ex>&amp;</ex> in our<ex>e</ex><note place="bottom" anchored="true" xml:id="explanatory">Although “r” on the painted panels of the chapel is consistently written with an otiose mark when it concludes a word, the mark here is rendered more heavily and with a dot indicating suspension above the r. This rendering as “our<ex>e</ex>” is a linguistic outlier for the area based on the electronic <emph rend="italic">Linguistic Atlas of Late Medieval English</emph>’s linguistic profiles for “oure,” “our,” and “our<ex>e</ex>.” See eLALME's <ref target="http://archive.ling.ed.ac.uk/ihd/elalme_scripts/mapping/user-defined_maps.html">User Defined Maps</ref> for more information. Unfortunately the current online version (as of 12 July 2014) does not allow direct linking between static dotmaps and linguistic profiles.</note> best entent</orig>
</line>

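I need to be able to reduce it to just the plaintext: "of right hool herte & in oure best entent," and then tokenize on the space to get a list of either comma or tag-separated values. I accomplished the plaintext bit with the following xslt: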

<xsl:template match="tei:line" >
        <xsl:apply-templates />   
</xsl:template>

<xsl:template match="orig">
    <xsl:apply-templates/>
</xsl:template>

<xsl:template match="ex">
    <xsl:apply-templates/>
</xsl:template>

<xsl:template match="note"/>

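However, I cannot get the tokenize function to work with apply-templates. If I try using value-of instead, the templates for the nested tags no longer apply. Is there a way to run apply-templates on the xml and then tokenize each element's text within a single xslt? Thanks!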

You don't need tokenize() to get this output:

  of right hool herte & in oure best entent

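An identity transform plus an empty template that suppresses note will do it for you. Because the output method is text, only the text nodes are written out: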

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="note"/>

</xsl:stylesheet>

If you want it comma-separated, you can capture the text output above in a variable and then apply tokenize, as you mentioned:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:variable name="result">
    <xsl:apply-templates/>
  </xsl:variable>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="note"/>

  <xsl:template match="/">
    <xsl:value-of select="string-join(tokenize(normalize-space($result), ' '), ',')"/>
  </xsl:template>

</xsl:stylesheet>

Given your input XML, the XSLT above produces the following text:

of,right,hool,herte,&,in,oure,best,entent
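
One caveat worth sketching: the question's own stylesheet matches tei:line, which suggests the real document may be in the TEI namespace. If it is, an unprefixed pattern such as match="note" will not match anything, and the note text will leak into the output. The variant below is a minimal sketch under that assumption; the namespace URI http://www.tei-c.org/ns/1.0 is assumed (it is not shown in the question), and the identity template is left out because the built-in rules already pass text nodes through.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Assumption: the input elements are in the TEI namespace.
     xpath-default-namespace makes unprefixed names in patterns
     and XPath expressions refer to that namespace. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0">
  <xsl:output method="text"/>

  <!-- Collect all text except notes into a temporary tree -->
  <xsl:variable name="result">
    <xsl:apply-templates/>
  </xsl:variable>

  <!-- Now matches tei:note, so the note text is dropped -->
  <xsl:template match="note"/>

  <!-- Collapse whitespace, split on single spaces, join with commas -->
  <xsl:template match="/">
    <xsl:value-of select="string-join(tokenize(normalize-space($result), ' '), ',')"/>
  </xsl:template>

</xsl:stylesheet>
```

If the input is not namespaced, as in the snippet posted in the question, the stylesheet above without the xpath-default-namespace attribute is the one to use.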

I need to be able to reduce it to just the plaintext: "of right hool herte & in oure best entent," and then tokenize on the space to get a list of either comma or tag-separated values.

Not sure what you mean by "tag-separated values". Given the following test input:

XML

<root>
    <line n="2">
          <orig>of right hool herte <ex>&amp;</ex> in our<ex>e</ex><note place="bottom" anchored="true" xml:id="explanatory">Although “r” on the painted panels of the chapel is consistently written with an otiose mark when it concludes a word, the mark here is rendered more heavily and with a dot indicating suspension above the r. This rendering as “our<ex>e</ex>” is a linguistic outlier for the area based on the electronic <emph rend="italic">Linguistic Atlas of Late Medieval English</emph>’s linguistic profiles for “oure,” “our,” and “our<ex>e</ex>.” See eLALME's <ref target="http://archive.ling.ed.ac.uk/ihd/elalme_scripts/mapping/user-defined_maps.html">User Defined Maps</ref> for more information. Unfortunately the current online version (as of 12 July 2014) does not allow direct linking between static dotmaps and linguistic profiles.</note> best entent</orig>
    </line>
</root>

the following stylesheet:

XSLT 2.0

<xsl:stylesheet version="2.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:template match="/root">
    <xsl:copy>
        <xsl:apply-templates select="line"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="line">
    <xsl:variable name="line-text">
        <xsl:apply-templates/>
    </xsl:variable>
    <xsl:copy>
        <xsl:copy-of select="@n"/>
        <xsl:value-of select="tokenize(normalize-space($line-text), ' ')" separator=", "/>
    </xsl:copy>
</xsl:template>

<xsl:template match="note"/>

</xsl:stylesheet>

will return:

<?xml version="1.0" encoding="UTF-8"?>
<root>
   <line n="2">of, right, hool, herte, &amp;, in, oure, best, entent</line>
</root>
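
If "tag-separated" means that each token should sit in its own element, the same approach can be adapted by iterating over the tokens instead of joining them. This is only a sketch of one possible reading: the wrapper element name w is a hypothetical choice, not something taken from the question.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:template match="/root">
    <xsl:copy>
        <xsl:apply-templates select="line"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="line">
    <!-- Collect the text of the line; the empty "note" template drops the note -->
    <xsl:variable name="line-text">
        <xsl:apply-templates/>
    </xsl:variable>
    <xsl:copy>
        <xsl:copy-of select="@n"/>
        <!-- Emit one element per token; "w" is a hypothetical name -->
        <xsl:for-each select="tokenize(normalize-space($line-text), ' ')">
            <w><xsl:value-of select="."/></w>
        </xsl:for-each>
    </xsl:copy>
</xsl:template>

<xsl:template match="note"/>

</xsl:stylesheet>
```

With the test input above, the line element should then contain one w element per word (of, right, hool, and so on), with the ampersand token serialized as &amp;amp; inside its element.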