XML to parse eCFR - again

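Once more into the breach. I want to derive ordered pairs of information from an XML source, for use in a lookup table in a database. The XML is very flat, because its structure reflects the instructions for typesetting a document. Nothing in this XML distinguishes one piece of data from another except formatting. A sample of the XML follows: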

    <APPENDIX>
              <EAR>Pt. 774, Supp. 1</EAR>
              <HD SOURCE="HED">Supplement No. 1 to Part 774—The Commerce Control List</HD>
              <HD SOURCE="HD1">Category 0—Nuclear Materials, Facilities, and Equipment [and Miscellaneous Items]</HD>
              <HD SOURCE="HD1">A. “End Items,” “Equipment,” “Accessories,” “Attachments,” “Parts,” “Components,” and “Systems”</HD>
              <FP SOURCE="FP-2">
                <E T="02">0A002Power generating or propulsion equipment “specially designed” for use with space, marine or mobile “nuclear reactors”. (These items are “subject to the ITAR.” See 22 CFR parts 120 through 130.)</E>
              </FP>

              <FP SOURCE="FP-2">
                <E T="02">0A018Items on the Wassenaar Munitions List (see List of Items Controlled).</E>
              </FP>
              <FP SOURCE="FP-1">
                <E T="04">License Requirements</E>
              </FP>
              <FP SOURCE="FP-1">
                <E T="03">Reason for Control:</E> NS, AT, UN</FP>
              <GPOTABLE CDEF="s50,r50" COLS="2" OPTS="L2">
                <BOXHD>
                  <CHED H="1">Control(s)</CHED>
                  <CHED H="1">Country Chart (See Supp. No. 1 to part 738)</CHED>
                </BOXHD>
                <ROW>
                  <ENT I="01">NS applies to entire entry</ENT>
                  <ENT>NS Column 1.</ENT>
                </ROW>
                <ROW>
                  <ENT I="01">AT applies to entire entry</ENT>
                  <ENT>AT Column 1.</ENT>
                </ROW>
                <ROW>
                  <ENT I="01">UN applies to entire entry</ENT>
                  <ENT>See § 746.1(b) for UN controls.</ENT>
                </ROW>
              </GPOTABLE>
              <FP SOURCE="FP-1">
                <E T="05">List Based License Exceptions (See Part 740 for a description of all license exceptions)</E>
              </FP>
              <FP SOURCE="FP-1">
                <E T="03">LVS:</E> ,000 for 0A018.b</FP>
              <FP SOURCE="FP-1">,500 for 0A018.c and .d</FP>
              <FP SOURCE="FP-1">
                <E T="03">GBS:</E> N/A</FP>
              <FP SOURCE="FP-1">
                <E T="03">CIV:</E> N/A</FP>
              <FP SOURCE="FP-1">
                <E T="04">List of Items Controlled</E>
              </FP>
              <FP SOURCE="FP-1">
                <E T="03">Related Controls:</E> (1) See also 0A979, 0A988, and 22 CFR 121.1 Categories I(a), III(b-d), and X(a). (2) See ECCN 0A617.y.1 and .y.2 for items formerly controlled by ECCN 0A018.a. (3) See ECCN 1A613.c for military helmets providing less than NIJ Type IV protection and ECCN 1A613.y.1 for conventional military steel helmets that, immediately prior to July 1, 2014, were classified under 0A018.d and 0A988. (4) See 22 CFR 121.1 Category X(a)(5) and (a)(6) for controls on other military helmets.</FP>
              <FP SOURCE="FP-1">
                <E T="03">Related Definitions:</E> N/A</FP>
              <FP>
                <E T="03">Items:</E> a. [Reserved]</FP>
              <P>b. “Specially designed” components and parts for ammunition, except cartridge cases, powder bags, bullets, jackets, cores, shells, projectiles, boosters, fuses and components, primers, and other detonating devices and ammunition belting and linking machines (all of which are “subject to the ITAR.” (See 22 CFR parts 120 through 130);</P>
              <NOTE>
                <HD SOURCE="HED">
                  <E T="03">Note:</E>
                </HD>
                <P>
                  <E T="03">0A018.b does not apply to “components” “specially designed” for blank or dummy ammunition as follows:</E>
                </P>
                <P>
                  <E T="03">a. Ammunition crimped without a projectile (blank star);</E>
                </P>
 </APPENDIX>

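Two XSL samples are also attached. The first gets the ECCN number from the FP/E nodes whose attributes are "FP-2" and "02" respectively. The second uses an xsl:if statement to get the "Reasons for Control", also from FP nodes. In the latter case, the IF statement tests whether the E node inside the FP node contains the text "Reason/s for Control".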

<?xml version='1.0' encoding='UTF-8'?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
    <xsl:for-each select="//FP[@SOURCE = 'FP-2']/E[@T='02']">
    <xsl:value-of select="."/>\n
    </xsl:for-each>          
</xsl:template>
</xsl:stylesheet>

<?xml version='1.0' encoding='UTF-8'?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
    <xsl:for-each select="//FP[@SOURCE = 'FP-1']">
        <xsl:if test= "E='Reason for Control:' or E='Reasons for Control:'">
        <xsl:value-of select="."/>\n
        </xsl:if>
    </xsl:for-each>
</xsl:template>
</xsl:stylesheet>

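The output I need is a set of ordered pairs: each ECCN followed by its Reasons for Control. My thinking was that if I walked down the list to each FP node and tested its attributes, keeping only the nodes with the right attributes as the XSL samples above suggest, I should get a one-dimensional list containing each ECCN followed by its matching reasons for control, if any. Instead, I get most of the text of the original XML, with a lot of "Nothing" mixed in. In other words, I am clearly matching the FP nodes, but for some reason the 'when' tests are not being satisfied.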

<?xml version='1.0' encoding='UTF-8'?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    <xsl:template match="/">
    <xsl:apply-templates/>
    </xsl:template>

    <xsl:template match="FP">
        <xsl:choose>
            <xsl:when test="@Source='FP-2'">
            <xsl:value-of select="."/>\n
            </xsl:when>
        <xsl:when test="@Source='FP-1'">
            <xsl:if test= "E='Reason for Control:' or E='Reasons for Control:'">
                <xsl:value-of select="."/>\n
            </xsl:if>
        </xsl:when>
        <xsl:otherwise>
            Nothing
        </xsl:otherwise>
        </xsl:choose>
    </xsl:template>

</xsl:stylesheet>

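I believe that if I can get a one-dimensional list as described above, I will then be able to import it into a Filemaker database. Given these premises, can anyone offer suggestions on how to proceed?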

Here is what I understand from this very confusing description:

  1. There are two types of nodes here; the first can be selected by:

    /APPENDIX/FP[@SOURCE='FP-2'][E[@T='02']]
    

    and the second by:

    /APPENDIX/FP[@SOURCE='FP-1'][E[@T='03']='Reason for Control:']
    

    These nodes are siblings.

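  2. A node of the second type is related to its first preceding sibling node of the first type; not every node of the first type has a related node of the second type.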
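The relationship in point 2 can be spelled out directly in XPath, which may make it easier to check against the real file. This is only a sketch under the same assumptions: the plain-text output and the "|" separator are my own choices for illustration, and it assumes every ECCN is exactly five characters long.

```xml
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" encoding="UTF-8"/>
<!-- Drop whitespace-only text nodes so that text() below returns the real content -->
<xsl:strip-space elements="*"/>

<xsl:template match="/APPENDIX">
    <xsl:for-each select="FP[@SOURCE='FP-2'][E[@T='02']]">
        <!-- Identity of the current first-type node -->
        <xsl:variable name="id" select="generate-id()"/>
        <!-- First five characters of the entry: the ECCN (assumed fixed width) -->
        <xsl:value-of select="substring(E[@T='02'], 1, 5)"/>
        <xsl:text>|</xsl:text>
        <!-- Second-type siblings whose nearest preceding first-type sibling is the current node -->
        <xsl:value-of select="normalize-space(following-sibling::FP[@SOURCE='FP-1'][E[@T='03']='Reason for Control:'][generate-id(preceding-sibling::FP[@SOURCE='FP-2'][E[@T='02']][1]) = $id]/text())"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
```

On the sample input (with the <NOTE> element closed) this should print 0A002| and 0A018|NS, AT, UN on separate lines. The predicate is re-evaluated for every first-type node, so on a file the size of the full supplement a key is likely the better tool.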

Based on these assumptions, the following stylesheet:

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:key name="k" match="FP[@SOURCE='FP-1'][E[@T='03']='Reason for Control:']" use="generate-id(preceding-sibling::FP[@SOURCE='FP-2'][E[@T='02']][1])" />

<xsl:template match="/APPENDIX">
    <FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
    <METADATA>
        <FIELD NAME="ECCNFP_2" TYPE="TEXT"/> 
        <FIELD NAME="ECCNFP_1" TYPE="TEXT"/> 
    </METADATA>
        <RESULTSET> 
            <xsl:for-each select="FP[@SOURCE='FP-2'][E[@T='02']]">
                <ROW>
                    <COL><DATA><xsl:value-of select="substring(E[@T='02'], 1, 5)"/></DATA></COL> 
                    <COL><DATA><xsl:value-of select="key('k', generate-id())/text()"/></DATA></COL> 
                </ROW> 
            </xsl:for-each>
        </RESULTSET> 
    </FMPXMLRESULT> 
</xsl:template>

</xsl:stylesheet>

when applied to your input example (after correcting the unclosed <NOTE> element!), will produce:

Result

<?xml version="1.0" encoding="UTF-8"?>
<FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
   <METADATA>
      <FIELD NAME="ECCNFP_2" TYPE="TEXT"/>
      <FIELD NAME="ECCNFP_1" TYPE="TEXT"/>
   </METADATA>
   <RESULTSET>
      <ROW>
         <COL>
            <DATA>0A002</DATA>
         </COL>
         <COL>
            <DATA/>
         </COL>
      </ROW>
      <ROW>
         <COL>
            <DATA>0A018</DATA>
         </COL>
         <COL>
            <DATA> NS, AT, UN</DATA>
         </COL>
      </ROW>
   </RESULTSET>
</FMPXMLRESULT>
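
As for why the xsl:choose stylesheet in the question prints "Nothing": XML names are case-sensitive, so @Source never matches the SOURCE attribute. Both xsl:when tests fail and every FP falls through to xsl:otherwise. The rest of the text comes from XSLT's built-in template rules, which copy the text of every element that has no template of its own. Below is a minimal sketch of that template-based approach with both points addressed. It assumes plain-text output is wanted, and the empty text() template is my addition rather than something from the question.

```xml
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" encoding="UTF-8"/>

<!-- Override the built-in rule that would copy every text node to the output -->
<xsl:template match="text()"/>

<!-- ECCN entries: note the upper-case SOURCE, matching the input exactly -->
<xsl:template match="FP[@SOURCE='FP-2'][E[@T='02']]">
    <xsl:value-of select="normalize-space(E[@T='02'])"/>
    <xsl:text>&#10;</xsl:text>
</xsl:template>

<!-- Reason(s) for Control lines; either spelling of the label is accepted -->
<xsl:template match="FP[@SOURCE='FP-1'][E='Reason for Control:' or E='Reasons for Control:']">
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:text>&#10;</xsl:text>
</xsl:template>

</xsl:stylesheet>
```

This gives the flat list described in the question: each ECCN line, followed by its "Reason for Control:" line where one exists. A literal \n inside a template is written to the output as a backslash followed by n; the character reference inside xsl:text is what produces a real line break.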