转换分层 DOM-Tree XML 文件:XSLT 能否在不丢失数据的情况下转换为 "flattened" XML 文件?

Converting Hierarchical DOM-Tree XML Files: Can XSLT convert to a "flattened" XML file without losing data?

我正在处理每月收到的一批 XML 文件。它们都遵循相同的 DOM 树结构,并且没有模式文件。这是一个示例:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<REPORT><REPORT-DTL><REPORT-ID>PCRP60R1-C</REPORT-ID><REPORT-DATE>2020-10-01</REPORT-DATE><REPORT-NAME>OUTSIDE USE REPORT (PATIENTS WITH SIGNED CONSENT)</REPORT-NAME><REPORT-PERIOD-START>2020-09-01</REPORT-PERIOD-START><REPORT-PERIOD-END>2020-09-30</REPORT-PERIOD-END></REPORT-DTL>
  <GROUP><GROUP-DTL><GROUP-ID>DoctorAGroup1234</GROUP-ID><GROUP-TYPE>HOSP</GROUP-TYPE><GROUP-NAME>COUNTY HOSP</GROUP-NAME></GROUP-DTL>
    <PROVIDER><PROVIDER-DTL><PROVIDER-NUMBER>DoctorAID1234</PROVIDER-NUMBER><PROVIDER-LAST-NAME>DoctorALastname</PROVIDER-LAST-NAME><PROVIDER-FIRST-NAME>DoctorAFirstname</PROVIDER-FIRST-NAME><PROVIDER-MIDDLE-NAME>DoctorAMiddleName</PROVIDER-MIDDLE-NAME></PROVIDER-DTL>
      <PATIENT><PATIENT-DTL><PATIENT-HEALTH-NUMBER>PatientANumber1234</PATIENT-HEALTH-NUMBER><PATIENT-LAST-NAME>PatientALastname</PATIENT-LAST-NAME><PATIENT-FIRST-NAME>PatientAFirstname</PATIENT-FIRST-NAME><PATIENT-BIRTHDATE>1941-02-11</PATIENT-BIRTHDATE><PATIENT-SEX>M</PATIENT-SEX></PATIENT-DTL>
        <SERVICE-DTL1><SERVICE-LOC> </SERVICE-LOC><SERVICE-DATE>PatientAServiceDate2020-09-07</SERVICE-DATE><SERVICE-CODE>PatientAServiceCodeABC1</SERVICE-CODE><SERVICE-DESCRIPTION>PatientAServiceDescription-Facelift</SERVICE-DESCRIPTION><SERVICE-AMT>PatientAServiceAmount8.90</SERVICE-AMT></SERVICE-DTL1>
      </PATIENT>
      <PATIENT><PATIENT-DTL><PATIENT-HEALTH-NUMBER>PatientBNumber1235</PATIENT-HEALTH-NUMBER><PATIENT-LAST-NAME>PatientBLastname</PATIENT-LAST-NAME><PATIENT-FIRST-NAME>PatientBFirstname</PATIENT-FIRST-NAME><PATIENT-BIRTHDATE>1955-10-11</PATIENT-BIRTHDATE><PATIENT-SEX>F</PATIENT-SEX></PATIENT-DTL>
        <SERVICE-DTL1><SERVICE-LOC> </SERVICE-LOC><SERVICE-DATE>PatientBServiceDate2020-12-08</SERVICE-DATE><SERVICE-CODE>PatientBServiceCodeABC2</SERVICE-CODE><SERVICE-DESCRIPTION>PatientBServiceDescription-Checkup</SERVICE-DESCRIPTION><SERVICE-AMT>PatientBServiceAmount10.50</SERVICE-AMT></SERVICE-DTL1>
      </PATIENT>
      <PATIENT><PATIENT-DTL><PATIENT-HEALTH-NUMBER>PatientCNumber1236</PATIENT-HEALTH-NUMBER><PATIENT-LAST-NAME>PatientCLastname</PATIENT-LAST-NAME><PATIENT-FIRST-NAME>PatientCFirstname</PATIENT-FIRST-NAME><PATIENT-BIRTHDATE>1965-02-07</PATIENT-BIRTHDATE><PATIENT-SEX>F</PATIENT-SEX></PATIENT-DTL>
        <SERVICE-DTL1><SERVICE-LOC> </SERVICE-LOC><SERVICE-DATE>PatientCServiceDate2020-11-11</SERVICE-DATE><SERVICE-CODE>PatientCServiceCodeABC3</SERVICE-CODE><SERVICE-DESCRIPTION>PatientCServiceDescription-X-Ray</SERVICE-DESCRIPTION><SERVICE-AMT>PatientCServiceAmount18.00</SERVICE-AMT></SERVICE-DTL1>
      </PATIENT>
      <PATIENT><PATIENT-DTL><PATIENT-HEALTH-NUMBER>PatientDNumber1237</PATIENT-HEALTH-NUMBER><PATIENT-LAST-NAME>PatientDLastname</PATIENT-LAST-NAME><PATIENT-FIRST-NAME>PatientDFirstname</PATIENT-FIRST-NAME><PATIENT-BIRTHDATE>1975-07-09</PATIENT-BIRTHDATE><PATIENT-SEX>M</PATIENT-SEX></PATIENT-DTL>
        <SERVICE-DTL1><SERVICE-LOC> </SERVICE-LOC><SERVICE-DATE>PatientDServiceDate2020-01-10</SERVICE-DATE><SERVICE-CODE>PatientDServiceCodeABC4</SERVICE-CODE><SERVICE-DESCRIPTION>PatientDServiceDescription-Nose Cleaning</SERVICE-DESCRIPTION><SERVICE-AMT>PatientDServiceAmount6.00</SERVICE-AMT></SERVICE-DTL1>
      </PATIENT>
    </PROVIDER>
  </GROUP>
</REPORT>

注意代码的层次结构。所有 PATIENT 数据都“属于”父 PROVIDER 节点。每个程序都“属于”取代它的患者。 XML 代码描述 these 4 个相互关联的 tables.

我的数据库软件无法从 XML 文件导入这些相互关联的 table -- 我的数据库应用程序无法遵循代码的层次结构。

相反,如果数据被“扁平化”,并且相互关联的节点被“解包”(和复制),我可以导入数据。 Here 是我希望扁平化的 table 如何看待导入。是的,它有很多冗余——这个新 table 的前 12 列对于每条记录都是相同的。 (没关系,以后可以去掉冗余,此时我只想读入所有数据。)

这是生成上图中扁平化 table 的 XML 代码:

<?xml version="1.0" encoding="ISO-8859-1" ?>

<REPORT>
    <ImportedTable>
<REPORT-ID>PCRP60R1-C</REPORT-ID><REPORT-DATE>2020-10-01</REPORT-DATE><REPORT-NAME>OUTSIDE USE REPORT (PATIENTS WITH SIGNED CONSENT)</REPORT-NAME><REPORT-PERIOD-START>2020-09-01</REPORT-PERIOD-START><REPORT-PERIOD-END>2020-09-30</REPORT-PERIOD-END>
  <GROUP-ID>DoctorAGroup1234</GROUP-ID><GROUP-TYPE>HOSP</GROUP-TYPE><GROUP-NAME>COUNTY HOSP</GROUP-NAME>
<PROVIDER-NUMBER>DoctorAID1234</PROVIDER-NUMBER><PROVIDER-LAST-NAME>DoctorALastname</PROVIDER-LAST-NAME><PROVIDER-FIRST-NAME>DoctorAFirstname</PROVIDER-FIRST-NAME><PROVIDER-MIDDLE-NAME>DoctorAMiddleName</PROVIDER-MIDDLE-NAME>
<PATIENT-HEALTH-NUMBER>PatientANumber1234</PATIENT-HEALTH-NUMBER><PATIENT-LAST-NAME>PatientALastname</PATIENT-LAST-NAME><PATIENT-FIRST-NAME>PatientAFirstname</PATIENT-FIRST-NAME><PATIENT-BIRTHDATE>1941-02-11</PATIENT-BIRTHDATE><PATIENT-SEX>M</PATIENT-SEX>
        <SERVICE-LOC> </SERVICE-LOC><SERVICE-DATE>PatientAServiceDate2020-09-07</SERVICE-DATE><SERVICE-CODE>PatientAServiceCodeABC1</SERVICE-CODE><SERVICE-DESCRIPTION>PatientAServiceDescription-Facelift</SERVICE-DESCRIPTION><SERVICE-AMT>PatientAServiceAmount8.90</SERVICE-AMT>
    </ImportedTable>

    <ImportedTable>
<REPORT-ID>PCRP60R1-C</REPORT-ID><REPORT-DATE>2020-10-01</REPORT-DATE><REPORT-NAME>OUTSIDE USE REPORT (PATIENTS WITH SIGNED CONSENT)</REPORT-NAME><REPORT-PERIOD-START>2020-09-01</REPORT-PERIOD-START><REPORT-PERIOD-END>2020-09-30</REPORT-PERIOD-END>
  <GROUP-ID>DoctorAGroup1234</GROUP-ID><GROUP-TYPE>HOSP</GROUP-TYPE><GROUP-NAME>COUNTY HOSP</GROUP-NAME>
<PROVIDER-NUMBER>DoctorAID1234</PROVIDER-NUMBER><PROVIDER-LAST-NAME>DoctorALastname</PROVIDER-LAST-NAME><PROVIDER-FIRST-NAME>DoctorAFirstname</PROVIDER-FIRST-NAME><PROVIDER-MIDDLE-NAME>DoctorAMiddleName</PROVIDER-MIDDLE-NAME>
<PATIENT-HEALTH-NUMBER>PatientBNumber1235</PATIENT-HEALTH-NUMBER><PATIENT-LAST-NAME>PatientBLastname</PATIENT-LAST-NAME><PATIENT-FIRST-NAME>PatientBFirstname</PATIENT-FIRST-NAME><PATIENT-BIRTHDATE>1955-10-11</PATIENT-BIRTHDATE><PATIENT-SEX>F</PATIENT-SEX>
        <SERVICE-LOC> </SERVICE-LOC><SERVICE-DATE>PatientBServiceDate2020-12-08</SERVICE-DATE><SERVICE-CODE>PatientBServiceCodeABC2</SERVICE-CODE><SERVICE-DESCRIPTION>PatientBServiceDescription-Checkup</SERVICE-DESCRIPTION><SERVICE-AMT>PatientBServiceAmount10.50</SERVICE-AMT>
    </ImportedTable>

    <ImportedTable>
<REPORT-ID>PCRP60R1-C</REPORT-ID><REPORT-DATE>2020-10-01</REPORT-DATE><REPORT-NAME>OUTSIDE USE REPORT (PATIENTS WITH SIGNED CONSENT)</REPORT-NAME><REPORT-PERIOD-START>2020-09-01</REPORT-PERIOD-START><REPORT-PERIOD-END>2020-09-30</REPORT-PERIOD-END>
<GROUP-ID>DoctorAGroup1234</GROUP-ID><GROUP-TYPE>HOSP</GROUP-TYPE><GROUP-NAME>COUNTY HOSP</GROUP-NAME>
    <PROVIDER-NUMBER>DoctorAID1234</PROVIDER-NUMBER><PROVIDER-LAST-NAME>DoctorALastname</PROVIDER-LAST-NAME><PROVIDER-FIRST-NAME>DoctorAFirstname</PROVIDER-FIRST-NAME><PROVIDER-MIDDLE-NAME>DoctorAMiddleName</PROVIDER-MIDDLE-NAME>
      <PATIENT-HEALTH-NUMBER>PatientCNumber1236</PATIENT-HEALTH-NUMBER><PATIENT-LAST-NAME>PatientCLastname</PATIENT-LAST-NAME><PATIENT-FIRST-NAME>PatientCFirstname</PATIENT-FIRST-NAME><PATIENT-BIRTHDATE>1965-02-07</PATIENT-BIRTHDATE><PATIENT-SEX>F</PATIENT-SEX>
        <SERVICE-LOC> </SERVICE-LOC><SERVICE-DATE>PatientCServiceDate2020-11-11</SERVICE-DATE><SERVICE-CODE>PatientCServiceCodeABC3</SERVICE-CODE><SERVICE-DESCRIPTION>PatientCServiceDescription-X-Ray</SERVICE-DESCRIPTION><SERVICE-AMT>PatientCServiceAmount18.00</SERVICE-AMT>
    </ImportedTable>

    <ImportedTable>
<REPORT-ID>PCRP60R1-C</REPORT-ID><REPORT-DATE>2020-10-01</REPORT-DATE><REPORT-NAME>OUTSIDE USE REPORT (PATIENTS WITH SIGNED CONSENT)</REPORT-NAME><REPORT-PERIOD-START>2020-09-01</REPORT-PERIOD-START><REPORT-PERIOD-END>2020-09-30</REPORT-PERIOD-END>
<GROUP-ID>DoctorAGroup1234</GROUP-ID><GROUP-TYPE>HOSP</GROUP-TYPE><GROUP-NAME>COUNTY HOSP</GROUP-NAME>
    <PROVIDER-NUMBER>DoctorAID1234</PROVIDER-NUMBER><PROVIDER-LAST-NAME>DoctorALastname</PROVIDER-LAST-NAME><PROVIDER-FIRST-NAME>DoctorAFirstname</PROVIDER-FIRST-NAME><PROVIDER-MIDDLE-NAME>DoctorAMiddleName</PROVIDER-MIDDLE-NAME>
      <PATIENT-HEALTH-NUMBER>PatientDNumber1237</PATIENT-HEALTH-NUMBER><PATIENT-LAST-NAME>PatientDLastname</PATIENT-LAST-NAME><PATIENT-FIRST-NAME>PatientDFirstname</PATIENT-FIRST-NAME><PATIENT-BIRTHDATE>1975-07-09</PATIENT-BIRTHDATE><PATIENT-SEX>M</PATIENT-SEX>
     <SERVICE-LOC> </SERVICE-LOC><SERVICE-DATE>PatientDServiceDate2020-01-10</SERVICE-DATE><SERVICE-CODE>PatientDServiceCodeABC4</SERVICE-CODE><SERVICE-DESCRIPTION>PatientDServiceDescription-Nose Cleaning</SERVICE-DESCRIPTION><SERVICE-AMT>PatientDServiceAmount6.00</SERVICE-AMT>
    </ImportedTable>
</REPORT>

所以,我的问题是 XSLT 是否可以从顶部 XML 文件转换为底部文件。请注意,在转换后的 XML 文件中,我只对保留包含文本的节点感兴趣。 (在此转换中可以安全地忽略原始文件中的任何非文本节点。)是否有可以进行此转换的代码? (注意:我一直在阅读许多处理 XML 转换的线程,但由于此数据集的关系结构,这种情况并不常见。如果这个问题已在其他地方得到回答,请告诉我!)

非常感谢,

罗恩

扁平化层次结构 XML 是一项常见任务。您只需为最原子的 table 中的每个记录创建一个元素,并包含来自其父 table 的数据。碰巧的是,您的示例 XML 输入的结构使这变得非常简单:

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:template match="/REPORT">
    <xsl:copy>
        <xsl:variable name="report-data" select="REPORT-DTL/*" />
        <xsl:for-each select="GROUP">
            <xsl:variable name="group-data" select="GROUP-DTL/*" />
            <xsl:for-each select="PROVIDER">
                <xsl:variable name="provider-data" select="PROVIDER-DTL/*" />
                <xsl:for-each select="PATIENT">
                    <ImportedTable>
                        <xsl:copy-of select="$report-data"/>
                        <xsl:copy-of select="$group-data"/>
                        <xsl:copy-of select="$provider-data"/>
                        <xsl:copy-of select="PATIENT-DTL/*"/>
                        <xsl:copy-of select="SERVICE-DTL1/*"/>
                    </ImportedTable>
                </xsl:for-each>
            </xsl:for-each>
        </xsl:for-each>
    </xsl:copy>
</xsl:template>

</xsl:stylesheet>