How to convert a given hierarchical XML file into a denormalized relational database table using Java

Given the following XML structure:

<clinical_study>
  <primary_outcome>
    <measure></measure>
    <time_frame></time_frame>
    <safety_issue></safety_issue>
    <description></description>
  </primary_outcome>
  <secondary_outcome>
    <measure></measure>
    <time_frame></time_frame>
    <safety_issue></safety_issue>
    <description></description>
  </secondary_outcome>
</clinical_study>

I would like to parse the values in these tags and dump them into an Oracle table named "CLINICAL_STUDY" with the following column structure:

desc clinical_study
Name                         Null     Type              
---------------------------- -------- -----------------    
PRIMARY_OUTCOME_MEASURE               VARCHAR2(50)      
PRIMARY_OUTCOME_TIME_FRAME            VARCHAR2(50)      
PRIMARY_OUTCOME_SAFETY_ISSUE          VARCHAR2(50)      
PRIMARY_OUTCOME_DESCRIPTION           VARCHAR2(4000)    
SECONDARY_OUTCOME_MEASURE             VARCHAR2(50)
SECONDARY_OUTCOME_TIME_FRAME          VARCHAR2(50)
SECONDARY_OUTCOME_SAFETY_ISSUE        VARCHAR2(50)
SECONDARY_OUTCOME_DESCRIPTION         VARCHAR2(4000)

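I know there are several ways to accomplish this, but I am not sure which is the simplest. I am leaning toward an XSLT that "flattens" the data so it can be easily converted into a table structure, but I am curious whether there is an easier way. Thanks in advance.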

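If you are programming in Java, you do not need to run an XSLT transformation. You should be able to read the data from the XML file and write it to the database directly.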

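There is certainly more than one way to do this, but one approach is to read the data in with JAXB. (This works well if the data is small enough to hold in memory while it is processed. If you have a large amount of data, you may want to use a streaming XML parser API such as StAX instead.)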
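For the streaming route mentioned above, here is a minimal sketch using the StAX API that ships with the JDK (javax.xml.stream). The class name StaxFlattener and the method flatten are hypothetical, and the sketch assumes every parent element of interest ends in "_outcome", as in the sample XML. It flattens the document into column-name/value pairs such as PRIMARY_OUTCOME_MEASURE.

```java
import java.io.Reader;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of the original answer.
public class StaxFlattener {

    /** Flattens a clinical_study document into column name -> value pairs. */
    public static Map<String, String> flatten(Reader xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Disable DTDs and external entities when the input is not fully trusted.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);

        Map<String, String> row = new LinkedHashMap<>();
        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        String parent = null; // the enclosing *_outcome element, if any
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.endsWith("_outcome")) {
                        parent = name;
                    } else if (parent != null) {
                        // getElementText() reads up to the matching end tag.
                        String column = (parent + "_" + name).toUpperCase(Locale.ROOT);
                        row.put(column, reader.getElementText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && reader.getLocalName().equals(parent)) {
                    parent = null;
                }
            }
        } finally {
            reader.close();
        }
        return row;
    }
}
```

Because the map keys match the table's column names, each entry can be bound to the corresponding INSERT parameter without any intermediate classes.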

First, you can create a pair of classes to represent your input data, annotated with JAXB annotations that define the XML mapping.

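// JAXB 2.x package names; with JAXB 3.0 or later, use jakarta.xml.bind.annotation instead.
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement(name="clinical_study")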
@XmlAccessorType(XmlAccessType.FIELD)
public class ClinicalStudy {

    @XmlElement(name="primary_outcome")
    private Outcome primaryOutcome;

    @XmlElement(name="secondary_outcome")
    private Outcome secondaryOutcome;

    // getters and setters omitted for brevity
}

@XmlAccessorType(XmlAccessType.FIELD)
public class Outcome {

    @XmlElement(name="measure")
    private String measure;

    @XmlElement(name="time_frame")
    private String timeFrame;

    @XmlElement(name="safety_issue")
    private String safetyIssue;

    @XmlElement(name="description")
    private String description;

    // getters and setters omitted for brevity
}

You can then read, or "unmarshal", the data directly from the XML (assuming inputStream is a stream of the XML content).

    JAXBContext context = JAXBContext.newInstance(ClinicalStudy.class);
    Unmarshaller unmarshaller = context.createUnmarshaller();
    ClinicalStudy clinicalStudy = (ClinicalStudy) unmarshaller.unmarshal(inputStream);

And insert it into your database (assuming conn is a JDBC database connection).

    String sql = "INSERT INTO clinical_study ("
            + "PRIMARY_OUTCOME_MEASURE, "
            + "PRIMARY_OUTCOME_TIME_FRAME, "
            + "PRIMARY_OUTCOME_SAFETY_ISSUE, "
            + "PRIMARY_OUTCOME_DESCRIPTION, "
            + "SECONDARY_OUTCOME_MEASURE, "
            + "SECONDARY_OUTCOME_TIME_FRAME, "
            + "SECONDARY_OUTCOME_SAFETY_ISSUE, "
            + "SECONDARY_OUTCOME_DESCRIPTION) "
            + "VALUES (?, ?, ?, ?, ?, ?, ?, ?)";
    // Assumes both outcome elements are present in the XML; a missing element leaves its field null.
    Outcome primary = clinicalStudy.getPrimaryOutcome();
    Outcome secondary = clinicalStudy.getSecondaryOutcome();
    // try-with-resources closes the statement even if the insert fails.
    try (PreparedStatement pstmt = conn.prepareStatement(sql)) {
        pstmt.setString(1, primary.getMeasure());
        pstmt.setString(2, primary.getTimeFrame());
        pstmt.setString(3, primary.getSafetyIssue());
        pstmt.setString(4, primary.getDescription());
        pstmt.setString(5, secondary.getMeasure());
        pstmt.setString(6, secondary.getTimeFrame());
        pstmt.setString(7, secondary.getSafetyIssue());
        pstmt.setString(8, secondary.getDescription());
        pstmt.executeUpdate();
    }
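
The insert above dereferences each outcome directly, so a document without a <secondary_outcome> element would cause a NullPointerException, because JAXB leaves an unmapped field null. One possible guard is sketched below. The helper valueOrNull and the class NullSafeColumns are hypothetical, and the nested Outcome is a plain stand-in for the JAXB-mapped class (annotations and two fields left out so the sketch compiles without JAXB on the classpath).

```java
import java.util.function.Function;

// Hypothetical helper, not part of the original answer.
public class NullSafeColumns {

    /** Plain stand-in for the JAXB-mapped Outcome class. */
    public static class Outcome {
        private final String measure;
        private final String timeFrame;

        public Outcome(String measure, String timeFrame) {
            this.measure = measure;
            this.timeFrame = timeFrame;
        }

        public String getMeasure() { return measure; }

        public String getTimeFrame() { return timeFrame; }
    }

    /** Applies the getter, or returns null when the source object itself is missing. */
    public static <T> String valueOrNull(T source, Function<T, String> getter) {
        return source == null ? null : getter.apply(source);
    }
}
```

A bind call then becomes, for example, pstmt.setString(5, valueOrNull(clinicalStudy.getSecondaryOutcome(), Outcome::getMeasure)). Passing null to setString stores SQL NULL in that column, which the table allows because none of its columns are declared NOT NULL.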