Using Apache NiFi to extract HL7 values and apply regex

Question

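I need to use Apache NiFi to extract patient information from an HL7 XML document, and then apply a regular expression to pull the diagnostic results out of a section that contains embedded HTML (yes, sorry. Not my design choice :-( )

The first path to the data of interest in the HL7 document is: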

"ClinicalDocument"\"recordTarget"\"patientRole"\"patient"\"name",

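The second, more complex one is:

"ClinicalDocument"\"structuredBody"\"component"\"section"\"text" with @mediaType="text/x-hl7-text+xml", where the value of the section's title element equals "Diagnostic Results"

I need to match the section of the component whose title sub-node has the text value "Diagnostic Results", and then extract the text value of its sibling node, text.

My HL7 XML fragment looks like this: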

<ClinicalDocument>
...
        <recordTarget>
            <patientRole>
....
            <patient>
                <name><given>John</given><family>Doe</family></name>
...
<structuredBody>
...
<component>
    <section classCode="DOCSECT" moodCode="EVN">
        <templateId root="0.0.0.0.0.0.1" />
        <code code="000-01" codeSystem="0.0.0.1.0.0"  />
        <title>Diagnostic Results</title>
        <text mediaType="text/x-hl7-text+xml">
            Some data of interest expressed in n microns.<content ID="NKN_results"/>
        </text>

Any suggestions on how to do this in Apache NiFi?

Answer 1

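You should be able to use XPath with the NiFi EvaluateXPath processor to match and extract the <text> element. I started with the structuredBody tag as the root for the following expression: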

/structuredBody/component/section[title = 'Diagnostic Results' and text[@mediaType='text/x-hl7-text+xml']]/text
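To check the selection logic outside NiFi, here is a minimal Python sketch using the standard library's ElementTree. It is not what EvaluateXPath runs internally. The sample document is a namespace-free reconstruction of the fragment in the question (an assumption: real HL7 CDA documents normally declare a default namespace, which would have to be mapped in the paths). ElementTree only supports a subset of XPath and has no "and", so the two conditions are written as predicates on successive steps, which selects only the text children that carry the media type. The function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free sample modelled on the fragment in the question.
SAMPLE = """<ClinicalDocument>
  <recordTarget>
    <patientRole>
      <patient>
        <name><given>John</given><family>Doe</family></name>
      </patient>
    </patientRole>
  </recordTarget>
  <structuredBody>
    <component>
      <section classCode="DOCSECT" moodCode="EVN">
        <title>Diagnostic Results</title>
        <text mediaType="text/x-hl7-text+xml">
          Some data of interest expressed in n microns.<content ID="NKN_results"/>
        </text>
      </section>
    </component>
  </structuredBody>
</ClinicalDocument>"""

# Paths are relative to the ClinicalDocument root element.
DIAG_PATH = (
    "./structuredBody/component/section[title='Diagnostic Results']"
    "/text[@mediaType='text/x-hl7-text+xml']"
)
NAME_PATH = "./recordTarget/patientRole/patient/name"


def diagnostic_text(doc):
    """Return the flattened text of the matching <text> element, or None."""
    node = doc.find(DIAG_PATH)
    if node is None:
        return None
    # itertext() also walks into embedded child markup such as <content/>.
    return "".join(node.itertext()).strip()


def patient_name(doc):
    """Join the text of the <name> children (given, family) with spaces."""
    name = doc.find(NAME_PATH)
    if name is None:
        return None
    return " ".join(part.text.strip() for part in name if part.text)


root = ET.fromstring(SAMPLE)
print(patient_name(root))
print(diagnostic_text(root))
```

If the section title does not match, or the media type differs, `diagnostic_text` returns `None` instead of raising.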

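You should be able to adapt that XPath expression to the full XML path, though. Once the <text> element has been extracted, as of NiFi 0.5.0 you can use the GetHTMLElement processor to extract values from the embedded HTML. Before NiFi 0.5.0, if the HTML is well-formed (XHTML, for example), you can use another EvaluateXPath processor instead.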
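For the regular-expression step from the question, here is a rough Python sketch of the kind of pattern that could be applied once the text has been extracted. The pattern, the function name, and the sample value are all assumptions: the question only shows the placeholder "n microns", so the real format of the measurement is unknown. In a NiFi flow the same pattern would most likely go into a regex-based processor such as ExtractText, which uses Java regex syntax; this simple pattern should behave the same way there, but that mapping is a suggestion rather than part of the original answer.

```python
import re

# Hypothetical pattern: a decimal number followed by the word "microns".
MICRONS = re.compile(r"(\d+(?:\.\d+)?)\s*microns\b", re.IGNORECASE)


def extract_microns(text):
    """Return the first measurement in microns as a float, or None."""
    match = MICRONS.search(text)
    return float(match.group(1)) if match else None


# Made-up sample value; the question only shows "n microns".
sample = "Some data of interest expressed in 12.5 microns."
print(extract_microns(sample))
```

Text without a numeric value before "microns", such as the literal placeholder from the question, yields `None`.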
