How should I load an XML file that contains comments and whitespace? When I then use XMLGET on the root element, I am not able to get the child elements.

(Submitted on behalf of a Snowflake user)


Using:

<clinical_study>
 <!-- This xml conforms to an XML Schema at:
  https://clinicaltrials.gov/ct2/html/images/info/public.xsd -->
 <required_header>
  <download_date>ClinicalTrials.gov processed this data on September 13, 2019</download_date>
  <link_text>Link to the current ClinicalTrials.gov record.</link_text>
  <url>https://clinicaltrials.gov/show/NCT00010010</url>
 </required_header>
 <id_info>
  <org_study_id>CDR0000068431</org_study_id>
  <secondary_id>NYU-0004</secondary_id>
  <secondary_id>P-UPJOHN-NYU-0004</secondary_id>
  <secondary_id>NCI-G00-1906</seco

I am getting NULL instead of the contents of the root element. I have read "How to Easily Load and Query XML Data with Snowflake Part 2" from Snowflake's documentation, and I am using:

SELECT XMLGET(src_xml, 'clinical_study'):"$",
*
FROM STG_XML
;

...but it gives me NULL when I try to get the contents of the root element with the SQL above.

Any ideas, suggestions, and/or workarounds?

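As Mike Walton noted, the XML is incomplete (which prevents others from easily reproducing the NULL the OP asks about). Once the open XML elements are closed, the reason XMLGET returns NULL is that "clinical_study" is the root node, and XMLGET retrieves elements within the root node. To return the contents of the root node, you can use the expression: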

src_xml:"$" AS clinical_study_contents

Here is a simple test harness that demonstrates this, along with a valid use of XMLGET (extracting the "id_info" element):

WITH STG_XML AS (
  SELECT PARSE_XML($1) AS src_xml  -- $1 is the single column produced by VALUES
    FROM VALUES
           ($$
<clinical_study>
 <!-- This xml conforms to an XML Schema at:
  https://clinicaltrials.gov/ct2/html/images/info/public.xsd -->
 <required_header>
  <download_date>ClinicalTrials.gov processed this data on September 13, 2019</download_date>
  <link_text>Link to the current ClinicalTrials.gov record.</link_text>
  <url>https://clinicaltrials.gov/show/NCT00010010</url>
 </required_header>
 <id_info>
  <org_study_id>CDR0000068431</org_study_id>
  <secondary_id>NYU-0004</secondary_id>
  <secondary_id>P-UPJOHN-NYU-0004</secondary_id>
  <secondary_id>NCI-G00-1906</secondary_id>
 </id_info>
</clinical_study>
$$)
)
SELECT src_xml:"$" AS clinical_study_contents
      ,XMLGET(src_xml, 'id_info') as id_info_element
      ,*
  FROM STG_XML
;
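
Building on the harness above, here is a minimal sketch of how individual values might be pulled out once the element is found. It nests XMLGET calls to step down from the root, uses XMLGET's optional third argument (a zero-based instance number) to pick among the repeated "secondary_id" elements, and uses GET(..., '@') to read the tag name of the root. The trimmed-down XML and the column aliases are illustrative assumptions, not part of the original post.

```sql
WITH STG_XML AS (
  SELECT PARSE_XML($1) AS src_xml
    FROM VALUES
           ($$
<clinical_study>
 <id_info>
  <org_study_id>CDR0000068431</org_study_id>
  <secondary_id>NYU-0004</secondary_id>
  <secondary_id>P-UPJOHN-NYU-0004</secondary_id>
 </id_info>
</clinical_study>
$$)
)
SELECT GET(src_xml, '@')::string AS root_element_name  -- tag name of the root node
       -- step into id_info, then into a child, then take its contents with "$"
      ,XMLGET(XMLGET(src_xml, 'id_info'), 'org_study_id'):"$"::string AS org_study_id
       -- the third argument selects which repeated element to return (0 = first)
      ,XMLGET(XMLGET(src_xml, 'id_info'), 'secondary_id', 0):"$"::string AS first_secondary_id
      ,XMLGET(XMLGET(src_xml, 'id_info'), 'secondary_id', 1):"$"::string AS second_secondary_id
  FROM STG_XML
;
```

If the number of repeated elements is not known in advance, flattening the contents of "id_info" is likely a better fit than hard-coded instance numbers.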

Here is a good blog post on the topic:

https://community.snowflake.com/s/article/Querying-Nested-XML-in-Snowflake

Also, please find below a way to query nested XML elements.

    Sample XML:

    <?xml version="1.0"?>
    <comtec version="2008">
        <customer_transport_order>
            <id>2880ORO</id>
            <order_number>99833104701</order_number>
            <priority>0</priority>
            <order_date>2019-03-22</order_date>
            <order_kind>
                <code>VMI</code>
                <name>VMI</name>
            </order_kind>
            <operational>true</operational>
            <order_status>
                <code>cancel</code>
                <name>cancel</name>
                <status_kind>cancel</status_kind>
            </order_status>
            <contact>
                <id>CEN143096</id>
                <code>CEN127431</code>
                <name>SOUTHERN UNITED ENTERPRISES</name>
            </contact>
        </customer_transport_order>
    </comtec>

    Sample Query:


        select
               XMLGET( cust.value, 'order_number' ):"$"::integer as cust_order,
               XMLGET( cust.value, 'order_date' ):"$"::string as cust_date,
               XMLGET( orderkind.value, 'code' ):"$"::string as order_kind,
               XMLGET( contactval.value, 'id' ):"$"::string as contactval,
               XMLGET( contactval.value, 'code' ):"$"::string as contactcode,
               XMLGET( contactval.value, 'name' ):"$"::string as contactname
        from
            dept_emp_addr
            , lateral FLATTEN(dept_emp_addr.xmldata:"$") cust
            , lateral FLATTEN(cust.value:"$") orderkind
            , lateral FLATTEN(cust.value:"$") contactval
        where cust.value like '<customer_transport_order>%'
          and orderkind.value like '<order_kind>%'
          and contactval.value like '<contact>%'
        order by cust_order;
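
As a possible variation on the query above, the sketch below matches elements by tag name with GET(..., '@') instead of a LIKE on the serialized XML, and reaches the nested "order_kind" and "contact" values with nested XMLGET calls rather than extra FLATTENs. The inline CTE is a hypothetical stand-in for the dept_emp_addr table, and the TO_ARRAY wrapper is there on the assumption that a root with only one child may come back as a single element rather than an array.

```sql
-- Hypothetical inline data standing in for dept_emp_addr.xmldata.
WITH dept_emp_addr AS (
  SELECT PARSE_XML($1) AS xmldata
    FROM VALUES
           ($$
<comtec version="2008">
 <customer_transport_order>
  <order_number>99833104701</order_number>
  <order_date>2019-03-22</order_date>
  <order_kind><code>VMI</code><name>VMI</name></order_kind>
  <contact><id>CEN143096</id><code>CEN127431</code><name>SOUTHERN UNITED ENTERPRISES</name></contact>
 </customer_transport_order>
</comtec>
$$)
)
SELECT XMLGET(cust.value, 'order_number'):"$"::integer AS cust_order
      ,XMLGET(cust.value, 'order_date'):"$"::string AS cust_date
       -- nested XMLGET: first the child element, then the value inside it
      ,XMLGET(XMLGET(cust.value, 'order_kind'), 'code'):"$"::string AS order_kind
      ,XMLGET(XMLGET(cust.value, 'contact'), 'id'):"$"::string AS contact_id
      ,XMLGET(XMLGET(cust.value, 'contact'), 'name'):"$"::string AS contact_name
  FROM dept_emp_addr
      ,LATERAL FLATTEN(TO_ARRAY(dept_emp_addr.xmldata:"$")) cust
 WHERE GET(cust.value, '@') = 'customer_transport_order'  -- filter on the tag name
 ORDER BY cust_order;
```

This returns one row per "customer_transport_order" element, whereas joining several FLATTENs over the same children relies on the WHERE clause to discard the unwanted combinations.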

