How should I load an XML file that contains comments and whitespace? When I use XMLGET on the root element, I am not able to get the child elements
(Submitted on behalf of a Snowflake user)
Using:
<clinical_study>
<!-- This xml conforms to an XML Schema at:
https://clinicaltrials.gov/ct2/html/images/info/public.xsd -->
<required_header>
<download_date>ClinicalTrials.gov processed this data on September 13, 2019</download_date>
<link_text>Link to the current ClinicalTrials.gov record.</link_text>
<url>https://clinicaltrials.gov/show/NCT00010010</url>
</required_header>
<id_info>
<org_study_id>CDR0000068431</org_study_id>
<secondary_id>NYU-0004</secondary_id>
<secondary_id>P-UPJOHN-NYU-0004</secondary_id>
<secondary_id>NCI-G00-1906</seco
I get NULL instead of the contents of the root element. I have read "How to Easily Load and Query XML Data with Snowflake Part 2" from Snowflake's documentation, and I am using:
SELECT XMLGET(src_xml, 'clinical_study'):"$",
*
FROM STG_XML
;
...but it gives me NULL when I try to get the contents of the root element with the SQL above.
Any ideas, suggestions, and/or workarounds?
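As Mike Walton noted, the XML is incomplete (which prevents others from easily reproducing the NULL the OP asks about). Once the open XML elements are closed, the reason XMLGET returns NULL is that "clinical_study" is the root node, and XMLGET retrieves elements inside the root node. To return the contents of the root node, you can use the expression: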
src_xml:"$" AS clinical_study_contents
Here is a simple test harness that demonstrates this, along with a valid use of XMLGET (extracting the "id_info" element):
WITH STG_XML AS (
-- $1 refers to the single column of the VALUES row below
SELECT PARSE_XML($1) AS src_xml
FROM VALUES
($$
<clinical_study>
<!-- This xml conforms to an XML Schema at:
https://clinicaltrials.gov/ct2/html/images/info/public.xsd -->
<required_header>
<download_date>ClinicalTrials.gov processed this data on September 13, 2019</download_date>
<link_text>Link to the current ClinicalTrials.gov record.</link_text>
<url>https://clinicaltrials.gov/show/NCT00010010</url>
</required_header>
<id_info>
<org_study_id>CDR0000068431</org_study_id>
<secondary_id>NYU-0004</secondary_id>
<secondary_id>P-UPJOHN-NYU-0004</secondary_id>
<secondary_id>NCI-G00-1906</secondary_id>
</id_info>
</clinical_study>
$$)
)
SELECT src_xml:"$" AS clinical_study_contents
,XMLGET(src_xml, 'id_info') as id_info_element
,*
FROM STG_XML
;
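Building on the harness above, the following is a minimal sketch of how to reach the values nested under "id_info". It assumes a shortened version of the same sample XML, and the column aliases are illustrative only. It nests XMLGET calls to walk down one level at a time, uses the optional third argument of XMLGET (a zero-based instance number) to pick one of the repeated "secondary_id" elements, and uses GET with '@' to read the tag name of the root node.

```sql
WITH STG_XML AS (
    -- $1 refers to the single column of the VALUES row below
    SELECT PARSE_XML($1) AS src_xml
    FROM VALUES
    ($$
<clinical_study>
  <id_info>
    <org_study_id>CDR0000068431</org_study_id>
    <secondary_id>NYU-0004</secondary_id>
    <secondary_id>P-UPJOHN-NYU-0004</secondary_id>
  </id_info>
</clinical_study>
    $$)
)
SELECT
    -- '@' yields the tag name of the node itself (here, the root node)
    GET(src_xml, '@')::string AS root_tag_name,
    -- nested XMLGET: root -> id_info -> org_study_id, then "$" for its content
    XMLGET(XMLGET(src_xml, 'id_info'), 'org_study_id'):"$"::string AS org_study_id,
    -- instance numbers are zero-based, so 1 selects the second secondary_id
    XMLGET(XMLGET(src_xml, 'id_info'), 'secondary_id', 1):"$"::string AS second_secondary_id
FROM STG_XML
;
```

Note that the root tag name never appears as an XMLGET argument here: the column already holds the root node, so every XMLGET call starts one level below it.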
Here is a good blog post on the topic:
https://community.snowflake.com/s/article/Querying-Nested-XML-in-Snowflake
Also, please find below a way to query nested XML elements.
Sample XML:
<?xml version="1.0"?>
<comtec version="2008">
<customer_transport_order>
<id>2880ORO</id>
<order_number>99833104701</order_number>
<priority>0</priority>
<order_date>2019-03-22</order_date>
<order_kind>
<code>VMI</code>
<name>VMI</name>
</order_kind>
<operational>true</operational>
<order_status>
<code>cancel</code>
<name>cancel</name>
<status_kind>cancel</status_kind>
</order_status>
<contact>
<id>CEN143096</id>
<code>CEN127431</code>
<name>SOUTHERN UNITED ENTERPRISES</name>
</contact>
</customer_transport_order>
</comtec>
Sample Query:
select
XMLGET( cust.value, 'order_number' ):"$"::integer as cust_order,
XMLGET( cust.value, 'order_date' ):"$"::string as cust_date,
XMLGET( orderkind.value, 'code' ):"$"::string as order_kind,
XMLGET( contactval.value, 'id' ):"$"::string as contactval,
XMLGET( contactval.value, 'code' ):"$"::string as contactcode,
XMLGET( contactval.value, 'name' ):"$"::string as contactname
from
dept_emp_addr
, lateral FLATTEN(dept_emp_addr.xmldata:"$") cust
, lateral FLATTEN(cust.value:"$") orderkind
, lateral FLATTEN(cust.value:"$") contactval
where cust.value like '<customer_transport_order>%' AND orderkind.value like '<order_kind>%'
AND contactval.value like '<contact>%'
ORDER BY cust_order;
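As a possible alternative to the query above, here is a sketch that avoids both the string matching with LIKE and the extra FLATTEN calls. It is written under some assumptions: the table and column names (dept_emp_addr, xmldata) are reused from the query above but defined inline so the example is self-contained, the sample data is shortened, and a second, made-up order is added so that the root node has more than one child and its "$" content is an array that FLATTEN can iterate over. Rows are filtered by tag name with GET(value, '@'), and the nested elements are reached with nested XMLGET calls.

```sql
WITH dept_emp_addr AS (
    -- Inline stand-in for the real table; the second order is invented for illustration
    SELECT PARSE_XML($1) AS xmldata
    FROM VALUES
    ($$
<comtec version="2008">
  <customer_transport_order>
    <order_number>99833104701</order_number>
    <order_date>2019-03-22</order_date>
    <order_kind><code>VMI</code><name>VMI</name></order_kind>
    <contact><id>CEN143096</id><name>SOUTHERN UNITED ENTERPRISES</name></contact>
  </customer_transport_order>
  <customer_transport_order>
    <order_number>99833104702</order_number>
    <order_date>2019-03-23</order_date>
    <order_kind><code>VMI</code><name>VMI</name></order_kind>
    <contact><id>CEN143097</id><name>EXAMPLE CONTACT</name></contact>
  </customer_transport_order>
</comtec>
    $$)
)
SELECT
    XMLGET(cust.value, 'order_number'):"$"::integer              AS cust_order,
    XMLGET(cust.value, 'order_date'):"$"::string                 AS cust_date,
    -- nested XMLGET replaces the second and third FLATTEN
    XMLGET(XMLGET(cust.value, 'order_kind'), 'code'):"$"::string AS order_kind,
    XMLGET(XMLGET(cust.value, 'contact'), 'id'):"$"::string      AS contact_id,
    XMLGET(XMLGET(cust.value, 'contact'), 'name'):"$"::string    AS contact_name
FROM dept_emp_addr,
     LATERAL FLATTEN(INPUT => dept_emp_addr.xmldata:"$") cust
-- filter on the element's tag name instead of matching its text with LIKE
WHERE GET(cust.value, '@')::string = 'customer_transport_order'
ORDER BY cust_order
;
```

One flattened row per order keeps the result at one row per "customer_transport_order", whereas flattening the same children twice first builds every pairing of child elements and then filters most of them out.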