Snowflake: getting value from inner tags of xml file

I am trying to import an XML file into a Snowflake table. I have created the table and an XML file format. The file format was created with the following code:

CREATE OR REPLACE FILE FORMAT LAND_XML.PUBLIC.XML_FILE_FORMAT 
TYPE = 'XML' 
COMPRESSION = 'AUTO' 
PRESERVE_SPACE = FALSE 
STRIP_OUTER_ELEMENT = TRUE 
DISABLE_SNOWFLAKE_DATA = FALSE 
DISABLE_AUTO_CONVERT = FALSE 
IGNORE_UTF8_ERRORS = FALSE; 

The XML file looks like this:

<?xml version="1.0" encoding="utf-8" ?>
<NoticeOfChange version="1.0.0" application_version="v0.0.1-5780-g16dbd00e9"
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xsi:noNamespaceSchemaLocation="https://prod.notices.govt/notices/">
    <ProducedBy>
        <Name>Land Department</Name>
        <Contact>
            <Name>Technical Support</Name>
            <Phone>0800 xxx xxx</Phone>
            <Email>customersupport@land.govt</Email>
        </Contact>
    </ProducedBy>
    <Notices>
    <Notice>
            <NoticeId>577</NoticeId>
            <NoticeType>NoticeOfChange</NoticeType>
            <Description>Notification of change of ownership of rating unit</Description>
            <Statutory>Under Local Government (Rating) Act 2020</Statutory>
   </Notice>
   <Notice>
            <NoticeId>578</NoticeId>
            <NoticeType>NoticeOfChange</NoticeType>
            <Description>Notification of change of ownership of rating unit</Description>
            <Statutory>Under Local Government (Rating) Act 2020</Statutory>
   </Notice>
    <Notice>
            <NoticeId>579</NoticeId>
            <NoticeType>NoticeOfChange</NoticeType>
            <Description>Notification of change of ownership of rating unit</Description>
            <Statutory>Under Local Government (Rating) Act 2020</Statutory>
   </Notice>
   <Notice>
            <NoticeId>580</NoticeId>
            <NoticeType>NoticeOfChange</NoticeType>
            <Description>Notification of change of ownership of rating unit</Description>
            <Statutory>Under Local Government (Rating) Act 2020</Statutory>
   </Notice>
   </Notices>
</NoticeOfChange>

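When I import the XML file into the Snowflake table, it shows only two rows (instead of the expected 4 rows, one per notice). The output is shown in the image below: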

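The current file format splits the XML file into two rows, based on the <ProducedBy> and <Notices> tags. However, I am not interested in the <ProducedBy> tag (I want to discard it) and would like to turn the notices inside the <Notices> tag into separate rows of the table. With my limited knowledge, I have not been able to modify the file format to get the output I want. Any help would be greatly appreciated.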

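The root object is NoticeOfChange, so flattening it gives you that object's children, ProducedBy and Notices. As you noted, you only want the latter, so flatten that child object instead, via xmlget(d.xml, 'Notices'):"$"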

So, using just this CTE for the data:

WITH data_table AS (
    SELECT PARSE_XML('<?xml version="1.0" encoding="utf-8" ?>
<NoticeOfChange version="1.0.0" application_version="v0.0.1-5780-g16dbd00e9"
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xsi:noNamespaceSchemaLocation="https://prod.notices.govt/notices/">
    <ProducedBy>
        <Name>Land Department</Name>
        <Contact>
            <Name>Technical Support</Name>
            <Phone>0800 xxx xxx</Phone>
            <Email>customersupport@land.govt</Email>
        </Contact>
    </ProducedBy>
    <Notices>
    <Notice>
            <NoticeId>577</NoticeId>
            <NoticeType>NoticeOfChange</NoticeType>
            <Description>Notification of change of ownership of rating unit</Description>
            <Statutory>Under Local Government (Rating) Act 2020</Statutory>
   </Notice>
   <Notice>
            <NoticeId>578</NoticeId>
            <NoticeType>NoticeOfChange</NoticeType>
            <Description>Notification of change of ownership of rating unit</Description>
            <Statutory>Under Local Government (Rating) Act 2020</Statutory>
   </Notice>
    <Notice>
            <NoticeId>579</NoticeId>
            <NoticeType>NoticeOfChange</NoticeType>
            <Description>Notification of change of ownership of rating unit</Description>
            <Statutory>Under Local Government (Rating) Act 2020</Statutory>
   </Notice>
   <Notice>
            <NoticeId>580</NoticeId>
            <NoticeType>NoticeOfChange</NoticeType>
            <Description>Notification of change of ownership of rating unit</Description>
            <Statutory>Under Local Government (Rating) Act 2020</Statutory>
   </Notice>
   </Notices>
</NoticeOfChange>') as xml
)

the notices can be accessed with:

SELECT 
    f.value as notice
FROM data_table AS d
    ,lateral flatten(input=>xmlget(d.xml, 'Notices'):"$")f;

which gives:

NOTICE
<Notice> <NoticeId>577</NoticeId> <NoticeType>NoticeOfChange</NoticeType> <Description>Notification of change of ownership of rating unit</Description> <Statutory>Under Local Government (Rating) Act 2020</Statutory> </Notice>
<Notice> <NoticeId>578</NoticeId> <NoticeType>NoticeOfChange</NoticeType> <Description>Notification of change of ownership of rating unit</Description> <Statutory>Under Local Government (Rating) Act 2020</Statutory> </Notice>
<Notice> <NoticeId>579</NoticeId> <NoticeType>NoticeOfChange</NoticeType> <Description>Notification of change of ownership of rating unit</Description> <Statutory>Under Local Government (Rating) Act 2020</Statutory> </Notice>
<Notice> <NoticeId>580</NoticeId> <NoticeType>NoticeOfChange</NoticeType> <Description>Notification of change of ownership of rating unit</Description> <Statutory>Under Local Government (Rating) Act 2020</Statutory> </Notice>
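
Each row here is still an XML fragment. As a minimal sketch of pulling individual columns out of it, the query below applies XMLGET to f.value and casts the element content. The two-notice CTE is a shortened stand-in for the full document above, and the column aliases and target types are assumptions chosen for illustration.

```sql
-- Shortened stand-in for the document above (two notices only).
WITH data_table AS (
    SELECT PARSE_XML('<NoticeOfChange>
    <ProducedBy><Name>Land Department</Name></ProducedBy>
    <Notices>
        <Notice>
            <NoticeId>577</NoticeId>
            <NoticeType>NoticeOfChange</NoticeType>
        </Notice>
        <Notice>
            <NoticeId>578</NoticeId>
            <NoticeType>NoticeOfChange</NoticeType>
        </Notice>
    </Notices>
</NoticeOfChange>') AS xml
)
SELECT
    -- XMLGET returns the named child element; :"$" is that element's content.
    XMLGET(f.value, 'NoticeId'):"$"::NUMBER   AS notice_id,   -- alias and type are assumptions
    XMLGET(f.value, 'NoticeType'):"$"::STRING AS notice_type  -- alias and type are assumptions
FROM data_table AS d,
    LATERAL FLATTEN(input => XMLGET(d.xml, 'Notices'):"$") AS f;
```

The same pattern should extend to the Description and Statutory elements.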

At this point, the individual parts can be stored or accessed as needed.
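
To run this against the staged file rather than an inline string, one option is to load the whole document into a single VARIANT row. With STRIP_OUTER_ELEMENT = TRUE the parser strips the root element and exposes each second-level element as its own document, which would explain the two rows (ProducedBy and Notices) in the question. The sketch below sets the option to FALSE instead. The file format name XML_WHOLE_DOC_FORMAT, the table notice_raw, and the stage path @my_stage/notices.xml are hypothetical placeholders, not names from the question.

```sql
-- Hypothetical file format: keep the root element so the file loads as one document.
CREATE OR REPLACE FILE FORMAT LAND_XML.PUBLIC.XML_WHOLE_DOC_FORMAT
    TYPE = 'XML'
    STRIP_OUTER_ELEMENT = FALSE;

-- Hypothetical landing table with a single VARIANT column.
CREATE OR REPLACE TABLE notice_raw (xml VARIANT);

-- Hypothetical stage and file name; adjust to wherever the file was uploaded.
COPY INTO notice_raw
FROM @my_stage/notices.xml
FILE_FORMAT = (FORMAT_NAME = 'LAND_XML.PUBLIC.XML_WHOLE_DOC_FORMAT');

-- Same flatten as above, reading from the table instead of the CTE.
SELECT
    f.value AS notice
FROM notice_raw AS r,
    LATERAL FLATTEN(input => XMLGET(r.xml, 'Notices'):"$") AS f;
```

Keeping the root element means the ProducedBy block is simply never selected, so nothing has to be filtered out after the load.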