How to use XQuery to transform consecutive tags into nested tags or a table

I have an XML file that contains consecutive (sibling) tags instead of nested tags, like this:

<title>
    <subtitle>
        <topic att="TopicTitle">Topic title 1</topic>
        <content att="TopicSubtitle">topic subtitle 1</content>
        <content att="Paragraph">paragraph text 1</content>
        <content att="Paragraph">paragraph text 2</content>
        <content att="TopicSubtitle">topic subtitle 2</content>
        <content att="Paragraph">paragraph text 1</content>
        <content att="Paragraph">paragraph text 2</content>

        <topic att="TopicTitle">Topic title 2</topic>
        <content att="TopicSubtitle">topic subtitle 1</content>
        <content att="Paragraph">paragraph text 1</content>
        <content att="Paragraph">paragraph text 2</content>
        <content att="TopicSubtitle">topic subtitle 2</content>
        <content att="Paragraph">paragraph text 1</content>
        <content att="Paragraph">paragraph text 2</content>
    </subtitle>
</title>

I am using XQuery in BaseX, and I would like to transform this into a table with the following columns:

Title      Subtitle      TopicTitle      TopicSubtitle      Paragraph
Irrelevant Irrelevant    Topic title 1   Topic Subtitle 1   paragraph text 1
Irrelevant Irrelevant    Topic title 1   Topic Subtitle 1   paragraph text 2
Irrelevant Irrelevant    Topic title 1   Topic Subtitle 2   paragraph text 1
Irrelevant Irrelevant    Topic title 1   Topic Subtitle 2   paragraph text 2
Irrelevant Irrelevant    Topic title 2   Topic Subtitle 1   paragraph text 1
Irrelevant Irrelevant    Topic title 2   Topic Subtitle 1   paragraph text 2
Irrelevant Irrelevant    Topic title 2   Topic Subtitle 2   paragraph text 1
Irrelevant Irrelevant    Topic title 2   Topic Subtitle 2   paragraph text 2

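I am new to XQuery and XPath, but I already understand the basics of how to navigate the nodes and select the ones I need. What I don't know yet is how to handle consecutive data that I want to turn into nested XML or a table (CSV?). Could anyone help?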

You can use a tumbling window (https://www.w3.org/TR/xquery-30/#id-windows) to turn the flat XML into a nested structure, for example

for tumbling window $w in title/subtitle/*
    start $t when $t instance of element(topic)
return
    <topic
        title="{$t/@att}">
        {
            for tumbling window $content in tail($w)
                start $c when $c/@att = 'TopicSubtitle'
            return
                <subtopic
                    title="{$c/@att}">
                    {
                        tail($content) ! <para>{node()}</para>
                    }
                </subtopic>
        }
    </topic>

which gives

<topic title="TopicTitle">
    <subtopic title="TopicSubtitle">
        <para>paragraph text 1</para>
        <para>paragraph text 2</para>
    </subtopic>
    <subtopic title="TopicSubtitle">
        <para>paragraph text 1</para>
        <para>paragraph text 2</para>
    </subtopic>
</topic><topic title="TopicTitle">
    <subtopic title="TopicSubtitle">
        <para>paragraph text 1</para>
        <para>paragraph text 2</para>
    </subtopic>
    <subtopic title="TopicSubtitle">
        <para>paragraph text 1</para>
        <para>paragraph text 2</para>
    </subtopic>
</topic>
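
To see how the start condition alone partitions a sequence, here is a minimal sketch on made-up strings (the input values and the <group> element are assumptions for illustration only, not part of the data above). With no end condition, each window runs up to the item just before the next one that satisfies the start condition, and tail($w) drops the item that opened the window:

```xquery
(: Toy input, invented for illustration: upper-case strings act as group headers. :)
for tumbling window $w in ('A', 'x', 'y', 'B', 'z')
    (: a new window opens at every item that equals its own upper-case form :)
    start $s when $s = upper-case($s)
return
    (: $s is the item that opened the window; tail($w) is everything after it :)
    <group head="{$s}">{string-join(tail($w), ',')}</group>
```

This should return <group head="A">x,y</group> followed by <group head="B">z</group>. The queries above apply the same pattern twice, once per topic and once per TopicSubtitle.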

Based on that, I think you can convert the whole data to semicolon-separated data using

string-join(
<title>
    <subtitle>
        {
            for tumbling window $w in title/subtitle/*
                start $t when $t instance of element(topic)
            return
                <topic
                    title="{$t/@att}"
                    value="{$t}">
                    {
                        for tumbling window $content in tail($w)
                            start $c when $c/@att = 'TopicSubtitle'
                        return
                            <subtopic
                                title="{$c/@att}"
                                value="{$c}">
                                {
                                    tail($content) ! <para>{node()}</para>
                                }
                            </subtopic>
                    }
                </topic>
        }
    </subtitle>
</title>//para ! string-join(ancestor-or-self::* ! (text(), @value, 'Irrelevant')[1], ';'), '&#10;')

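Although positional grouping is the most general approach to this kind of problem (that is, tumbling windows in XQuery 3.0+ and for-each-group/@group-starting-with in XSLT 2.0+, as Martin Honnen has described), I don't think it is strictly necessary here, because you are not actually trying to make use of the hierarchy that is implicit in the data.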

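Specifically, you are converting one flat structure with an implicit hierarchy into another flat structure with an implicit hierarchy, and you can do that along the following lines: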

<table>{
    for $para in title/subtitle/content[@att='Paragraph']
    return <row>
      <cell>Irrelevant</cell>
      <cell>Irrelevant</cell>
      <cell>{$para/preceding-sibling::topic[1]/string()}</cell>
      <cell>{$para/preceding-sibling::content[@att='TopicSubtitle'][1]/string()}</cell>
      <cell>{$para/string()}</cell>
    </row>
}</table>
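
If you want CSV-style text rather than a <table> element, the same sibling-axis lookups can feed string-join directly. This is only a sketch under a few assumptions: the context item is the document whose root element is <title>, the semicolon separator is borrowed from the other answer, and the attribute value is matched as 'Paragraph' because string comparison in XPath is case-sensitive and that is the spelling in the sample data.

```xquery
(: Sketch: one semicolon-separated line per paragraph.
   Assumes the context item is the document containing <title>. :)
string-join(
    for $para in title/subtitle/content[@att = 'Paragraph']
    return string-join(
        (
            'Irrelevant',
            'Irrelevant',
            (: string() yields '' when no preceding sibling exists,
               so every line keeps five columns :)
            string($para/preceding-sibling::topic[1]),
            string($para/preceding-sibling::content[@att = 'TopicSubtitle'][1]),
            string($para)
        ),
        ';'
    ),
    '&#10;'
)
```

For the sample document the first line should read Irrelevant;Irrelevant;Topic title 1;topic subtitle 1;paragraph text 1. Note that this does no escaping, so a value that itself contains a semicolon or a line break would need quoting before it is usable as real CSV.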