How to extract multiple grandchildren/children from XML where one child is a specific value?

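I'm working with an XML file that stores every "version" of a chatbot we created. We currently have 18 versions, and I only care about the most recent one. I'm trying to find a way to extract all botDialogGroup elements and their associated label elements for this "v18". There is a one-to-many relationship between 'botDialogGroup' and 'label'.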

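Here is a snippet of the XML, where the botDialogGroup is "Transfer" and the label is "Transfer with a question". Note that this is only one version of the bot; there are 18 in total.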

Link to a sample XML file: https://pastebin.com/aaDfBPUm

Also note that fullName is a child of botVersions, while botDialogGroup and label are grandchildren of botVersions; their parent is botDialogs.

<Bot>
  <botVersions>
    <fullName>v18</fullName>
    <botDialogs>
        <botDialogGroup>Transfer</botDialogGroup>
        <botSteps>
            <botVariableOperation>
                <askCollectIfSet>false</askCollectIfSet>
                <botMessages>
                    <message>Would you like to chat with an agent?</message>
                </botMessages>
                <botQuickReplyOptions>
                    <literalValue>Yes</literalValue>
                </botQuickReplyOptions>
                <botQuickReplyOptions>
                    <literalValue>No</literalValue>
                </botQuickReplyOptions>
                <botVariableOperands>
                    <disableAutoFill>true</disableAutoFill>
                    <sourceName>YesOrNoChoices</sourceName>
                    <sourceType>MlSlotClass</sourceType>
                    <targetName>Transfer_To_Agent</targetName>
                    <targetType>ConversationVariable</targetType>
                </botVariableOperands>
                <optionalCollect>false</optionalCollect>
                <quickReplyType>Static</quickReplyType>
                <quickReplyWidgetType>Buttons</quickReplyWidgetType>
                <retryMessages>
                    <message>I&apos;m sorry, I didn&apos;t understand that. You have to select an option to proceed.</message>
                </retryMessages>
                <type>Collect</type>
            </botVariableOperation>
            <type>VariableOperation</type>
        </botSteps>
        <botSteps>
            <botStepConditions>
                <leftOperandName>Transfer_To_Agent</leftOperandName>
                <leftOperandType>ConversationVariable</leftOperandType>
                <operatorType>Equals</operatorType>
                <rightOperandValue>No</rightOperandValue>
            </botStepConditions>
            <botSteps>
                <botVariableOperation>
                    <botVariableOperands>
                        <targetName>Transfer_To_Agent</targetName>
                        <targetType>ConversationVariable</targetType>
                    </botVariableOperands>
                    <type>Unset</type>
                </botVariableOperation>
                <type>VariableOperation</type>
            </botSteps>
            <botSteps>
                <botNavigation>
                    <botNavigationLinks>
                        <targetBotDialog>Main_Menu</targetBotDialog>
                    </botNavigationLinks>
                    <type>Redirect</type>
                </botNavigation>
                <type>Navigation</type>
            </botSteps>
            <type>Group</type>
        </botSteps>
        <botSteps>
            <botStepConditions>
                <leftOperandName>Transfer_To_Agent</leftOperandName>
                <leftOperandType>ConversationVariable</leftOperandType>
                <operatorType>Equals</operatorType>
                <rightOperandValue>Yes</rightOperandValue>
            </botStepConditions>
            <botStepConditions>
                <leftOperandName>Online_Product</leftOperandName>
                <leftOperandType>ConversationVariable</leftOperandType>
                <operatorType>NotEquals</operatorType>
                <rightOperandValue>OTP</rightOperandValue>
            </botStepConditions>
            <botStepConditions>
                <leftOperandName>Online_Product</leftOperandName>
                <leftOperandType>ConversationVariable</leftOperandType>
                <operatorType>NotEquals</operatorType>
                <rightOperandValue>TCF</rightOperandValue>
            </botStepConditions>
            <botSteps>
                <botVariableOperation>
                    <botVariableOperands>
                        <targetName>Transfer_To_Agent</targetName>
                        <targetType>ConversationVariable</targetType>
                    </botVariableOperands>
                    <type>Unset</type>
                </botVariableOperation>
                <type>VariableOperation</type>
            </botSteps>
            <botSteps>
                <botNavigation>
                    <botNavigationLinks>
                        <targetBotDialog>Find_Business_Hours</targetBotDialog>
                    </botNavigationLinks>
                    <type>Call</type>
                </botNavigation>
                <type>Navigation</type>
            </botSteps>
            <type>Group</type>
        </botSteps>
        <botSteps>
            <botNavigation>
                <botNavigationLinks>
                    <targetBotDialog>Direct_Transfer</targetBotDialog>
                </botNavigationLinks>
                <type>Redirect</type>
            </botNavigation>
            <type>Navigation</type>
        </botSteps>
        <developerName>Transfer_To_Agent</developerName>
        <label>Transfer with a question</label>
        <mlIntent>Transfer_To_Agent</mlIntent>
        <mlIntentTrainingEnabled>true</mlIntentTrainingEnabled>
        <showInFooterMenu>false</showInFooterMenu>
    </botDialogs>
</botVersions>
</Bot>

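Current script

The problem I'm running into is that it searches the entire tree, across all 18 versions, for botDialogGroup and label elements, because I'm using findall(). I only want it to search within the botVersions whose fullName is the most recent one, in this case "v18".

Entering "v18" by hand isn't a problem, since I always know which version I'm looking for. It's also useful, because different bots have different versions.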

import xml.etree.ElementTree as ET
import pandas as pd

cols = ["BotVersion", "DialogGroup", "Dialog"]
rows = []

tree = ET.parse('ChattyBot.xml')
root = tree.getroot()

for fullName in root.findall(".//fullName[.='v18']"):
    for botDialogGroup in root.findall(".//botDialogGroup"):
        for label in root.findall(".//label"):
            print(fullName.text, botDialogGroup.text, label.text)
            rows.append({"BotVersion": fullName.text,
            "DialogGroup": botDialogGroup.text,
            "Dialog": label.text})

df = pd.DataFrame(rows, columns=cols)

df.to_csv("botcsvfile.csv")

Using pandas.

Desired end result, saved to a CSV file:
BotVersion DialogGroup Dialog
v18 Transfer Transfer with a question
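
The gap between this table and what the script prints comes down to where each findall() starts. As a minimal sketch, assuming every version sits in its own botVersions element with a fullName child (the two-version SAMPLE document below is hypothetical and far smaller than the real file), compare a search from the root with a search from the matching version:

```python
import xml.etree.ElementTree as ET

# Hypothetical two-version document; the real file has 18 versions
SAMPLE = """
<Bot>
  <botVersions>
    <fullName>v17</fullName>
    <botDialogs>
      <botDialogGroup>Old group</botDialogGroup>
      <label>Old dialog</label>
    </botDialogs>
  </botVersions>
  <botVersions>
    <fullName>v18</fullName>
    <botDialogs>
      <botDialogGroup>Transfer</botDialogGroup>
      <label>Transfer with a question</label>
    </botDialogs>
  </botVersions>
</Bot>
"""

root = ET.fromstring(SAMPLE)

# Searching from the root ignores which version an element belongs to
all_labels = [el.text for el in root.findall(".//label")]

# Searching from the matching botVersions element stays inside that version
version = root.find("./botVersions[fullName='v18']")
scoped_labels = [el.text for el in version.findall("./botDialogs/label")]

print(all_labels)     # labels from every version
print(scoped_labels)  # labels from v18 only
```

Because the inner loops in the script call root.findall(), each one restarts from the top of the tree, so the result is every group paired with every label rather than one row per botDialogs.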

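OK, this code assumes your XML follows the pattern version, dialog1, dialog2, dialog3, version2, dialog1, dialog2, etc. If that's not the case, let me know and I'll re-evaluate the code. Basically, it loops over the elements, groups the dialogs by version, and then sorts the groups by version number. After that, it flattens the result into a nested list to build the pandas DataFrame.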

import xml.etree.ElementTree as ET
import pandas as pd

cols = ["BotVersion", "DialogGroup", "Dialog"]
rows = []

tree = ET.parse('test.xml')
root = tree.getroot()


for fullName in root.findall(".//botVersions"):
    versions = list(fullName)

# creating the many to one relation between the versions and bot dialogs
grouping = []
relations = []
for i, tag in enumerate(versions):
    if i == 0:
        relations.append(tag)
    elif tag.tag == 'fullName':
        grouping.append(relations)
        relations = []
        relations.append(tag)
    else:
        relations.append(tag)
        # edge case: flush the final group at the end of the list
        if i == len(versions) - 1:
            grouping.append(relations)

# sort numerically by the fullName text so the last group is the latest version
# (assumes names like "v18"; a plain string sort would rank "v9" after "v18")
grouping.sort(key=lambda x: int(x[0].text.lstrip('v')))
rows = grouping[-1]

# flattening the text into rows for the pandas dataframe
version_number = rows[0].text
pandas_row = [version_number]
pandas_rows = []
for r in rows[1:]:
    pandas_row = [version_number]
    for child in r.iter():
        if child.tag in ['botDialogGroup', 'label']:
            pandas_row.append(child.text)
    pandas_rows.append(pandas_row)

df = pd.DataFrame(pandas_rows, columns=cols)
print(df)

from lxml import etree
import pandas as pd

bots = """your xml above"""
cols = ["BotVersion", "DialogGroup", "Dialog"]
rows = []
ver = 'v18'

root = etree.XML(bots)

# relative paths keep the search inside the matching version;
# a leading // inside the predicate or the inner xpath() would search the whole document
for dialog in root.xpath(f"//botVersions[fullName='{ver}']/botDialogs"):
    rows.append([ver, dialog.findtext('botDialogGroup'), dialog.findtext('label')])
df = pd.DataFrame(rows, columns=cols)
df

The output should be the df you expected.
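
For comparison, the same idea can be sketched with the standard library alone. This assumes each version lives in its own botVersions element and that version names look like "v18"; the helper names latest_version and extract_dialogs and the SAMPLE document are made up for illustration and are not part of the original file or scripts. The version is picked by its numeric part, because a plain string sort would rank "v9" after "v18".

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample; the real file holds 18 versions
SAMPLE = """
<Bot>
  <botVersions>
    <fullName>v9</fullName>
    <botDialogs>
      <botDialogGroup>Old group</botDialogGroup>
      <label>Old dialog</label>
    </botDialogs>
  </botVersions>
  <botVersions>
    <fullName>v18</fullName>
    <botDialogs>
      <botDialogGroup>Transfer</botDialogGroup>
      <label>Transfer with a question</label>
    </botDialogs>
    <botDialogs>
      <botDialogGroup>Transfer</botDialogGroup>
      <label>Direct transfer</label>
    </botDialogs>
  </botVersions>
</Bot>
"""


def latest_version(root):
    """Return the fullName with the highest number (assumes names like 'v18')."""
    names = [v.findtext("fullName") for v in root.findall("./botVersions")]
    return max(names, key=lambda name: int(name.lstrip("v")))


def extract_dialogs(root, version):
    """Collect (version, botDialogGroup, label) rows for one version only."""
    path = f"./botVersions[fullName='{version}']/botDialogs"
    return [
        (version, dialog.findtext("botDialogGroup"), dialog.findtext("label"))
        for dialog in root.findall(path)
    ]


root = ET.fromstring(SAMPLE)
version = latest_version(root)
rows = extract_dialogs(root, version)

# Write to an in-memory buffer here; a real run would open a file instead
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["BotVersion", "DialogGroup", "Dialog"])
writer.writerows(rows)
print(buffer.getvalue())
```

To produce a file, the buffer can be swapped for open("botcsvfile.csv", "w", newline=""), and passing "v18" directly to extract_dialogs skips the automatic version lookup when the version is already known.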