How to extract multiple grandchildren/children from XML where one child is a specific value?
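I am working with an XML file that stores every "version" of the chatbot we have created. We currently have 18 versions, and I only care about the most recent one. I am trying to find a way to extract all botDialogGroup elements and their associated label elements for this "v18". There is a one-to-many relationship between botDialogGroup and label.

Here is a snippet of the XML, where the botDialogGroup is "Transfer" and the label is "Transfer with a question". Note that this is only one version of the bot; there are 18 in total.

Link to the sample XML file: https://pastebin.com/aaDfBPUm

Also note that fullName is a child of botVersions, whereas botDialogGroup and label are grandchildren of botVersions; their parent is botDialogs.
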
<Bot>
<botVersions>
<fullName>v18</fullName>
<botDialogs>
<botDialogGroup>Transfer</botDialogGroup>
<botSteps>
<botVariableOperation>
<askCollectIfSet>false</askCollectIfSet>
<botMessages>
<message>Would you like to chat with an agent?</message>
</botMessages>
<botQuickReplyOptions>
<literalValue>Yes</literalValue>
</botQuickReplyOptions>
<botQuickReplyOptions>
<literalValue>No</literalValue>
</botQuickReplyOptions>
<botVariableOperands>
<disableAutoFill>true</disableAutoFill>
<sourceName>YesOrNoChoices</sourceName>
<sourceType>MlSlotClass</sourceType>
<targetName>Transfer_To_Agent</targetName>
<targetType>ConversationVariable</targetType>
</botVariableOperands>
<optionalCollect>false</optionalCollect>
<quickReplyType>Static</quickReplyType>
<quickReplyWidgetType>Buttons</quickReplyWidgetType>
<retryMessages>
<message>I'm sorry, I didn't understand that. You have to select an option to proceed.</message>
</retryMessages>
<type>Collect</type>
</botVariableOperation>
<type>VariableOperation</type>
</botSteps>
<botSteps>
<botStepConditions>
<leftOperandName>Transfer_To_Agent</leftOperandName>
<leftOperandType>ConversationVariable</leftOperandType>
<operatorType>Equals</operatorType>
<rightOperandValue>No</rightOperandValue>
</botStepConditions>
<botSteps>
<botVariableOperation>
<botVariableOperands>
<targetName>Transfer_To_Agent</targetName>
<targetType>ConversationVariable</targetType>
</botVariableOperands>
<type>Unset</type>
</botVariableOperation>
<type>VariableOperation</type>
</botSteps>
<botSteps>
<botNavigation>
<botNavigationLinks>
<targetBotDialog>Main_Menu</targetBotDialog>
</botNavigationLinks>
<type>Redirect</type>
</botNavigation>
<type>Navigation</type>
</botSteps>
<type>Group</type>
</botSteps>
<botSteps>
<botStepConditions>
<leftOperandName>Transfer_To_Agent</leftOperandName>
<leftOperandType>ConversationVariable</leftOperandType>
<operatorType>Equals</operatorType>
<rightOperandValue>Yes</rightOperandValue>
</botStepConditions>
<botStepConditions>
<leftOperandName>Online_Product</leftOperandName>
<leftOperandType>ConversationVariable</leftOperandType>
<operatorType>NotEquals</operatorType>
<rightOperandValue>OTP</rightOperandValue>
</botStepConditions>
<botStepConditions>
<leftOperandName>Online_Product</leftOperandName>
<leftOperandType>ConversationVariable</leftOperandType>
<operatorType>NotEquals</operatorType>
<rightOperandValue>TCF</rightOperandValue>
</botStepConditions>
<botSteps>
<botVariableOperation>
<botVariableOperands>
<targetName>Transfer_To_Agent</targetName>
<targetType>ConversationVariable</targetType>
</botVariableOperands>
<type>Unset</type>
</botVariableOperation>
<type>VariableOperation</type>
</botSteps>
<botSteps>
<botNavigation>
<botNavigationLinks>
<targetBotDialog>Find_Business_Hours</targetBotDialog>
</botNavigationLinks>
<type>Call</type>
</botNavigation>
<type>Navigation</type>
</botSteps>
<type>Group</type>
</botSteps>
<botSteps>
<botNavigation>
<botNavigationLinks>
<targetBotDialog>Direct_Transfer</targetBotDialog>
</botNavigationLinks>
<type>Redirect</type>
</botNavigation>
<type>Navigation</type>
</botSteps>
<developerName>Transfer_To_Agent</developerName>
<label>Transfer with a question</label>
<mlIntent>Transfer_To_Agent</mlIntent>
<mlIntentTrainingEnabled>true</mlIntentTrainingEnabled>
<showInFooterMenu>false</showInFooterMenu>
</botDialogs>
</botVersions>
</Bot>
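
Current script

The problem I am running into is that it searches the entire tree, across all 18 versions, for botDialogGroup and label elements, because I am using findall(). I only want it to search the botVersions with the most recent fullName, which in this case is "v18".

Entering "v18" by hand is not a problem, since I always know which version I am looking for. It is also useful, because different bots have different versions.
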
import xml.etree.ElementTree as ET
import pandas as pd

cols = ["BotVersion", "DialogGroup", "Dialog"]
rows = []

tree = ET.parse('ChattyBot.xml')
root = tree.getroot()

for fullName in root.findall(".//fullName[.='v18']"):
    for botDialogGroup in root.findall(".//botDialogGroup"):
        for label in root.findall(".//label"):
            print(fullName.text, botDialogGroup.text, label.text)

            rows.append({"BotVersion": fullName.text,
                         "DialogGroup": botDialogGroup.text,
                         "Dialog": label.text})

df = pd.DataFrame(rows, columns=cols)
df.to_csv("botcsvfile.csv")

Desired end result, saved to a CSV file using pandas:

BotVersion | DialogGroup | Dialog |
---|---|---|
v18 | Transfer | Transfer with a question |

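OK, this code assumes your XML follows the pattern version, dialog1, dialog2, dialog3, version2, dialog1, dialog2, etc. If that is not the case, let me know and I will re-evaluate the code. Basically, it loops over the elements, groups the dialogs under their version, and then sorts by version number. After that it flattens the result into a nested list to create the pandas DataFrame.
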
import xml.etree.ElementTree as ET
import pandas as pd

cols = ["BotVersion", "DialogGroup", "Dialog"]
rows = []

tree = ET.parse('test.xml')
root = tree.getroot()
for fullName in root.findall(".//botVersions"):
    versions = list(fullName)
    # create the many-to-one relation between the versions and bot dialogs
    grouping = []
    relations = []
    for i, tag in enumerate(versions):
        if i == 0:
            relations.append(tag)
        elif tag.tag == 'fullName':
            # a new fullName starts a new group
            grouping.append(relations)
            relations = []
            relations.append(tag)
        else:
            relations.append(tag)
        # edge case: close the last group at the end of the list
        if i == len(versions) - 1:
            grouping.append(relations)

    # sort by the text of the fullName tag so the latest version is last
    # (note: this compares strings, so 'v9' would sort after 'v18')
    grouping.sort(key=lambda x: x[0].text)
    rows = grouping[-1]

    # flatten the text into rows for the pandas DataFrame
    version_number = rows[0].text
    pandas_rows = []
    for r in rows[1:]:
        pandas_row = [version_number]
        for child in r.iter():
            if child.tag in ['botDialogGroup', 'label']:
                pandas_row.append(child.text)
        pandas_rows.append(pandas_row)

df = pd.DataFrame(pandas_rows, columns=cols)
print(df)
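
Sorting by the text of fullName compares strings, which only gives the latest version while all version names have the same number of digits. A minimal sketch of a numeric comparison follows, assuming every name has the form "v" followed by an integer (the thread only shows "v18"). The helper version_key and the list names are hypothetical, made up for illustration.

```python
import re


def version_key(name: str) -> int:
    """Return the integer part of a version name such as 'v18'.

    Assumes names look like 'v<integer>'; anything else raises ValueError.
    """
    match = re.fullmatch(r"v(\d+)", name.strip())
    if match is None:
        raise ValueError(f"unexpected version name: {name!r}")
    return int(match.group(1))


# Hypothetical version names, not taken from the real file
names = ["v1", "v9", "v10", "v18", "v2"]

latest_numeric = max(names, key=version_key)  # compares 1, 9, 10, 18, 2
latest_string = sorted(names)[-1]             # compares character by character

print(latest_numeric)
print(latest_string)
```

The same key could be passed to the sort above, for example key=lambda x: version_key(x[0].text), if the names in the real file follow this pattern.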
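
from lxml import etree
import pandas as pd

bots = """your xml above"""
cols = ["BotVersion", "DialogGroup", "Dialog"]
rows = []
ver = 'v18'
root = etree.XML(bots)
# match only the botVersions whose own fullName child equals ver
for entry in root.xpath(f"//botVersions[fullName='{ver}']"):
    # relative paths keep the search inside this version; one row per botDialogs
    for dialog in entry.xpath("botDialogs"):
        rows.append([ver, dialog.findtext("botDialogGroup"), dialog.findtext("label")])
df = pd.DataFrame(rows, columns=cols)
df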
The output should be the df you expect.
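
For comparison, the same scoping can be sketched with the standard library alone. The script in the question calls findall() on root inside every loop, so each search starts again from the top of the tree. Calling findall() on the matched botVersions element restricts the search to that version. The sketch below assumes each botDialogs element holds at most one botDialogGroup and one label, as in the snippet. The function dialogs_for_version, the SAMPLE string and its v17 entry are hypothetical and exist only to make the example self-contained. It writes CSV with the csv module to avoid extra dependencies; pd.DataFrame(rows, columns=cols).to_csv(...) should work the same way.

```python
import csv
import sys
import xml.etree.ElementTree as ET

# Hypothetical two-version sample; the real file has 18 versions
SAMPLE = """<Bot>
  <botVersions>
    <fullName>v17</fullName>
    <botDialogs>
      <botDialogGroup>Old_Group</botDialogGroup>
      <label>Old dialog</label>
    </botDialogs>
  </botVersions>
  <botVersions>
    <fullName>v18</fullName>
    <botDialogs>
      <botDialogGroup>Transfer</botDialogGroup>
      <label>Transfer with a question</label>
    </botDialogs>
    <botDialogs>
      <label>Dialog without a group</label>
    </botDialogs>
  </botVersions>
</Bot>"""


def dialogs_for_version(root, version):
    """Collect one row per botDialogs element inside the given version.

    Assumes `version` contains no quote characters, since it is
    interpolated into the path expression.
    """
    rows = []
    # [fullName='...'] keeps only botVersions with a matching fullName child
    for bot_version in root.findall(f".//botVersions[fullName='{version}']"):
        # Searching from bot_version (not root) stays inside this version
        for dialog in bot_version.findall("botDialogs"):
            rows.append({
                "BotVersion": version,
                "DialogGroup": dialog.findtext("botDialogGroup", default=""),
                "Dialog": dialog.findtext("label", default=""),
            })
    return rows


cols = ["BotVersion", "DialogGroup", "Dialog"]
root = ET.fromstring(SAMPLE)
rows = dialogs_for_version(root, "v18")

# Write the rows as CSV to standard output
writer = csv.DictWriter(sys.stdout, fieldnames=cols)
writer.writeheader()
writer.writerows(rows)
```

With a real file, ET.parse('ChattyBot.xml').getroot() would take the place of ET.fromstring(SAMPLE).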