python lxml get the name of a node
This is my xml file:
<FuzzyComparison>
    <Modules>
        <Module>
            <name>AutosoukModelMakeFuzzyComparisonModule</name>
            <configurationLoader>DefaultLoader</configurationLoader>
            <configurationFile>MakesModels.conf</configurationFile>
            <settings></settings>
        </Module>
        <Module>
            <name>DefaultFuzzyComparisonModule</name>
            <configurationLoader>DefaultLoader</configurationLoader>
            <configurationFile>Buildings.conf</configurationFile>
            <settings>
                <attribute>building</attribute>
            </settings>
        </Module>
    </Modules>
</FuzzyComparison>
This is the code I have been using to parse it:
from lxml import etree

class AttributesXMLParser():
    def __init__(self):
        self.doc = etree.parse('Items.xml')

    def getValueOfTag(self, tagName):
        # Returns the text of a specific tag; for example, tagName could be "FirstDate"
        return self.doc.find(tagName).text

    def loadFuzzySettings(self):
        modulesDict = list()
        modules = self.doc.findall('FuzzyComparison/Modules/Module')
        for module in modules:
            moduleDict = dict()
            moduleName = module.find('name').text
            moduleDict['name'] = moduleName
            moduleConfigurationLoader = module.find('configurationLoader').text
            moduleDict['configurationLoader'] = moduleConfigurationLoader
            moduleConfigurationFile = module.find('configurationFile').text
            moduleDict['moduleConfigurationFile'] = moduleConfigurationFile
            settings = module.findall('settings')
            settingsDict = dict()
            for oneSetting in settings:
                settingsDict[oneSetting] = oneSetting.text
            moduleDict['settings'] = settingsDict
            modulesDict.append(moduleDict)
        return modulesDict
This is the result:
[{'moduleConfigurationFile': 'MakesModels.conf', 'configurationLoader': 'Default
Loader', 'name': 'AutosoukModelMakeFuzzyComparisonModule', 'settings': {<Element
settings at 0x25257c8>: None}}, {'moduleConfigurationFile': 'Buildings.conf', '
configurationLoader': 'DefaultLoader', 'name': 'DefaultFuzzyComparisonModule', '
settings': {<Element settings at 0x2525e48>: '\n\t\t\t\t'}}]
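My question
I don't know how to get the name and value of the nodes inside settings. As you can see, everything works fine except settings. I need this:
"attribute": building
but my code gives me:
{<Element settings at 0x2525e48>: '\n\t\t\t\t'}}]
Could you help me solve this?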
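Since findall() returns a list, you want to iterate over the contents of each element in that list, not over the list itself. You also want to use the element's tag as the key, rather than the element itself.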
settingsDict = {}
for settingsNode in module.findall('settings'):
    for setting in settingsNode:
        settingsDict[setting.tag] = setting.text
Or, if you only have one settings tag,
settingsDict = {}
for setting in module.find('settings'):
    settingsDict[setting.tag] = setting.text
which can be simplified to:
settingsDict = {setting.tag: setting.text
                for setting in module.find('settings')}
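To see the fix end to end, here is a minimal, self-contained sketch that parses the XML from the question out of a string instead of Items.xml. Some parts are assumptions and not from the original post: the function name load_fuzzy_settings is made up for illustration, the XML is assumed to have FuzzyComparison as its root element (so the search path is relative to it), and the fallback import only exists so the snippet still runs where lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: the standard library offers the same calls used below.
    import xml.etree.ElementTree as etree

XML = """<FuzzyComparison>
  <Modules>
    <Module>
      <name>AutosoukModelMakeFuzzyComparisonModule</name>
      <configurationLoader>DefaultLoader</configurationLoader>
      <configurationFile>MakesModels.conf</configurationFile>
      <settings></settings>
    </Module>
    <Module>
      <name>DefaultFuzzyComparisonModule</name>
      <configurationLoader>DefaultLoader</configurationLoader>
      <configurationFile>Buildings.conf</configurationFile>
      <settings>
        <attribute>building</attribute>
      </settings>
    </Module>
  </Modules>
</FuzzyComparison>"""


def load_fuzzy_settings(root):
    """Return one dict per Module, with its settings keyed by child tag name."""
    modules = []
    for module in root.findall('Modules/Module'):
        settings = {}
        # findall() returns a list of <settings> elements; iterate each one's children.
        for settings_node in module.findall('settings'):
            for setting in settings_node:
                # Use the child's tag name as the key, not the element object.
                settings[setting.tag] = setting.text
        modules.append({
            'name': module.findtext('name'),
            'configurationLoader': module.findtext('configurationLoader'),
            'moduleConfigurationFile': module.findtext('configurationFile'),
            'settings': settings,
        })
    return modules


root = etree.fromstring(XML)
result = load_fuzzy_settings(root)
print(result[1]['settings'])
```

An empty settings element simply produces an empty dict, because it has no children to iterate over.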