Handling blank attribute values in XML
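I am printing the values of the tags and attributes present in an XML file. If any tag or attribute has an empty value, I want to print None instead. I can do this for empty tags, but the code does not print None when an attribute value is blank.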
XML (a.xml):
<?xml version="1.0"?>
<?xml-stylesheet href="catalog.xsl" type="text/xsl"?>
<!DOCTYPE catalog SYSTEM "catalog.dtd">
<catalog>
  <product description="Cardigan Sweater" product_image="cardigan.jpg">
    <catalog_item gender="Men's">
      <item_number sep = "help" dep = "paraug" note = "zempu">QWZ5671</item_number>
      <line cap = "delp" des = "" fote = "cat"></line>
      <cool_number>QWZ5671</cool_number>
      <price>39.5</price>
      <price></price>
    </catalog_item>
  </product>
</catalog>
Code:
from lxml import etree
from collections import defaultdict

root_1 = etree.parse('a.xml').getroot()
d1 = []
for node in root_1.findall('.//catalog_item'):
    item = defaultdict(list)
    for x in node.iter():
        # iterate over the items
        for k, v in x.attrib.items():
            item[k].append(v)
            if x.attrib is None:
                item[x.attrib].append('None')
        if x.text is None:
            item[x.tag].append('None')
        elif x.text.strip():
            item[x.tag].append(x.text.strip())
    d1.append(dict(item))

print(d1)
Current output: the value of the des attribute is blank in the XML, so it comes out blank here, whereas the empty line tag comes out as None.
[{'gender': ["Men's"], 'sep': ['help'], 'dep': ['paraug'], 'note': ['zempu'], 'item_number': ['QWZ5671'], 'cap': ['delp'], 'des': [''], 'fote': ['cat'], 'line': ['None'], 'cool_number': ['QWZ5671'], 'price': ['39.5', 'None']}]
Expected output: when an attribute value is blank, None should be printed as well, as shown here for des.
[{'gender': ["Men's"], 'sep': ['help'], 'dep': ['paraug'], 'note': ['zempu'], 'item_number': ['QWZ5671'], 'cap': ['delp'], 'des': ['None'], 'fote': ['cat'], 'line': ['None'], 'cool_number': ['QWZ5671'], 'price': ['39.5', 'None']}]
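The problem is the way you currently test for an empty attribute:
if x.attrib is None:
This tests the node's whole attribute mapping rather than an individual value. x.attrib is a dict-like object holding all of the node's attributes; it is never None (for a node without attributes it is simply empty), so this branch never runs. You can fix it by replacing this
for k, v in x.attrib.items():
    item[k].append(v)
    if x.attrib is None:
        item[x.attrib].append('None')
with this
for k, v in x.attrib.items():
    item[k].append(v if v else None)  # use str(None) if you really need a string
This produces the following output: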
[{'note': ['zempu'], 'item_number': ['QWZ5671'], 'cool_number': ['QWZ5671'], 'cap': ['delp'], 'des': [None], 'sep': ['help'], 'fote': ['cat'], 'dep': ['paraug'], 'line': ['None'], 'price': ['39.5', 'None'], 'gender': ["Men's"]}]
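For reference, here is a minimal, self-contained sketch of the whole loop with the fix applied. It makes a few assumptions that are not in the original post: it uses the standard library's xml.etree.ElementTree instead of lxml so it runs without extra dependencies, it parses the XML from an inline string (the prolog lines are left out) rather than from a.xml, and it wraps the loop in a hypothetical helper named collect_items. It appends the string 'None' so that the result matches the expected output in the question.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Inline copy of the catalog from the question (assumption: parsed from a
# string instead of a.xml, with the XML prolog omitted).
XML = """\
<catalog>
  <product description="Cardigan Sweater" product_image="cardigan.jpg">
    <catalog_item gender="Men's">
      <item_number sep="help" dep="paraug" note="zempu">QWZ5671</item_number>
      <line cap="delp" des="" fote="cat"></line>
      <cool_number>QWZ5671</cool_number>
      <price>39.5</price>
      <price></price>
    </catalog_item>
  </product>
</catalog>
"""


def collect_items(root):
    """Hypothetical helper: one dict per <catalog_item>, blanks become 'None'."""
    items = []
    for node in root.findall('.//catalog_item'):
        item = defaultdict(list)
        for x in node.iter():
            # Test each attribute *value*; x.attrib itself is never None.
            for k, v in x.attrib.items():
                item[k].append(v if v else 'None')
            # An element with no text at all has text None.
            if x.text is None:
                item[x.tag].append('None')
            # Skip whitespace-only text (e.g. indentation inside a parent).
            elif x.text.strip():
                item[x.tag].append(x.text.strip())
        items.append(dict(item))
    return items


result = collect_items(ET.fromstring(XML))
print(result)
```

If you would rather store a real None than the string, replace 'None' with None in both places, as the comment in the answer suggests.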